Server for data collection and screening and policy data collection system

A data collection server technology, applied in the field of data screening, that addresses time-consuming and labor-intensive work and achieves savings in time and cost.

Pending Publication Date: 2021-08-10
常州慈养林信息技术有限公司

AI-Extracted Technical Summary

Problems solved by technology

[0002] With the development of network technology, the traditional practice of publishing policies in paper documents, newspapers, and periodicals has given way to publishing policies on the Interne...

Method used

In the present embodiment, the policy data module is adapted to construct policy data from the core data; that is, it assigns each paragraph to the policy type to which the paragraph belongs, to build policy d...

Abstract

The invention belongs to the technical field of data screening and relates in particular to a server for data collection and screening and to a policy data collection system. The server comprises: a database construction module, which constructs a policy data collection database; a basic data collection module, which collects basic data according to the policy data collection database; a screening module, which screens core data out of the basic data according to a policy data screening model; and a policy data module, which constructs policy data from the core data. The server thereby collects and organizes policy data published on the network, so that a user can learn about multiple policies in a single visit, saving time.

Application Domain

Web data indexing; Special data processing applications

Technology Topic

Collection system; Network on

Example Embodiment

[0035] Example 1
[0036] Figure 1 is a schematic block diagram of the server for data collection and screening according to the present invention.
[0037] As shown in Figure 1, Embodiment 1 provides a server for data collection and screening, including: a database construction module, which builds a policy data collection database; a basic data collection module, which collects basic data according to the policy data collection database; a screening module, which screens core data out of the basic data according to a policy data screening model; and a policy data module, which constructs policy data from the core data. The server thereby collects and organizes the various policy data published on the network, so that users can learn about multiple policies in a single visit, saving time and cost.
[0038] In this embodiment, the database construction module is adapted to construct the policy data collection database; that is, it collects the URLs of websites that publish policy information and stores each URL in a database, which forms the policy data collection database.
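The URL store described in [0038] could be sketched as follows. This is a minimal illustration rather than the patented implementation: the choice of SQLite, the table name `policy_sources`, and the helper names `build_policy_source_db` and `list_sources` are all assumptions introduced here.

```python
import sqlite3


def build_policy_source_db(urls, db_path=":memory:"):
    """Store policy-publishing website URLs in a table and return the connection.

    The table name and schema are hypothetical; the source only says that
    each URL is stored in a database.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS policy_sources ("
        "id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL)"
    )
    # INSERT OR IGNORE skips URLs that are already stored.
    conn.executemany(
        "INSERT OR IGNORE INTO policy_sources (url) VALUES (?)",
        [(u.strip(),) for u in urls if u.strip()],
    )
    conn.commit()
    return conn


def list_sources(conn):
    """Return the stored URLs in insertion order."""
    rows = conn.execute("SELECT url FROM policy_sources ORDER BY id")
    return [row[0] for row in rows]
```

The crawler in the next step would then iterate over `list_sources(conn)`; the `UNIQUE` constraint keeps a site from being crawled twice if its URL is submitted more than once.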
[0039] In this embodiment, the basic data collection module is adapted to collect basic data according to the policy data collection database. That is, it uses a web crawler to obtain the raw data of every website listed in the policy data collection database and filters the raw data to obtain basic data. Different crawling technologies, such as Requests and Selenium, can be used to cope with the anti-crawling strategies of different websites. Technologies such as Beautiful Soup and Selenium are used to filter the raw data from each website, removing HTML tags, CSS styles, and the like, to obtain the basic data, which is the policy-bearing data published on each website.
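The filtering step in [0039] can be illustrated with a small sketch. The source names Beautiful Soup and Selenium for this step; the example below instead uses Python's standard-library `html.parser` so that it runs without third-party packages, and the names `html_to_basic_data`, `SKIP_TAGS`, and `BLOCK_TAGS` are hypothetical. It covers only tag and style removal, not fetching or anti-crawling handling.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style"}  # content dropped entirely
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collect visible text, one output line per block-level element."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._buffer = []
        self.lines = []

    def flush(self):
        # Join buffered text pieces and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        self._buffer = []
        if text:
            self.lines.append(text)

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buffer.append(data)


def html_to_basic_data(raw_html):
    """Strip tags, scripts, and CSS from raw HTML and return plain text."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    parser.flush()
    return "\n".join(parser.lines)
```

Each line of the returned text roughly corresponds to one block element of the page, which is a convenient starting point for the paragraph split performed by the screening module.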
[0040] In this embodiment, the screening module is adapted to screen the core data out of the basic data according to the policy data screening model. That is, it divides the basic data into a set of paragraphs, identifies the keywords of each paragraph in the set according to the keywords of the policy types, and identifies the core word among those keywords. Specifically, a paragraph is divided into n words to form a word set C, and the keywords in C are identified; for each keyword C_i in C, the number of co-occurrences of C_i with every other word in C is calculated, and the contextual co-occurrence entropy of C_i is obtained:
[0041]
[0042] Here, H(C_i) is the contextual co-occurrence entropy of keyword C_i, a keyword identified in a paragraph of the paragraph set according to the policy-category keywords, and C_j denotes any other word that co-occurs with C_i. After the contextual co-occurrence entropy of every keyword has been obtained, the values are compared, and the keyword with the largest contextual co-occurrence entropy is the core word. If only one keyword appears in a paragraph, that keyword is the core word. If several keywords in a paragraph share the largest contextual co-occurrence entropy, the paragraph has several core words, and when the policy category of the paragraph is determined, the paragraph is assigned to several policy categories at the same time. The policy category corresponding to a core word is determined from the policy-category keywords, and the content of the paragraph containing that core word is assigned to that category; in this way the policy category of every paragraph is determined.
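The core-word selection in [0040] to [0042] can be sketched in code. The entropy formula itself is not reproduced in this text, so the sketch assumes the common Shannon form: the co-occurrence counts of C_i with each other word C_j are normalized into probabilities p(C_j | C_i), and H(C_i) is minus the sum of p log2 p. The sliding `window` used to count co-occurrences and the function names are likewise assumptions, not details taken from the patent.

```python
import math
from collections import Counter


def cooccurrence_entropy(tokens, keyword, window=2):
    """Shannon entropy of the words co-occurring with `keyword`.

    Assumption: two words co-occur when they are at most `window`
    positions apart in the paragraph's token list.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != keyword:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if tokens[j] != keyword:
                counts[tokens[j]] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def core_words(tokens, keywords, window=2):
    """Return the keyword(s) with the largest co-occurrence entropy."""
    present = [k for k in dict.fromkeys(keywords) if k in tokens]
    if len(present) <= 1:
        # No keyword, or a single keyword that is the core word by itself.
        return present
    scores = {k: cooccurrence_entropy(tokens, k, window) for k in present}
    best = max(scores.values())
    # Ties are kept, so a paragraph may have several core words.
    return [k for k in present if math.isclose(scores[k], best)]
```

For a paragraph tokenized as `"pension subsidy for rural elderly pension insurance and medical care subsidy".split()` with keywords `["pension", "subsidy"]`, "pension" co-occurs with six distinct words and "subsidy" with five under this window, so "pension" is selected as the core word.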
[0043] In this embodiment, the policy data module is adapted to construct policy data from the core data; that is, each paragraph is assigned to the policy category to which it belongs, and the policy data is built from these assignments. The directory of each policy category holds the paragraphs, collected from the various websites, that contain keywords corresponding to that category, so that users can learn about multiple policies in a single visit, which saves time and cost.
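The grouping described in [0043] amounts to building a mapping from policy category to paragraphs. A minimal sketch, assuming each paragraph arrives together with its already-identified core words; the function name `build_policy_data` and the input shapes are hypothetical:

```python
from collections import defaultdict


def build_policy_data(classified_paragraphs, keyword_to_category):
    """Group paragraphs under the policy category of each core word.

    classified_paragraphs: iterable of (paragraph_text, core_words) pairs.
    keyword_to_category: dict mapping a keyword to its policy category.
    """
    policy_data = defaultdict(list)
    for paragraph, cores in classified_paragraphs:
        # dict.fromkeys removes duplicate categories while keeping order,
        # so a paragraph is listed once per category.
        categories = dict.fromkeys(
            keyword_to_category[w] for w in cores if w in keyword_to_category
        )
        for category in categories:
            policy_data[category].append(paragraph)
    return dict(policy_data)
```

A paragraph with core words from two categories appears under both, which matches the multi-category case described for paragraphs with several core words.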
[0044] In this embodiment, the division of policy types and the extraction of policy-type keywords can be set according to the policy area to be collected. For example, when data about pension policies needs to be collected, the policy types and keywords related to pensions can be set so that pension policies are collected accurately.
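The configurable policy types and keywords in [0044] could be expressed as plain data. The sketch below is illustrative only: the category names, the keywords, and the case-insensitive substring matching are assumptions, and the patent does not specify how keywords are matched.

```python
# Hypothetical configuration for collecting pension-related policies.
PENSION_POLICY_KEYWORDS = {
    "Pension insurance": ["pension", "retirement benefit"],
    "Elderly care services": ["nursing home", "home care"],
}


def invert_keywords(policy_keywords):
    """Map each keyword to its policy type; reject ambiguous keywords."""
    mapping = {}
    for policy_type, keywords in policy_keywords.items():
        for kw in keywords:
            key = kw.lower()
            if key in mapping and mapping[key] != policy_type:
                raise ValueError(f"keyword {kw!r} belongs to two policy types")
            mapping[key] = policy_type
    return mapping


def find_keywords(paragraph, policy_keywords):
    """Return the configured keywords that occur in the paragraph."""
    text = paragraph.lower()
    return [kw for kw in invert_keywords(policy_keywords) if kw in text]
```

Switching the system to another policy area would then mean supplying a different dictionary rather than changing the collection code.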

[0045] Example 2
[0046] Figure 2 is a schematic block diagram of the policy data collection system according to the present invention.
[0047] As shown in Figure 2, on the basis of Embodiment 1, Embodiment 2 further provides a policy data collection system, including: a server, which is adapted to collect and screen policy data; and a host computer, which is adapted to receive and display the policy data collected and screened by the server. The host computer can obtain the policy data collected and screened by the server through web page access.
[0048] In this embodiment, the server is the server for data collection and screening described in Embodiment 1.
[0049] To sum up, in the present invention a database construction module constructs a policy data collection database; a basic data collection module collects basic data according to the policy data collection database; a screening module screens core data out of the basic data according to a policy data screening model; and a policy data module constructs policy data from the core data. The invention thereby collects and organizes the various policy data published on the network, so that users can learn about multiple policies in a single visit, saving time and cost.
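The web access between host computer and server in Embodiment 2 might look like the sketch below, built on Python's standard `http.server`. The `/policies` path, the JSON payload, the port, and the function names are all assumptions; the source says only that the host computer obtains the policy data through web page access.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def policy_response(path, policy_data):
    """Return (status, body) for a GET request; `/policies` is hypothetical."""
    if path == "/policies":
        return 200, json.dumps(policy_data, ensure_ascii=False).encode("utf-8")
    return 404, b'{"error": "not found"}'


def make_handler(policy_data):
    """Build a request handler class bound to the given policy data."""

    class PolicyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = policy_response(self.path, policy_data)
            self.send_response(status)
            self.send_header("Content-Type", "application/json; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return PolicyHandler


def serve(policy_data, host="127.0.0.1", port=8000):
    """Serve the policy data until interrupted (blocking call)."""
    HTTPServer((host, port), make_handler(policy_data)).serve_forever()
```

With `serve(...)` running, a browser on the host computer could open `http://127.0.0.1:8000/policies` to retrieve every category and its paragraphs in one request.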
