A method for collecting and extracting key Internet data information

A technique for collecting key data and information, applied in the fields of digital data information retrieval, network data retrieval, and other database retrieval. It addresses the problems that related information is missed, that the data obtained is of little value, and that the target webpage cannot be mined effectively, so as to achieve a more efficient working method and improve work efficiency.

Active Publication Date: 2021-06-15
刘奕名
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

With the spread of Internet of Things infrastructure, smartphones, and wearable devices, each of us generates large amounts of data all the time, and a large volume of data of many types is updated on the network every day. Quickly analyzing this massive data and extracting the important information is currently the best use of network data. Most prior-art methods use a comparison approach: the data in the target webpage is matched against preset keywords to find the content in the webpage that matches them. The information obtained this way is limited. Because it relies only on direct matching, much other related information in the webpage is missed; the target webpage is not truly mined for data, so the data obtained is of little value.

Examples

Embodiment Construction

[0045] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

[0046] The present invention provides a method for collecting and extracting key Internet data information, used to obtain the key text information in a target webpage. In practical applications, as shown in Figure 1, the following steps A to H are performed.

[0047] Step A. Perform word segmentation on the text in the target webpage. Then, according to a preset thesaurus of meaningless words, remove the meaningless word strings and connecting word strings from the segmented text, take the result as the text to be processed, and go to step B.
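Step A can be illustrated with a minimal sketch. The word lists, the regular-expression tokenizer, and the function names `segment` and `step_a` are assumptions made for illustration; the patent text does not specify the segmenter or the contents of the preset thesaurus.

```python
import re

# Hypothetical stand-ins for the preset "meaningless thesaurus".
MEANINGLESS_WORDS = {"the", "a", "an", "of", "is", "to"}
CONNECTING_WORDS = {"and", "or", "but", "so"}

# Assumed tokenizer: keep a URL as one token, otherwise split into word runs.
TOKEN_PATTERN = re.compile(r"https?://\S+|\w+")


def segment(text):
    """Split text into word strings (a simple regex stand-in for a real segmenter)."""
    return TOKEN_PATTERN.findall(text)


def step_a(text):
    """Segment the text and drop meaningless and connecting word strings."""
    stop_words = MEANINGLESS_WORDS | CONNECTING_WORDS
    return [tok for tok in segment(text) if tok.lower() not in stop_words]


if __name__ == "__main__":
    print(step_a("The crawler and the parser extract key data"))
```

For Chinese webpages a dedicated word segmenter would replace `segment`; the filtering step would stay the same.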

[0048] Step B. Filter the text to be processed to obtain the word segmentation strings that are not URL links and are distinct from one another; these form the primary word segmentation strings to be processed, and count...
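The part of Step B that survives in this excerpt (dropping URL link strings and keeping one copy of each distinct word string) might look like the following sketch. The URL pattern and the name `step_b` are assumptions; the counting that the truncated sentence goes on to describe is deliberately left out because its details are not visible here.

```python
import re

# Assumed definition of a "URL link string": starts with http(s):// or www.
URL_PATTERN = re.compile(r"^(?:https?://|www\.)\S+$", re.IGNORECASE)


def step_b(tokens):
    """Return the distinct non-URL word strings, in first-seen order."""
    seen = set()
    primary = []
    for tok in tokens:
        if URL_PATTERN.match(tok):
            continue  # skip URL link strings
        if tok not in seen:
            seen.add(tok)
            primary.append(tok)
    return primary


if __name__ == "__main__":
    sample = ["data", "https://example.com", "mining", "data", "www.example.org", "web"]
    print(step_b(sample))
```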

Abstract

The invention relates to a method for collecting and extracting key Internet data information. It introduces a newly designed logical association between data: taking the target webpage as the basic screening object, it combines a high-frequency vocabulary search with a preset key-information word search in a progressive logic. This accounts for information in the specified target direction while also adapting to the direction in which big data is updated, making network data search more comprehensive and objective, so that the key information in the basic screening object is obtained through comprehensive screening. Starting from the basic screening object, each webpage it references directly, indirectly, or across multiple levels is analyzed step by step to obtain the key information at every level related to the topics and themes of the basic screening object, and a topology of the key information across the multi-level associated webpages is constructed. Key information in webpages is thus screened accurately, objectively, and comprehensively, improving the efficiency of searching and mining real network data.
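The level-by-level analysis of referenced webpages can be pictured as a breadth-first walk over a link graph. The sketch below is only an illustration under assumptions: the in-memory `links` dictionary, the `extract_key_info` callback, the `max_depth` limit, and the shape of the returned topology are all hypothetical and are not taken from the patent.

```python
from collections import deque


def build_key_info_topology(root, links, extract_key_info, max_depth=2):
    """Walk pages referenced from `root` level by level and record key info.

    links: dict mapping a page id to the page ids it references (assumed input).
    extract_key_info: callable returning the key information of one page.
    """
    topology = {}
    queue = deque([(root, 0, None)])
    while queue:
        page, depth, parent = queue.popleft()
        if page in topology:
            continue  # already analyzed via a shorter reference path
        topology[page] = {
            "level": depth,
            "parent": parent,
            "key_info": extract_key_info(page),
        }
        if depth < max_depth:
            for child in links.get(page, []):
                if child not in topology:
                    queue.append((child, depth + 1, page))
    return topology


if __name__ == "__main__":
    demo_links = {"root": ["a", "b"], "a": ["c"], "b": ["a"], "c": ["root"]}
    topo = build_key_info_topology("root", demo_links, lambda p: [p.upper()])
    for page, node in topo.items():
        print(page, node)
```

Breadth-first order is chosen here so that each page is recorded at the shallowest level at which it is referenced; a real crawler would fetch and parse pages instead of reading a dictionary.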

Description

Technical field

[0001] The invention relates to a method for collecting and extracting key Internet data information, and belongs to the technical field of webpage key information extraction.

Background technique

[0002] With the popularization of the Internet and of digital terminal devices, a world of the Internet of Everything is taking shape, data is growing explosively, and digitalization has become a basic force in building modern society. With the spread of Internet of Things infrastructure, smartphones, and wearable devices, each of us generates large amounts of data all the time, and a large volume of data of many types is updated on the network every day. Quickly analyzing this massive data and extracting the important information is currently the best use of network data. Most prior-art methods use a comparison approach, that is, matching the data in the target w...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/9535, G06F16/958, G06F16/903, G06F40/216, G06F40/284, G06F40/289
CPC: G06F16/90344, G06F16/9535, G06F16/958, G06F40/216, G06F40/284, G06F40/289
Inventor: 刘奕名
Owner: 刘奕名