
Internet key data information acquisition and extraction method

A technique for collecting key data and information, applicable to digital data retrieval, network data retrieval, electronic digital data processing, and related fields. It addresses prior-art problems such as the low value of the retrieved data, acquisition methods that are too weak for big data demands, and the inability to mine a target web page effectively.

Active Publication Date: 2021-02-26
刘奕名

AI Technical Summary

Problems solved by technology

Most prior-art methods rely on comparison: the data in the target webpage is matched against preset keywords, and the content that matches is returned. The information obtained this way is limited. Because it depends only on direct matching, much related information in the webpage is missed, so the target webpage is not effectively mined and the data found is of limited value.
Traditional data acquisition methods are too weak for unstructured, high-speed big data processing, so acquisition methods that meet these new requirements need to be developed.


Examples


Embodiment Construction

[0045] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

[0046] The present invention provides a method for collecting and extracting key Internet data information, used to obtain key text information from a target webpage. In practice, as shown in Figure 1, steps A to H below are performed.

[0047] Step A. Perform word segmentation on the text of the target webpage. Using a preset lexicon of meaningless words, remove the meaningless word strings and connective word strings from the segmented text, take the result as the text to be processed, then go to step B.
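Step A can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the regex tokenizer stands in for a real word segmenter (Chinese text would need a dedicated segmentation library), and `MEANINGLESS_WORDS`, `segment`, `remove_meaningless`, and `preprocess` are hypothetical names chosen for the example.

```python
import re

# Hypothetical stand-in for the preset lexicon of meaningless and
# connective words; a real lexicon would be far larger.
MEANINGLESS_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for"}

# URLs are matched first so each one survives as a single token;
# everything else is split into runs of word characters.
TOKEN_PATTERN = re.compile(r"https?://\S+|\w+(?:'\w+)?")


def segment(text):
    """Split text into word strings (assumed stand-in for word segmentation)."""
    return TOKEN_PATTERN.findall(text)


def remove_meaningless(tokens, lexicon=MEANINGLESS_WORDS):
    """Drop tokens that appear in the lexicon, comparing case-insensitively."""
    return [t for t in tokens if t.lower() not in lexicon]


def preprocess(text):
    """Step A as sketched here: segment, then strip lexicon words."""
    return remove_meaningless(segment(text))


if __name__ == "__main__":
    sample = "The price of gold and the price of silver: see https://example.com/gold"
    print(preprocess(sample))
```

Keeping URLs intact as single tokens at this stage is a design choice of the sketch; it lets a later step separate link strings from ordinary word strings.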

[0048] Step B. From the text to be processed, select the strings that are not URL links and obtain the mutually distinct word strings among them; these form the primary word strings to be processed, and count...
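The visible part of step B might look like the sketch below. The source text is cut off at "count...", so what is counted is not known; counting the occurrences of each distinct string is an assumption made here, motivated only by the abstract's mention of high-frequency vocabulary. `URL_PATTERN`, `is_url`, and `primary_strings_with_counts` are hypothetical names.

```python
import re
from collections import Counter

# Assumption: a token counts as a URL link if it starts with http://,
# https:// or www. (the source does not define the test).
URL_PATTERN = re.compile(r"^(?:https?://|www\.)\S+$", re.IGNORECASE)


def is_url(token):
    """Return True if the token looks like a URL link string."""
    return URL_PATTERN.match(token) is not None


def primary_strings_with_counts(tokens):
    """Keep non-URL tokens and map each distinct one to its occurrence count.

    The keys are the distinct primary word strings in first-seen order;
    the occurrence count is an assumed reading of the truncated step.
    """
    non_url = [t for t in tokens if not is_url(t)]
    return dict(Counter(non_url))


if __name__ == "__main__":
    tokens = ["price", "gold", "price", "silver", "https://example.com/gold"]
    print(primary_strings_with_counts(tokens))
```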



Abstract

The invention relates to a method for collecting and extracting key Internet data information. It introduces a newly designed logical association between data: a target webpage is taken as the basic screening object, and a high-frequency vocabulary search mode is combined progressively with a preset key-information word search mode. On the one hand this takes into account information in a specified target direction; on the other it adapts to the direction in which big data is updated, so that network data searching becomes more comprehensive and objective and the key information in the basic screening object is obtained through comprehensive screening. Starting from the basic screening object, each webpage that it references directly, indirectly, or through multiple levels is then analyzed one by one, level by level, to obtain the key information at each level that relates to the topics and subjects of the basic screening object, and a topological structure of the key information across the multi-level associated webpages is constructed. Key information in the webpage is thereby screened accurately, objectively, and comprehensively, and the efficiency of searching and mining actual network data is improved.
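The level-by-level analysis of referenced webpages described in the abstract can be pictured as a breadth-first walk over a link graph. The sketch below is only an illustration under stated assumptions: pages and links are given as an in-memory dictionary rather than fetched from the network, `extract_key_info` is a hypothetical callback standing in for the per-page screening, and the depth limit `max_depth` is not taken from the source.

```python
from collections import deque


def build_key_info_topology(root, links, extract_key_info, max_depth=2):
    """Walk outward from root and record key information per page.

    links maps a page identifier to the pages it references.
    Returns {page: {"level": int, "key_info": ..., "children": [...]}},
    where "children" lists pages first reached from that page.
    """
    topology = {}
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        page, level = queue.popleft()
        node = {"level": level, "key_info": extract_key_info(page), "children": []}
        topology[page] = node
        if level >= max_depth:
            continue  # do not follow references beyond the depth limit
        for child in links.get(page, []):
            if child not in seen:  # each page is analyzed once
                seen.add(child)
                node["children"].append(child)
                queue.append((child, level + 1))
    return topology


if __name__ == "__main__":
    demo_links = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": ["e"]}
    topo = build_key_info_topology("a", demo_links, lambda p: p.upper())
    for page, node in topo.items():
        print(page, node)
```

Tracking visited pages keeps reference cycles (here `c` links back to `a`) from causing repeated analysis, and the recorded level distinguishes direct from indirect references.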

Description

Technical field

[0001] The invention relates to a method for collecting and extracting key Internet data information, and belongs to the technical field of web page key information extraction.

Background technique

[0002] With the spread of the Internet and of digital terminal devices of all kinds, a world of the Internet of Everything is taking shape, data is growing explosively and exponentially, and digitalization has become a basic force in building modern society. As Internet of Things infrastructure, smartphones, and wearable devices become widespread, each of us generates a large amount of data all the time, and a large amount of data of many types is updated on the network every day. Analyzing this massive data quickly and interpreting the important information in it is currently the best use of network data. Most of the methods in the prior art use the comparison method, that is, for the data in the target webpa...


Application Information

IPC(8): G06F16/9535, G06F16/958, G06F16/903, G06F40/216, G06F40/284, G06F40/289
CPC: G06F16/90344, G06F16/9535, G06F16/958, G06F40/216, G06F40/284, G06F40/289
Inventor 刘奕名
Owner 刘奕名