Webpage information autonomous searching and screening system within specified demand range

A technology for collecting web page information within a specified demand range, applied in the fields of retrieving web data using information identifiers, web data indexing, web data retrieval, etc. It addresses problems such as the inability of general-purpose search tools to serve research work well.

Pending Publication Date: 2021-04-02
荆门汇易佳信息科技有限公司 (Jingmen Huiyijia Information Technology Co., Ltd.)
Cites: 5; cited by: 2

AI Technical Summary

Problems solved by technology

[0008] First, in the face of massive amounts of network resource information, it is increasingly difficult to meet the requirements of network information processing by relying only on traditional manual collection and processing methods. Various search engines in the prior art target information retrieval requirements in specific fields, and general-purpose search engines have major deficiencies, the most prominent of which are as follows: (1) this type of search engine is based on a full-text or keyword retrieval mechanism, which tends to return much noisy information and little useful information, so that the user's search intention is submerged in the actual search results; (2) the design rules of this type of web search engine emphasize recall and aim to cover a broad range of knowledge, so too many results are returned regardless of whether they match the user's professional background, and search quality is low; (3) web page information is retrieved with low efficiency and speed, so the timeliness and validity of the search results cannot be guaranteed;
[0009] Second, an important problem in prior-art web page information collection and analysis systems is the screening of invalid data. Given the massive volume of data, quickly and autonomously capturing information from target web pages, and then analyzing and judging the captured information, is technically very difficult;
[0010] Third, existing general-purpose search tools show low intelligence when performing specialized searches, and there is a lack of highly targeted, specialized information retrieval tools based on specified needs. As the share of related technologies in the field of network information search continues to grow, the general-purpose search tools of the prior art cannot serve research work well; information that needs attention must be collected in a timely, accurate and efficient manner in order to establish a dynamic information service system for the industry;
[0011] Fourth, in the prior art, information on designated web pages is collected and organized independently, and a certain type of web page or forum is monitored manually. Although the collected information is of high quality and real-time monitoring is handled well, the large number of web pages of the same type on the Internet and their frequent changes mean that collecting and organizing such pages takes a great deal of time and manpower.
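Paragraphs [0008] to [0010] contrast recall-oriented keyword search with screening against a specified demand. The toy Python sketch below makes that trade-off concrete; it is not taken from the patent. The document texts, relevance labels, keyword set, the `min_hits` threshold, and the helper names `precision_recall` and `screen` are all hypothetical, and simple keyword-overlap screening stands in for whatever screening method the invention actually uses.

```python
# Toy illustration (hypothetical data and helper names, not the patent's method):
# precision and recall of a result list, before and after screening the results
# against a set of demand keywords.


def precision_recall(returned: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = relevant returned / returned; recall = relevant returned / all relevant."""
    hits = sum(1 for doc_id in returned if doc_id in relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def screen(results: dict[str, str], demand_keywords: set[str], min_hits: int = 2) -> list[str]:
    """Keep only results whose text contains at least `min_hits` distinct demand keywords."""
    kept = []
    for doc_id, text in results.items():
        words = set(text.lower().split())
        if len(words & demand_keywords) >= min_hits:
            kept.append(doc_id)
    return kept


if __name__ == "__main__":
    # Hypothetical result set from a general-purpose keyword search for "crawler".
    results = {
        "d1": "breadth first web crawler for focused web page collection",
        "d2": "crawler excavator prices and dealers",
        "d3": "focused crawler screening web page text features",
        "d4": "crawler toy car review",
        "d5": "web page crawler with url queue and text extraction",
    }
    relevant = {"d1", "d3", "d5"}  # hypothetical ground truth for the specified demand
    demand_keywords = {"web", "page", "crawler", "text", "focused"}

    before = list(results)
    after = screen(results, demand_keywords)
    print("before:", precision_recall(before, relevant))
    print("after: ", precision_recall(after, relevant))
```

Run as a script, it reports precision 0.6 and recall 1.0 for the unscreened list, and 1.0 and 1.0 after screening: the two off-topic "crawler" results are dropped without losing a relevant one. On real data, keyword screening also discards some relevant pages, so a threshold such as `min_hits` trades precision against recall.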

Method used

... the structure of the environmentally friendly knitted fabric provided by the present invention; Figure 2 is a flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; Figure 3 is the parameter map of the yarn covering machine.


Examples


Embodiment Construction

[0092] The technical solution of the system for autonomous collection and screening of web page information within a specified demand range provided by the present invention is further described below with reference to the accompanying drawings, so that those skilled in the art can better understand and implement the present invention.

[0093] The design of the web page information self-collecting and screening system for a specified demand range of the present invention mainly includes six parts: first, intelligent acquisition of web page data within the specified demand range; second, cleaning web page data and extracting text; third, extracting text features; fourth, saving web page data; fifth, screening web page data; sixth, outputting the screened data. The present invention adopts a breadth-first web crawling mode: starting from some key URLs, it expands outward to obtain further pages, and in subsequent work analyzes them and extracts their text content, while extracting text content T...
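The breadth-first crawling mode and the text-cleaning step described in [0093] can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the invention's implementation: pages come from a hypothetical in-memory `site` dict instead of HTTP, `bfs_crawl` and `PageParser` are hypothetical names, `max_pages` is an assumed stopping rule, and text cleaning is reduced to dropping tags and `<script>`/`<style>` content with the standard-library `html.parser`.

```python
# Minimal breadth-first crawl + text extraction sketch (hypothetical; not the patent's code).
# Pages are served from an in-memory dict so the example runs offline; a real crawler
# would fetch each URL over HTTP and respect robots.txt.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects absolute href targets and visible text, skipping <script>/<style> content."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.text_parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def bfs_crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl from key (seed) URLs; returns {url: extracted_text} in visit order."""
    queue = deque(seeds)
    seen = set(seeds)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()  # FIFO queue => breadth-first order
        html = fetch(url)
        if html is None:  # unreachable page: skip it
            continue
        parser = PageParser(url)
        parser.feed(html)
        parser.close()  # flush any buffered trailing text
        pages[url] = " ".join(parser.text_parts)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


if __name__ == "__main__":
    site = {  # hypothetical pages
        "http://example.com/": '<a href="/a">A</a><a href="/b">B</a><p>Home</p>',
        "http://example.com/a": '<script>var x=1;</script><p>Page A</p><a href="/c">C</a>',
        "http://example.com/b": "<p>Page B</p>",
        "http://example.com/c": "<p>Page C</p>",
    }
    result = bfs_crawl(["http://example.com/"], site.get)
    for url, text in result.items():
        print(url, "->", text)
```

Because the frontier is a FIFO queue, the two pages linked directly from the seed (`/a` and `/b`) are visited before `/c`, which is reachable only through `/a`, and the script content on `/a` never reaches the extracted text. A production crawler would add politeness delays, `robots.txt` checks, and host or depth limits tied to the specified demand range.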



Abstract

The invention provides a web page information autonomous searching and screening system within a specified demand range. A solution is designed for autonomously searching and screening information within a specified demand range on the Internet. First, for the working requirements of autonomously searching and screening web page data, a system solution is planned that conforms to the characteristics of monitoring services for a specified demand range on the Internet. Second, each key technology for autonomously searching and screening network information within the specified demand range is researched, developed and implemented, and some key technologies are improved and optimized so that the system better meets the actual needs of specified-demand-range monitoring services. Third, the system is tested and its performance indices are evaluated; analysis of the test results verifies its practical reliability and shows that the expected effect is achieved, demonstrating that the web page information autonomous searching and screening system has high practical value and provides a good reference for network monitoring work within a specified demand range.

Description

[0001] Technical field [0002] The invention relates to a web page information self-collecting and screening system, in particular to a web page information self-collecting and screening system for a specified demand range, and belongs to the technical field of web page collection and screening. Background art [0003] Today, with the rapid development of the Internet, the World Wide Web has become a huge, global and widely distributed center for information transmission and services. Many official and non-governmental organizations, groups and even individuals have set up all kinds of web pages on the Internet. The Internet is all-encompassing, covering politics, the economy, entertainment, daily life, culture and other areas, and the amount of accumulated information has grown exponentially. Collecting information from the Internet is not only an important way for people to acquire knowledge, but also the main method and means for portal news, industry in...

Claims


Application Information

Patent type & authority: Application (China)
IPC (8): G06F16/951; G06F16/955
CPC: G06F16/951; G06F16/955
Inventor: 刘秀萍 (Liu Xiuping)
Owner: 荆门汇易佳信息科技有限公司 (Jingmen Huiyijia Information Technology Co., Ltd.)