Literature data processing method and system based on web crawler technology, and medium

CN117171650B · Active · Publication Date: 2026-06-26 · INFORMATION RES INST OF SHANDONG ACAD OF SCI

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: INFORMATION RES INST OF SHANDONG ACAD OF SCI
Filing Date: 2023-09-21
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Statistical processing of scientific research data in universities is time-consuming, labor-intensive, and prone to errors. Existing techniques classify scientific research data poorly and analyze it inefficiently.

Method used

Web crawler technology is used to capture web page data from target scientific research websites, and the data is then imported into a decision-tree-based classification model for classification. Semantic analysis and form requirement information extraction are applied to the retrieval requirement to obtain retrieval form information, which is then used to retrieve and integrate the classified data and generate retrieval requirement data.
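The capture-then-classify step can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the page is a hard-coded HTML string rather than a live crawl, the helper names (`TextExtractor`, `page_text`, `classify`) are hypothetical, and the decision tree is a hand-written keyword tree standing in for a model trained on labeled pages.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping script/style content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def page_text(html):
    """Return the lower-cased visible text of an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts).lower()


# Hypothetical hand-written tree: (keyword, branch_if_present, branch_if_absent).
# A trained model would learn which tests to apply and in what order.
TREE = ("issn",
        "periodical",
        ("doi",
         "literature",
         ("grant", "scientific_research", "unknown")))


def classify(text, node=TREE):
    """Walk the tree, branching on whether each keyword occurs in the text."""
    while isinstance(node, tuple):
        keyword, if_present, if_absent = node
        node = if_present if keyword in text else if_absent
    return node


html = "<html><body><h1>Journal of X</h1><p>ISSN 1234-5678</p></body></html>"
label = classify(page_text(html))  # label for the sample page
```

Plain substring tests are crude (for example, "doi" also matches "doing"); a practical classifier would use tokenized features and a learned tree.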

Benefits of technology

It enables efficient and accurate classification and analysis of scientific research data, reduces the time and effort of manual work, improves the accuracy and applicability of scientific research data mining, and supports rapid retrieval of multidisciplinary journals.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN117171650B_ABST
Patent Text Reader

Abstract

The application discloses a literature data processing method, system, and medium based on web crawler technology. Web page data of target scientific research websites is captured using web crawler technology. The web page data is imported into a decision-tree-based classification model and classified to obtain classified data. Scientific research content retrieval requirement information is acquired, and semantic analysis and form requirement information extraction are performed on it to obtain retrieval form information. Based on the retrieval form information, data retrieval and data integration are performed on the classified data to generate retrieval requirement data. The target scientific research websites include literature, scientific research, and periodical websites. With the application, text can be analyzed efficiently and accurately, high-precision data mining and efficient classification of paper data, scientific research data, and journal data are achieved, and the time and effort of manual scientific research work are greatly reduced.
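The retrieval half of the abstract (requirement information, then retrieval form information, then retrieval and integration over classified data) can be sketched as follows. This is an illustrative assumption, not the patented method: simple keyword and year matching stands in for semantic analysis, and the record layout, `CATEGORY_WORDS`, `STOPWORDS`, `build_retrieval_form`, and `retrieve` are all hypothetical names.

```python
import re
from collections import defaultdict

# Hypothetical classified records, as an earlier classification step might produce.
RECORDS = [
    {"category": "literature", "title": "Graph neural networks for chemistry", "year": 2022},
    {"category": "periodical", "title": "Journal of Chemistry Informatics", "year": 2021},
    {"category": "literature", "title": "Decision trees in text mining", "year": 2023},
    {"category": "literature", "title": "Graph neural networks for chemistry", "year": 2022},
]

# Assumed vocabulary mapping request words to categories.
CATEGORY_WORDS = {
    "paper": "literature", "papers": "literature", "literature": "literature",
    "journal": "periodical", "journals": "periodical", "periodical": "periodical",
    "project": "scientific_research", "projects": "scientific_research",
}
STOPWORDS = {"find", "show", "all", "the", "on", "about", "from", "since", "in", "of", "and"}


def build_retrieval_form(requirement):
    """Turn a free-text requirement into a form: category, minimum year, keywords."""
    tokens = re.findall(r"[a-z]+|\d{4}", requirement.lower())
    form = {"category": None, "min_year": None, "keywords": []}
    for tok in tokens:
        if tok.isdigit():
            form["min_year"] = int(tok)
        elif tok in CATEGORY_WORDS:
            form["category"] = CATEGORY_WORDS[tok]
        elif tok not in STOPWORDS:
            form["keywords"].append(tok)
    return form


def retrieve(form, records):
    """Filter records by the form, drop exact duplicates, and group by category."""
    grouped = defaultdict(list)
    seen = set()
    for rec in records:
        key = (rec["category"], rec["title"], rec["year"])
        if key in seen:
            continue
        if form["category"] and rec["category"] != form["category"]:
            continue
        if form["min_year"] and rec["year"] < form["min_year"]:
            continue
        title = rec["title"].lower()
        if not all(kw in title for kw in form["keywords"]):
            continue
        seen.add(key)
        grouped[rec["category"]].append(rec)
    return dict(grouped)


form = build_retrieval_form("Find papers on graph networks since 2022")
result = retrieve(form, RECORDS)
```

Here the form acts as the structured intermediate between the free-text request and the classified data, so the matching rules can be replaced without touching the retrieval step.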