Web crawler-based academic institution geographic position information extraction method

A technology relating to geographic location information and web crawlers, applied in the field of web crawler-based extraction of academic institutions' geographic location information. It addresses problems that distort statistical results and achieves a balance between accuracy and recall.

Pending Publication Date: 2020-12-15
SHANGHAI JIAO TONG UNIV
Cites: 7; Cited by: 1

AI Technical Summary

Problems solved by technology

Among these entities, information about academic institutions is scarce. With growing internationalization in particular, it is difficult to determine an institution's country, or even its city, from the author's name, which seriously affects many important statistical results.

Embodiment Construction

[0033] The following describes the preferred embodiments of the present invention with reference to the accompanying drawings to make the technical content clearer and easier to understand. The present invention can be embodied in many different forms of embodiments, and the protection scope of the present invention is not limited to the embodiments mentioned herein.

[0034] As shown in Figure 1, the framework for web crawler-based extraction of academic institutions' geographic location information comprises the following steps:

[0035] Step 1: search for the academic institution's name with a search engine;

[0036] Step 2: obtain the academic institution's official website and Wikipedia page;

[0037] Step 3: analyze the domain name of the official website;

[0038] Step 4: parse the Wikipedia page;

[0039] Step 5: query the place-name dictionary.
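Step 3 does not spell out how the domain name is analyzed. A minimal Python sketch, assuming the country is inferred from the top-level domain of the official website's URL: the `CCTLD_TO_COUNTRY` table, the treatment of a bare `.edu` as a US institution, and the function name `country_from_domain` are illustrative assumptions, not rules taken from the patent.

```python
from typing import Optional
from urllib.parse import urlparse

# Illustrative subset of country-code top-level domains (ccTLDs);
# a real system would need the complete IANA list.
CCTLD_TO_COUNTRY = {
    "cn": "China",
    "uk": "United Kingdom",
    "jp": "Japan",
    "de": "Germany",
    "fr": "France",
    "au": "Australia",
}

# Assumption: a bare ".edu" domain is taken to mean a US institution,
# since new ".edu" registrations are limited to US post-secondary schools.
GENERIC_TLD_TO_COUNTRY = {"edu": "United States"}


def country_from_domain(url: str) -> Optional[str]:
    """Guess an institution's country from its official website's domain name.

    Returns None when the domain carries no country signal (e.g. ".org").
    """
    if "://" not in url:
        # Without a scheme, urlparse treats the host as a path, not a netloc.
        url = "http://" + url
    host = (urlparse(url).hostname or "").rstrip(".")  # .hostname is lowercased
    if not host:
        return None
    tld = host.rsplit(".", 1)[-1]
    # A ccTLD such as ".cn" in "sjtu.edu.cn" takes priority over generic labels.
    if tld in CCTLD_TO_COUNTRY:
        return CCTLD_TO_COUNTRY[tld]
    return GENERIC_TLD_TO_COUNTRY.get(tld)
```

For example, `country_from_domain("https://www.sjtu.edu.cn/")` yields `"China"`, while a `.org` or `.com` site yields `None`, so under this sketch the later Wikipedia and place-name dictionary steps would have to decide such cases.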

[0040] In step 1, crawlers are used to obtain the Google search API and Wikipedia search a...
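The excerpt is cut off before it explains how the official website and Wikipedia page are chosen from the search results. A hedged Python sketch of one possible heuristic for steps 1 and 2, assuming the search API has already returned a ranked list of result URLs (the API call itself is not shown): the first Wikipedia article becomes the Wikipedia page, and an academic-looking domain (`edu` or `ac` label) is preferred for the official site. `pick_pages`, `looks_academic` and `ACADEMIC_LABELS` are hypothetical names.

```python
from typing import Iterable, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical heuristic: domain labels that suggest an academic institution.
ACADEMIC_LABELS = {"edu", "ac"}


def is_wikipedia_article(url: str) -> bool:
    """True for URLs such as https://en.wikipedia.org/wiki/Some_Title."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return host.endswith(".wikipedia.org") and parsed.path.startswith("/wiki/")


def looks_academic(url: str) -> bool:
    """True if any domain label is 'edu' or 'ac' (e.g. mit.edu, ox.ac.uk)."""
    host = (urlparse(url).hostname or "").lower()
    return any(label in ACADEMIC_LABELS for label in host.split("."))


def pick_pages(result_urls: Iterable[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (official_site, wikipedia_page) from ranked search-result URLs.

    The first Wikipedia article is kept as the Wikipedia page. The first
    non-Wikipedia result with an academic-looking domain is preferred as the
    official site; otherwise the top-ranked non-Wikipedia result is used.
    """
    official = fallback = wiki = None
    for url in result_urls:
        if is_wikipedia_article(url):
            wiki = wiki or url
        elif looks_academic(url):
            official = official or url
        else:
            fallback = fallback or url
    return official or fallback, wiki
```

The fallback to the top-ranked result keeps institutions with `.org` or `.com` sites in play, at the cost of occasionally picking a ranking or news site instead of the real homepage.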

Abstract

The invention discloses a web crawler-based method for extracting the geographic location information of academic institutions, relating to the field of data mining. The method comprises the following steps: searching for an academic institution's name with a search engine; obtaining the institution's official website and Wikipedia page; analyzing the official website's domain name; parsing the Wikipedia page; and querying a place-name dictionary. By adopting template rules, the method can rapidly extract entity relationships between organizations and their geographic locations from massive data while keeping accuracy and recall relatively balanced, thereby providing accurate and effective statistics on academic institutions.
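The abstract credits the extraction to template rules but does not list them. A minimal Python sketch of steps 4 and 5 under that reading: a few hypothetical regular-expression templates pull candidate place names from Wikipedia-style text (an infobox `city` field and phrases such as "located in X"), and a candidate is accepted only if it appears in a place-name dictionary. The patterns, the `PLACE_DICT` entries and `extract_location` are illustrative assumptions, not the patent's actual rules.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical template rules, tried in order; each captures a capitalized
# candidate place name (one or more words) from Wikipedia-style text.
TEMPLATES = [
    re.compile(r"^\|\s*city\s*=\s*\[*([A-Z][a-z]+(?: [A-Z][a-z]+)*)", re.M),
    re.compile(r"\blocated in ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
    re.compile(r"\b(?:university|institute|college) (?:based )?in ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
]

# Hypothetical miniature place-name dictionary: name -> (city, country).
PLACE_DICT: Dict[str, Tuple[Optional[str], str]] = {
    "Shanghai": ("Shanghai", "China"),
    "Beijing": ("Beijing", "China"),
    "Oxford": ("Oxford", "United Kingdom"),
    "Tokyo": ("Tokyo", "Japan"),
    "China": (None, "China"),
    "Japan": (None, "Japan"),
}


def extract_location(text: str) -> Optional[Tuple[Optional[str], str]]:
    """Apply the templates in order; return the first candidate found in PLACE_DICT."""
    for pattern in TEMPLATES:
        for match in pattern.finditer(text):
            words = match.group(1).split()
            # Try the full capture, then shorter prefixes ("Shanghai Pudong" -> "Shanghai").
            for end in range(len(words), 0, -1):
                name = " ".join(words[:end])
                if name in PLACE_DICT:
                    return PLACE_DICT[name]
    return None
```

Requiring a dictionary hit discards template captures that are not places, which is one way such rules could trade some recall for accuracy; this excerpt does not describe how the patent itself balances the two.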

Description

Technical field

[0001] The invention relates to the field of data mining, and in particular to a web crawler-based method for extracting the geographic location information of academic institutions.

Background art

[0002] In recent years, more and more applications and products based on academic networks have emerged. Academic institutions are among the most important entities in an academic network. They appear alongside the authors of papers, and from them one can analyze the strength of academic institutions, their cooperative relationships, and the comparative academic strength of countries and regions. Relationships between authors and articles can be established from massive paper data, but such data does not include information about the institution itself, such as its hierarchical structure, geographic location, or founding date, which complicates much statistical work.

[0003] The current method of building an academic network is mainly by synthesizing the data of...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/951; G06F16/9537; G06F40/289; G06F40/295
CPC: G06F16/951; G06F16/9537; G06F40/289; G06F40/295
Inventors: 沈雪乔 (Shen Xueqiao), 陈贵海 (Chen Guihai)
Owner: SHANGHAI JIAO TONG UNIV