Academic resource acquisition method based on LDA (latent Dirichlet allocation)

An LDA-based academic resource acquisition method that addresses the difficulty traditional search technology has in covering the varied needs of a mass user base. The stated effects are good topic matching, compensation for time loss, and improved accuracy and quality.

Inactive Publication Date: 2017-05-31
NINGBO UNIV
3 Cites, 23 Cited by

AI Technical Summary

Problems solved by technology

With the rapid development of the Internet, the number of web pages has grown rapidly, but because computing, network tool, and storage resources are limited, traditional search technologies struggle to cover the varied needs of a mass user base.

Embodiment Construction

[0060] Specific embodiments of the present invention will be described in detail below.

[0061] A method for obtaining academic resources based on LDA. The academic resources are electronic documents published on the Internet, including but not limited to papers, periodicals, news, and patent documents. The method uses a topic crawler that can be run by a computer and an LDA topic model that can be run by a computer; the LDA topic model is shown in Figure 3. A corpus is configured for the LDA topic model and is used to train it. The topic document that guides the topic crawler's crawling is obtained through calculation by the LDA topic model; a topic document is a collection of topic-related words, as shown in Figure 4. In addition to the components of a common web crawler, the topic crawler includes a topic determination module, a similarity calculation module, and a URL priority sorting module, as shown in Figure 2; in ...
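As a rough illustration of what "a collection of topic-related words" could look like in practice, the sketch below picks the most probable words of one topic. The excerpt does not say how the words are selected, so the top-N rule, the function name `build_topic_document`, and the example probabilities are all assumptions made for illustration; the probabilities stand in for the output of an already trained LDA model.

```python
from typing import Dict, List


def build_topic_document(topic_word_probs: Dict[str, float], top_n: int = 5) -> List[str]:
    """Return the top_n most probable words of one topic, highest first.

    Assumption: the topic document is taken to be the top-N words of a
    topic-word distribution; the source text does not specify this rule.
    """
    # Sort by descending probability, breaking ties alphabetically so the
    # result is deterministic.
    ranked = sorted(topic_word_probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


# Hypothetical topic-word probabilities, standing in for one topic of a
# trained LDA model (not data from the patent).
example_topic = {
    "crawler": 0.30,
    "topic": 0.25,
    "url": 0.20,
    "similarity": 0.15,
    "queue": 0.09,
    "fabric": 0.01,
}

topic_document = build_topic_document(example_topic, top_n=4)
```

Here `topic_document` holds the four highest-probability words, which a crawler could then use as its description of the target topic.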

Abstract

The invention provides an academic resource acquisition method based on LDA (latent Dirichlet allocation). The method uses a topic crawler together with an LDA topic model. First, a training corpus is provided to the LDA topic model, which is trained to obtain a topic document. In addition to the components of a general web crawler, the topic crawler comprises a topic determination module, a similarity calculation module, and a URL (Uniform Resource Locator) priority ranking module. During crawling, the topic document guides the calculation of topic similarity, and URLs whose topic similarity is greater than a set threshold are selected. The topic crawler maintains a URL queue of web pages that have not yet been accessed. It accesses the web page of each URL in the ranked order of the queue, crawls the corresponding academic resources, and stores them in a database after classification labeling, continuing until the queue of unaccessed web pages is empty. An open API (Application Program Interface) to the academic resource database is provided for display and calling. The method incorporates machine learning into academic resource acquisition and improves acquisition quality and efficiency.
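The crawl loop described in the abstract can be sketched as a priority queue of unvisited URLs that is drained until empty. This is a minimal sketch under several assumptions, not the patented implementation: cosine similarity over anchor text stands in for the unspecified similarity calculation, a fixed `topic_label` stands in for classification labeling, a Python list stands in for the database, and `FAKE_WEB`, `fetch`, and all other names are hypothetical.

```python
import heapq
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

Page = Tuple[str, List[Tuple[str, str]]]  # (page text, [(link URL, anchor text)])


def cosine_similarity(text: str, topic_words: Dict[str, float]) -> float:
    """Cosine similarity between a text's word counts and weighted topic words."""
    counts = Counter(text.lower().split())
    dot = sum(counts[word] * weight for word, weight in topic_words.items())
    norm_text = math.sqrt(sum(c * c for c in counts.values()))
    norm_topic = math.sqrt(sum(w * w for w in topic_words.values()))
    if norm_text == 0 or norm_topic == 0:
        return 0.0
    return dot / (norm_text * norm_topic)


def crawl(seeds: List[str], fetch: Callable[[str], Page],
          topic_words: Dict[str, float], threshold: float,
          topic_label: str) -> List[dict]:
    """Visit URLs in descending similarity order until the queue is empty."""
    # heapq is a min-heap, so scores are negated to pop the best URL first.
    queue = [(-1.0, url) for url in seeds]
    heapq.heapify(queue)
    seen = set(seeds)
    database: List[dict] = []  # stand-in for the academic resource database
    while queue:
        _, url = heapq.heappop(queue)
        text, links = fetch(url)
        database.append({"url": url, "label": topic_label, "text": text})
        for link_url, anchor in links:
            if link_url in seen:
                continue
            # Assumption: a link is scored by its anchor text.
            score = cosine_similarity(anchor, topic_words)
            if score > threshold:
                seen.add(link_url)
                heapq.heappush(queue, (-score, link_url))
    return database


# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB: Dict[str, Page] = {
    "seed": ("lda topic model survey",
             [("a", "lda topic model paper"), ("b", "knitting yarn shop"),
              ("c", "topic crawler")]),
    "a": ("paper on lda", [("c", "topic crawler"), ("seed", "home")]),
    "b": ("yarn", []),
    "c": ("crawler notes", []),
}
topic_words = {"lda": 0.4, "topic": 0.3, "model": 0.2, "crawler": 0.1}

results = crawl(["seed"], lambda url: FAKE_WEB[url], topic_words,
                threshold=0.3, topic_label="topic-modeling")
visited = [record["url"] for record in results]
```

With this toy data the off-topic link `b` scores 0 and is never queued, while `a` is visited before `c` because its anchor text is more similar to the topic words.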

Description

Technical field

[0001] The invention relates to machine learning, information retrieval, and web page data mining, and in particular to an LDA-based academic resource acquisition method.

Background

[0002] As academic resources have become electronic, discovering and mining the resources in a researcher's field of interest from massive collections has gradually become a research hotspot. To accommodate the massive, multi-source, and heterogeneous nature of digital academic resources, new methods and models based on machine learning and data mining are increasingly applied to academic resource classification. These methods differ from traditional keyword-frequency-based topic discovery methods such as co-word analysis and citation analysis; examples include the latent Dirichlet allocation (LDA) model and social network analysis (SNA). Practice has found that this method has achi...

Application Information

IPC(8): G06F17/30
Inventor: 刘柏嵩, 费晨杰, 王洋洋, 尹丽玲, 高元
Owner: NINGBO UNIV