A model training method and text representation method for academic heterogeneous network embedding

A heterogeneous network and model training technology applied in the field of text representation. It addresses problems such as existing methods not considering the semantic features of texts, losing the relationship information among academic papers, and the resulting loss of accuracy in academic text representation, and it improves the representation effect.

Active Publication Date: 2022-03-01
HANGZHOU DIANZI UNIV
Cites: 10; Cited by: 0

AI Technical Summary

Problems solved by technology

However, this method does not consider the semantic features of the text content. It represents only the structure of the academic network, that is, the relationship information between academic papers, which limits the accuracy of academic text representation.
The second category represents academic papers based on their text content, for example by using a model to vectorize the text of each paper directly. This method ignores the large amount of useful relationship information between academic papers in the academic heterogeneous network, even though some papers that use different vocabulary are in fact highly related to each other.
In such cases, text-content-based representation loses the relationship information among academic papers.



Examples


Embodiment

[0042] Embodiment: This embodiment provides a model training method for academic heterogeneous network embedding. Its flow chart is shown in Figure 1, and a schematic diagram for further illustration is shown in Figure 2. First, in step S1, a large number of academic papers are obtained; then, in step S2, these academic papers are used to construct an academic heterogeneous network.

[0043] In this embodiment, the academic network may be constructed as follows. First, only authors with fewer than 100 document links and at least one link are retained; discarding authors associated with tens of thousands of documents reduces author-name ambiguity. Next, the text content of each paper is formed by concatenating its title and abstract, and only papers whose text is longer than 50 characters are kept. Finally, these papers, their authors, and their fields are linked according to their mutual relations.
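The construction rules in [0043] can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `Paper` record layout, the `P:`/`A:`/`F:` node-name prefixes, the single-space separator between title and abstract, and the helper name `build_network` are all hypothetical, since the patent does not specify a data format. The thresholds (fewer than 100 document links per author, text longer than 50 characters) come from the text, and author link counts are taken over all input papers before the text filter, following the order described above.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

MAX_AUTHOR_LINKS = 100  # keep authors linked to fewer than 100 documents
MIN_TEXT_LENGTH = 50    # keep papers whose text is longer than 50 characters


@dataclass
class Paper:
    # Hypothetical input record; the patent does not define a data format.
    pid: str
    title: str
    abstract: str
    authors: list
    fields: list


def build_network(papers):
    """Return (texts, adj): paper text content and an undirected adjacency dict."""
    # Count the documents linked to each author (duplicates within a paper ignored).
    author_links = Counter(a for p in papers for a in set(p.authors))
    # Keep authors with at least one and fewer than MAX_AUTHOR_LINKS links;
    # dropping very prolific names reduces author-name ambiguity.
    kept_authors = {a for a, n in author_links.items() if 1 <= n < MAX_AUTHOR_LINKS}

    texts = {}
    adj = defaultdict(set)

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for p in papers:
        # Text content = title + abstract (single-space separator is an assumption).
        text = f"{p.title} {p.abstract}"
        if len(text) <= MIN_TEXT_LENGTH:
            continue  # drop papers whose concatenated text is too short
        node = f"P:{p.pid}"
        texts[node] = text
        for a in p.authors:
            if a in kept_authors:
                link(node, f"A:{a}")
        for fld in p.fields:
            link(node, f"F:{fld}")
    return texts, dict(adj)
```

Only paper-author and paper-field edges are built here; other entity types mentioned in the background (conferences, journals, citations) could be linked the same way.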

[0044] A schematic diagram of an academic...


Abstract

The invention relates to a model training method for academic heterogeneous network embedding. First, an academic heterogeneous network is generated from papers; the network includes paper nodes, multiple kinds of paper feature nodes, edges, and text content. Multiple paper nodes are selected as query nodes, and the network is walked along meta-paths composed of different paper features to generate, for each query node, a set of closely related nodes under each paper-feature query condition. For each query node, its closely related node sets and the academic heterogeneous network are sampled to obtain multiple triplets representing the relationships between the query node and other nodes. A language representation model is then trained on these triplets so that it embeds the structural relationship information of the academic heterogeneous network into the text representation vectors. The model trained by the invention can thus embed both the text semantics and the structural semantics of the academic heterogeneous network into text representation vectors in the academic field, improving the representation effect.
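The meta-path walk and triplet sampling steps in the abstract might look roughly like the Python sketch below. It assumes a hypothetical network layout: an adjacency dictionary whose node names carry a type prefix (`P:` paper, `A:` author, `F:` field). The abstract does not say how walks are performed, how negatives are chosen, or which loss trains the language model, so this sketch makes its own choices and labels them: meta-paths are followed by exhaustive neighbour expansion rather than random walks, positives come from the close-relationship set, negatives are drawn uniformly from the remaining paper nodes, and a triplet margin loss over Euclidean distance stands in for the training objective. In real training, the three vectors passed to the loss would be the language model's encodings of the papers' texts; here they are plain lists.

```python
import math
import random

# Hypothetical layout: adj maps a node name such as "P:1" (paper), "A:x" (author)
# or "F:ml" (field) to the set of its neighbours.


def metapath_neighbors(adj, start, metapath):
    """Return paper nodes reachable from `start` along a meta-path such as ("P", "A", "P").

    Assumption: all meta-path instances are enumerated exhaustively instead of
    being sampled with random walks.
    """
    frontier = {start}
    for node_type in metapath[1:]:
        frontier = {
            v for u in frontier for v in adj.get(u, ())
            if v.startswith(node_type + ":")
        }
    frontier.discard(start)  # a paper is not its own close-relationship node
    return frontier


def sample_triplets(adj, queries, metapaths, per_query=2, seed=0):
    """Build (query, positive, negative) triplets for every query and meta-path.

    Assumption: positives are drawn from the close-relationship set and
    negatives uniformly from the other paper nodes.
    """
    rng = random.Random(seed)
    papers = sorted(n for n in adj if n.startswith("P:"))
    triplets = []
    for q in queries:
        for mp in metapaths:
            close = sorted(metapath_neighbors(adj, q, mp))
            far = [p for p in papers if p != q and p not in close]
            if not close or not far:
                continue  # nothing to contrast under this query condition
            for _ in range(per_query):
                triplets.append((q, rng.choice(close), rng.choice(far)))
    return triplets


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss pushing the positive closer to the anchor than the negative.

    Assumption: Euclidean distance and margin=1.0; the patent does not name a loss.
    """
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)
```

Each meta-path plays the role of one "paper feature query condition": `("P", "A", "P")` groups papers sharing an author, `("P", "F", "P")` groups papers sharing a field. Minimising the loss over many such triplets is one way a text encoder could absorb the network structure into its vectors.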

Description

Technical field

[0001] The invention belongs to the technical field of text representation, and in particular relates to a model training method for academic heterogeneous network embedding and a text representation method.

Background art

[0002] Text representation in the academic field is an important basis for accurate and efficient services such as scientific literature search, academic expert search, and academic community discovery.

[0003] Academic papers form a variety of rich associations, such as cooperation and citation relationships, through intermediate entities such as authors, topics, fields, conferences, and journals. These academic texts, intermediate entities, and their relationships constitute an academic heterogeneous network of papers, such as the internationally renowned DBLP academic network.

[0004] Existing research on academic text representation can be divided into two categories: the first is to use random wal...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/31; G06F16/383; G06F40/30
CPC: G06F16/313; G06F16/383; G06F40/30
Inventors: 徐小良 (Xu Xiaoliang), 刘俊 (Liu Jun)
Owner: HANGZHOU DIANZI UNIV