Entity semantic annotation method based on random walk

A random-walk-based semantic annotation technology, applied in the creation of semantic tools, special data processing applications, instruments, etc. It addresses the difficulty existing labeling methods have in annotating sparse data comprehensively, which lowers the recall rate and precision rate of returned results, and thereby improves both recall and precision.

Active Publication Date: 2019-08-02
UNIV OF SHANGHAI FOR SCI & TECH

AI Technical Summary

Problems solved by technology

[0004] The present invention targets sparse real-world data, which existing methods find difficult to label comprehensively; this affects the recall rate and precision rate...


Embodiment Construction

[0028] As shown in Figure 1, the entity semantic annotation method based on random walk proceeds through the following steps:

[0029] Step 1. In the offline module, a random walk algorithm is used to obtain the steady-state probability matrix of the description texts of the entities in the corpus. This matrix gives the entity-term correlation scores. The specific implementation is as follows:

[0030] 1.1) Preprocess the description texts of the entities in the corpus (punctuation removal, case conversion, word segmentation, etc.). Number the entities and the preprocessed terms separately, so that the processed text data is uniformly represented by entity IDs and term IDs.
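The preprocessing and numbering in step 1.1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `preprocess` and `build_ids` are hypothetical, and whitespace splitting stands in for word segmentation (text in a language without spaces would need a real segmenter).

```python
import re

def preprocess(text):
    """Lowercase the text, replace punctuation with spaces, split on whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return text.split()

def build_ids(descriptions):
    """Assign integer IDs to entities and terms (hypothetical helper).

    descriptions: {entity_name: description_text}
    Returns (entity_ids, term_ids, encoded), where encoded maps each
    entity ID to the list of term IDs in its description.
    """
    entity_ids = {}
    term_ids = {}
    encoded = {}
    for name, text in descriptions.items():
        # setdefault assigns the next free ID the first time a key is seen
        eid = entity_ids.setdefault(name, len(entity_ids))
        encoded[eid] = [term_ids.setdefault(t, len(term_ids))
                        for t in preprocess(text)]
    return entity_ids, term_ids, encoded
```

IDs are handed out in order of first appearance, so the same corpus always yields the same numbering.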

[0031] 1.2) TF-IDF is a weighting technique commonly used in information retrieval and data mining. Calculate the TF value and IDF value of the preprocessed data, and count the number of times a single term appears in the description text corresponding t...
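As a rough illustration of step 1.2, the sketch below computes one common textbook variant of TF-IDF (term count divided by document length, times the natural log of the number of documents over the document frequency). The source text is truncated before it states its exact formula, so this variant, the function name `tf_idf`, and the input layout are assumptions.

```python
import math
from collections import Counter

def tf_idf(encoded):
    """Compute TF-IDF weights (assumed textbook variant, not the patent's formula).

    encoded: {entity_id: [term_id, ...]}
    Returns {(entity_id, term_id): weight}.
    """
    n_docs = len(encoded)
    # Document frequency: number of descriptions containing each term
    doc_freq = Counter()
    for terms in encoded.values():
        doc_freq.update(set(terms))
    weights = {}
    for eid, terms in encoded.items():
        counts = Counter(terms)  # occurrences of each term in this description
        for tid, count in counts.items():
            tf = count / len(terms)
            idf = math.log(n_docs / doc_freq[tid])
            weights[(eid, tid)] = tf * idf
    return weights
```

With this variant, a term that occurs in every description receives weight 0, so only terms that distinguish entities contribute.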


Abstract

The invention relates to an entity semantic annotation method based on random walk. The method comprises the following steps. First, in an offline module, a steady-state probability matrix of the description texts of the entities in a corpus, namely an entity-term correlation score matrix, is obtained with a random walk algorithm. Second, in the online query stage, given the number of annotation terms to select and the user's query entity node, the row vector corresponding to the query node is looked up in the steady-state probability matrix obtained offline, and its elements are sorted by value to form the tag recommendation list for the query. Finally, the top k terms with the largest entity-term correlation scores are selected according to user requirements and recommended as feature words of the entity, i.e., the semantic tags of the query entity. By introducing a random walk model, the method addresses data sparsity, so that entities in a sparse data set can be semantically annotated; compared with existing keyword annotation methods, the recall ratio and the precision ratio are effectively increased.
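The offline and online stages described in the abstract can be sketched as below. The abstract does not specify the walk model, so this example assumes a random walk with restart on a small entity-term graph, solved by power iteration; the restart probability `restart=0.15`, the iteration count, the toy weights, and all function names are hypothetical choices made for illustration.

```python
def row_normalize(weights):
    """Turn a non-negative weight matrix into a row-stochastic transition matrix."""
    matrix = []
    for row in weights:
        total = sum(row)
        matrix.append([w / total if total else 0.0 for w in row])
    return matrix

def steady_state_matrix(transition, restart=0.15, iters=100):
    """Offline stage (assumed model): row i is the steady-state distribution
    of a random walk with restart that starts, and restarts, at node i."""
    n = len(transition)
    result = []
    for start in range(n):
        probs = [1.0 if j == start else 0.0 for j in range(n)]
        for _ in range(iters):
            # One walk step: push probability mass along the transition matrix
            stepped = [sum(probs[i] * transition[i][j] for i in range(n))
                       for j in range(n)]
            # Jump back to the start node with probability `restart`
            probs = [(1 - restart) * s + (restart if j == start else 0.0)
                     for j, s in enumerate(stepped)]
        result.append(probs)
    return result

def recommend_tags(scores, entity, term_nodes, k):
    """Online stage: sort the query entity's row by score, keep the top-k terms."""
    row = scores[entity]
    ranked = sorted(term_nodes, key=lambda t: row[t], reverse=True)
    return ranked[:k]

# Toy graph: nodes 0-1 are entities, nodes 2-4 are terms (weights are made up).
# Entity 0 mentions terms 2 and 3; entity 1 mentions terms 3 and 4.
W = [
    [0, 0, 2, 1, 0],
    [0, 0, 0, 1, 2],
    [2, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 2, 0, 0, 0],
]
R = steady_state_matrix(row_normalize(W))
top_terms = recommend_tags(R, entity=0, term_nodes=[2, 3, 4], k=2)
```

In this toy graph, term 4 never appears in entity 0's description, yet its entry in row 0 of `R` is positive because the walk reaches it through the shared term 3. That is the mechanism by which a random walk can propose tags absent from the original text on sparse data.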

Description

Technical field

[0001] The invention relates to data mining technology, in particular to an entity semantic annotation method based on random walk.

Background art

[0002] Tags can effectively organize information on the Internet. Given an entity (document, image, video, etc.), the task of semantic annotation is to recommend several related tags. Many annotation solutions already exist. Among them, collaborative filtering is the most widely used recommendation technology, but it suffers from data sparsity and cold start, which directly and greatly reduce recommendation quality. There are also keyword tagging methods, which aim to find a few representative words in the original text, but they cannot provide words that may be more representative yet do not appear in the document. The tag recommendation of our method differs from keyword extraction in several respects, and the required tag may not appear...


Application Information

IPC(8): G06F16/33, G06F16/36, G06F17/27
CPC: G06F16/3344, G06F16/367, G06F40/295
Inventor: 张明西, 苏冠英, 李学民, 杨柳倩, 乐水波
Owner: UNIV OF SHANGHAI FOR SCI & TECH