Automatic short text semantic concept expansion method and system based on open knowledge base

A technology based on an open knowledge base and a concept expansion method, applied in the field of automatic semantic concept expansion. It addresses the problems of being unable to find semantically similar concepts, of irregular text, and of the large number of nodes in the graph.

Active Publication Date: 2013-06-12
INST OF COMPUTING TECH CHINESE ACAD OF SCI
Cites: 3; Cited by: 20

AI Technical Summary

Problems solved by technology

[0007] 1) The terms used on Weibo are mostly arbitrary, irregular and noisy.
[0008] 2) Because Weibo posts are limited in length, they are naturally extremely sparse, which makes it difficult to extract effective content features.
Reference 1 (P. Ferragina and U. Scaiella. TAGME: on-the-fly annotation of short text fragments (by Wikipedia entities). In CIKM'10, 2010) designed an online system that links short texts to semantically related Wikipedia concept pages. The system uses a fast and effective context-based voting mechanism for semantic disambiguation and achieves relatively high accuracy on both short and long texts, but it cannot obtain more semantically si…


Examples

Embodiment Construction

[0063] In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings through specific embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.

[0064] In order to better understand the present invention, some related background technical knowledge is briefly introduced first.

[0065] 1. n-gram (n-gram language model)

[0066] The n-gram model assumes that the occurrence of the n-th word depends only on the preceding n-1 words and on no other words. Below, this model is used to extract all segments of a short text, which amounts to segmenting the short text into words.
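Stated as a formula, the assumption in [0066] is the usual Markov approximation. The notation is introduced here for illustration only: w_k is the k-th word of a text of m words, and indices below 1 are treated as absent (start-of-text padding). Under this assumption the probability of the whole text factorizes into n-gram terms:

```latex
P(w_k \mid w_1, \dots, w_{k-1}) \approx P(w_k \mid w_{k-n+1}, \dots, w_{k-1}),
\qquad
P(w_1, \dots, w_m) \approx \prod_{k=1}^{m} P(w_k \mid w_{k-n+1}, \dots, w_{k-1})
```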

[0067] The set of n-grams generated for a certain string segment contains the elements gen...
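A minimal sketch of the segment-extraction step, assuming the short text has already been split into tokens (words, or single characters for unsegmented Chinese text). The function name short_text_ngrams and its parameters are illustrative and not taken from the patent; because [0067] is truncated, the exact composition of the patent's n-gram set is unknown, and here it is assumed to be every distinct contiguous segment up to length max_n.

```python
def short_text_ngrams(tokens, max_n=None, sep=" "):
    """Return the set of contiguous segments (n-grams) of a tokenized short text.

    Hypothetical helper for illustration.
    tokens: list of words (or characters, for unsegmented Chinese text).
    max_n:  longest segment to produce; None means len(tokens), i.e. all segments.
    sep:    string used to join tokens back into a segment ("" suits characters).
    Segments are returned shortest first, without duplicates.
    """
    if max_n is None:
        max_n = len(tokens)
    segments = []
    for n in range(1, max_n + 1):
        # Slide a window of length n across the token sequence.
        for start in range(len(tokens) - n + 1):
            segments.append(sep.join(tokens[start:start + n]))
    # dict.fromkeys removes repeated segments while keeping first-seen order.
    return list(dict.fromkeys(segments))


if __name__ == "__main__":
    # Word-level example; for Chinese Weibo text one could pass list(text) and sep="".
    print(short_text_ngrams(["apple", "releases", "iphone"]))
```

Each returned segment is a candidate for linking to a knowledge-base concept; limiting max_n trades recall of long multi-word concepts against the quadratic growth in the number of segments.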

Abstract

The invention discloses an automatic short-text semantic concept expansion method based on an open knowledge base. The method links each element of the n-gram set generated from a short text to the most relevant concept in the open knowledge base, and then generates an expandable set of semantic concepts for each element from the linked concepts and a concept relationship matrix of the open knowledge base. The concept relationship matrix is constructed only from the anchor-text information in the knowledge base's documents, rather than from their term (lexical item) information or category (directory) information; this keeps the construction and computation of the matrix simple and avoids the problems of category information, namely its coarse granularity and its many different meanings. In the semantic concept expansion stage, a context-based semantic similarity calculation is used, which considers both the contextual consistency of the short-text content and the similarity of concepts at the abstract semantic level, thereby improving the accuracy of semantic concept expansion.
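The two stages in the abstract (linking n-grams to concepts, then context-based expansion over an anchor-text concept relationship matrix) can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the patent's actual method: the knowledge-base contents are invented; linking picks the most frequent target of an anchor text (the "commonness" heuristic, substituted because the abstract does not say how the most relevant concept is chosen); the matrix simply counts anchor links between two concepts in both directions; and the score mixes cosine similarity to the linked concept with mean cosine similarity to the other concepts of the short text through a hypothetical weight alpha.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy knowledge base: each article lists the (anchor text, target concept)
# pairs of its hyperlinks. Only anchor links are used: no body terms, no categories.
KB_LINKS = {
    "Apple Inc.": [("iPhone", "iPhone"), ("Steve Jobs", "Steve Jobs"), ("iPad", "iPad")],
    "iPhone": [("Apple", "Apple Inc."), ("iOS", "iOS"), ("smartphone", "Smartphone")],
    "iPad": [("Apple", "Apple Inc."), ("iOS", "iOS"), ("tablet", "Tablet computer")],
    "Apple": [("fruit", "Fruit"), ("apple tree", "Malus")],
    "Steve Jobs": [("Apple", "Apple Inc."), ("iPhone", "iPhone")],
}


def build_anchor_index(kb):
    """Map each lower-cased anchor text to a Counter of the concepts it links to."""
    index = defaultdict(Counter)
    for links in kb.values():
        for anchor, target in links:
            index[anchor.lower()][target] += 1
    return index


def build_relation_matrix(kb):
    """Sparse symmetric matrix: M[a][b] = number of anchor links between a and b."""
    matrix = defaultdict(Counter)
    for source, links in kb.items():
        for _, target in links:
            matrix[source][target] += 1
            matrix[target][source] += 1
    return matrix


def link(ngram, anchor_index):
    """Link an n-gram to its most frequent anchor target, or None if it is no anchor."""
    targets = anchor_index.get(ngram.lower())
    return targets.most_common(1)[0][0] if targets else None


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def expand(concept, context, matrix, alpha=0.5, top_k=3):
    """Rank direct neighbours of `concept`, rewarding agreement with the context concepts."""
    empty = Counter()
    row = matrix.get(concept, empty)  # .get avoids inserting keys into the defaultdict
    candidates = set(row) - {concept} - set(context)
    scored = []
    for cand in candidates:
        cand_row = matrix.get(cand, empty)
        own = cosine(cand_row, row)
        ctx = (sum(cosine(cand_row, matrix.get(c, empty)) for c in context) / len(context)
               if context else 0.0)
        scored.append((cand, alpha * own + (1 - alpha) * ctx))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # best first, ties by name
    return scored[:top_k]


def expand_short_text(ngrams, kb, top_k=3):
    """Link each n-gram, then expand it using the other linked concepts as context."""
    anchors = build_anchor_index(kb)
    matrix = build_relation_matrix(kb)
    linked = {g: link(g, anchors) for g in ngrams}
    linked = {g: c for g, c in linked.items() if c is not None}
    concepts = list(dict.fromkeys(linked.values()))
    return {g: expand(c, [x for x in concepts if x != c], matrix, top_k=top_k)
            for g, c in linked.items()}


if __name__ == "__main__":
    for gram, ranked in expand_short_text(["Apple", "iPhone", "new"], KB_LINKS).items():
        print(gram, "->", [c for c, _ in ranked])
```

In this toy run, "new" has no anchor text and stays unlinked, and the ambiguous anchor "Apple" resolves to "Apple Inc." only because that target is the most frequent one in the toy data; a real system would need context-aware disambiguation at the linking stage as well.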

Description

Technical Field
[0001] The invention belongs to the field of Internet information search and data mining, and relates in particular to the automatic expansion of semantic concepts for short texts, chiefly social-media content.
Background Art
[0002] In information retrieval, semantic expansion is recognized as one of the techniques that can effectively improve a system's recall. The basic idea is to revise the query with words related to its keywords so that more relevant documents are found and recall improves. However, traditional keyword-based query expansion often introduces errors in semantic understanding, such as synonymy and ambiguity problems, so it is difficult to maintain precision while improving recall. There are two fundamental reasons for this problem. First, in real life, a variety of words are used to describe the same object or event. For example, "thing" has at leas...

Application Information

IPC(8): G06F17/30
Inventors: 程学旗 (Cheng Xueqi), 刘盛华 (Liu Shenghua), 肖永磊 (Xiao Yonglei), 王元卓 (Wang Yuanzhuo), 刘悦 (Liu Yue)
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI