English concept vector generation method and device based on Wikipedia link structure

A concept vector technology, applied in the field of English concept vector generation based on the Wikipedia link structure. It addresses the problem that word vector methods cannot, in essence, distinguish the different concepts that the senses of a word denote, and achieves the effects of overcoming polysemy and giving an accurate semantic representation.

Active Publication Date: 2018-06-08
SHANDONG NORMAL UNIV
3 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0006] To address the deficiencies of the prior art, in particular the problem that prior-art word vector methods cannot essentially distinguish the concepts behind word senses, the present invention proposes a method and device for generating English concept vectors based on the Wikipedia link structure. It covers the construction of the Wikipedia link information base, the construction of the concept vector training data set, the design of the concept vector training model and its training method, and the method for returning the concept vector matrix.



Examples


Embodiment 1

[0064] To learn accurate vector representations of word-sense concepts, the training data must be constructed with concepts as its objects. Wikipedia contains a large number of concept annotations, and these annotations carry rich semantic link relationships, which makes it possible to construct training data for concept vectors.

[0065] The purpose of Embodiment 1 is to provide a method for generating English concept vectors based on the Wikipedia link structure.

[0066] In order to achieve the above object, the present invention adopts the following technical scheme:

[0067] As shown in Figure 1:

[0068] A method for generating English concept vectors based on Wikipedia link structure, the method comprising:

[0069] Step (1): constructing a link information base according to the title concepts and/or link concepts in English Wikipedia pages;

[0070] Step (2): According to whether there is a link concept in the sampl...
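Step (1) can be illustrated with a minimal Python sketch. It assumes the pages are available as raw wikitext, treats the normalized page title as the title concept and every internal link target `[[Target|anchor]]` as a link concept, and stores the link information base as a dictionary from each title concept to its set of link concepts. The regular expression, the namespace filter and the function names (`normalize`, `extract_link_concepts`, `build_link_base`) are hypothetical and are not taken from the patent.

```python
import re
from typing import Dict, Iterable, Set, Tuple

# Assumed wikitext link syntax: [[Target]], [[Target|anchor]] or
# [[Target#Section|anchor]]. Group 1 captures the target title (the link concept).
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

# Hypothetical filter: namespaced links that do not point to article concepts.
NON_ARTICLE_PREFIXES = ("File:", "Image:", "Category:", "Template:", "Help:", "Wikipedia:")


def normalize(title: str) -> str:
    """Normalize a title: underscores become spaces, first letter is upper-cased."""
    title = title.replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def extract_link_concepts(wikitext: str) -> Set[str]:
    """Collect the normalized link concepts that appear in one page's wikitext."""
    concepts: Set[str] = set()
    for match in LINK_RE.finditer(wikitext):
        target = match.group(1).strip()
        if target and not target.startswith(NON_ARTICLE_PREFIXES):
            concepts.add(normalize(target))
    return concepts


def build_link_base(pages: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each title concept to the set of link concepts found on its page."""
    base: Dict[str, Set[str]] = {}
    for title, wikitext in pages:
        base.setdefault(normalize(title), set()).update(extract_link_concepts(wikitext))
    return base


# Toy page given as (title, wikitext); the content is made up for illustration.
pages = [("Apple_Inc.", "Founded by [[Steve Jobs]] and [[steve_Wozniak|Woz]] "
                        "in [[Cupertino, California#History|Cupertino]]. [[File:Logo.png|thumb]]")]
print(build_link_base(pages))
```

A real dump would also need redirect resolution and template expansion, which this sketch deliberately leaves out.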

Embodiment 2

[0259] The purpose of Embodiment 2 is to provide a computer-readable storage medium.

[0260] In order to achieve the above object, the present invention adopts the following technical scheme:

[0261] A computer-readable storage medium, in which a plurality of instructions are stored, and the instructions are adapted to be loaded by a processor of a terminal device and perform the following processing:

[0262] Build a link information base based on the title concepts and/or link concepts in English Wikipedia pages;

[0263] According to whether there is a link concept in the sample in the link information base, construct training positive examples and training negative examples respectively, and select a certain number of training positive examples and training negative examples to establish a training data set;

[0264] Establish a concept vector model, which includes an input layer, an embedding layer, a concept vector operation layer and an output layer;

[0265] The conc...
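The positive and negative example construction in [0263] could look roughly like the following Python sketch. It assumes the link information base is a dictionary from title concept to link concepts (as built in Step (1)), labels a (title, concept) pair positive when the concept is a recorded link concept of the title, and draws negatives uniformly at random from concepts the title does not link to. The sampling ratio `neg_per_pos`, the truncation to `max_examples` and the function name `build_training_set` are assumptions, not the patent's specification.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Example = Tuple[str, str, int]  # (title concept, candidate concept, label)


def build_training_set(link_base: Dict[str, Set[str]],
                       neg_per_pos: int = 1,
                       max_examples: Optional[int] = None,
                       seed: int = 0) -> List[Example]:
    """Build labeled (title, concept) pairs from a link information base.

    Positive (label 1): the concept is a recorded link concept of the title.
    Negative (label 0): the concept is drawn at random from concepts that are
    NOT linked from the title (an assumed sampling scheme; draws may repeat).
    """
    rng = random.Random(seed)
    vocab = sorted(set(link_base) | {c for links in link_base.values() for c in links})
    data: List[Example] = []
    for title in sorted(link_base):
        links = link_base[title]
        unlinked = [c for c in vocab if c != title and c not in links]
        for concept in sorted(links):
            if concept == title:
                continue  # skip self-links
            data.append((title, concept, 1))
            for _ in range(min(neg_per_pos, len(unlinked))):
                data.append((title, rng.choice(unlinked), 0))
    rng.shuffle(data)
    # "Select a certain number" of examples: here, simply truncate after shuffling.
    return data if max_examples is None else data[:max_examples]


base = {"Apple Inc.": {"Steve Jobs", "Cupertino"}, "Banana": {"Fruit"}}
for example in build_training_set(base, neg_per_pos=1):
    print(example)
```

Fixing the seed makes the sampled data set reproducible, which helps when comparing training runs.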

Embodiment 3

[0267] The purpose of Embodiment 3 is to provide a terminal device.

[0268] In order to achieve the above object, the present invention adopts the following technical scheme:

[0269] A terminal device includes a processor and a computer-readable storage medium. The processor is used to execute instructions, and the computer-readable storage medium stores a plurality of instructions that are adapted to be loaded by the processor to perform the following processing:

[0270] Build a link information base based on the title concepts and/or link concepts in English Wikipedia pages;

[0271] According to whether there is a link concept in the sample in the link information base, construct training positive examples and training negative examples respectively, and select a certain number of training positive examples and training negative examples to establish a training data set;

[0272] Establish a concept vector model, which includes an input layer...



Abstract

The invention discloses a method and device for generating English concept vectors based on the Wikipedia link structure. The method comprises: building a link information base according to the title concepts and/or link concepts in English Wikipedia pages; judging whether a link concept exists in a sample in the link information base, constructing positive and negative training examples accordingly, and selecting a certain number of each to build a training data set; establishing a concept vector model comprising an input layer, an embedding layer, a concept vector operation layer and an output layer; and training the concept vector model on the training data set and extracting the concept vectors from the model.
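The final step, extracting concept vectors from the trained model, can be sketched as reading out the embedding table row by row. The sketch below assumes that row i of the embedding layer holds the vector of the concept with id i, returns the matrix as a `{concept: vector}` dictionary, and adds a cosine-similarity lookup to show how the vectors might be used. The function names and the toy values are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def export_concept_vectors(embedding: Sequence[Sequence[float]],
                           id_to_concept: Sequence[str]) -> Dict[str, List[float]]:
    """Return the trained embedding table as {concept: vector}; row i belongs to id i."""
    if len(embedding) != len(id_to_concept):
        raise ValueError("embedding rows and concept ids must have the same length")
    return {concept: list(row) for concept, row in zip(id_to_concept, embedding)}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def most_similar(vectors: Dict[str, List[float]], query: str,
                 topn: int = 3) -> List[Tuple[str, float]]:
    """Rank the other concepts by cosine similarity to the query concept."""
    q = vectors[query]
    scored = [(c, cosine(q, v)) for c, v in vectors.items() if c != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:topn]


# Made-up embedding table standing in for a trained model's weights.
table = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
vectors = export_concept_vectors(table, ["Apple Inc.", "Steve Jobs", "Banana"])
print(most_similar(vectors, "Apple Inc.", topn=2))
```

A dictionary keeps the sketch dependency-free; for a large vocabulary, a dense array plus a separate id list is the more practical format.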

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing, and in particular relates to a method and device for generating English concept vectors based on the Wikipedia link structure.

Background art

[0002] Wikipedia is currently the largest encyclopedia. It is not only a huge corpus but also a knowledge base containing a large amount of human background knowledge and semantic relations, which makes it an ideal resource for natural language processing.

[0003] Semantic representation of word concepts is a fundamental problem in natural language processing. Traditional methods can be divided into count-based methods and prediction-based methods. Count-based methods first count the co-occurrences of word concepts and then learn word concept vectors by decomposing the co-occurrence matrix; prediction-based methods learn word concept vectors by predicting the words that co-occur in a given context. The...


Application Information

Patent type & authority: Applications (China)
IPC(8): G06F17/27, G06K9/62, G06N3/08
CPC: G06N3/08, G06F40/216, G06F40/30, G06F18/214
Inventor: 薛若娟 (Xue Ruojuan)
Owner: SHANDONG NORMAL UNIV