Editing distance, word frequency and word vector based entity relation recognition method

An entity relation recognition method based on edit distance, word frequency and word vectors, applied in natural language data processing, special data processing applications, network data retrieval, etc., that addresses problems such as poor search experience.

Active Publication Date: 2016-11-02
湖南中科优信科技有限公司

AI Technical Summary

Problems solved by technology

[0002] With the rapid development of science and technology, especially Internet technology, and the improvement of people's living standards, more and more people use the Internet, and search engines have followed. The search experience so far, however, has not been very good. Especially for non-professionals and people who take part in few entertainment activities, what they search for in search engines is often not what they want.



Examples


Embodiment Construction

[0037] Referring to Figure 1, the specific implementation steps of the entity relation recognition method of the present invention, based on word frequency and edit distance, are as follows:

[0038] Step 1: Extract text information from Baidu Encyclopedia, for example the entry for Zhongbo Media Co., Ltd.

[0039] Zhongbo Media Co., Ltd., usually referred to as Zhongbo Media, is the first film and television company in China to have successfully obtained venture capital from the American International Data Group technology venture capital fund and new media fund (IDGVC, IDG NEW MEDIA), Mr. Wang Gongquan of Dinghui Investment, Yunshi Investment, and others.

[0040] Company name: Zhongbo Media Co., Ltd.; Established: 1999; Abbreviation: Zhongbo Media; Investment and distribution: "Hero" and "House of Flying Daggers"; Achievements: the only three-time winner of the Grand Prix at the Cannes Film Festival in France

[0041] Table of contents

[0042] 1 Introduction

[0043] 2 Brief history of development

[0044] 3 Honors received

[0045] ProfileEdit ...
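
The method's second step segments the extracted text and counts how often each word occurs. The following is a minimal sketch of the counting part only, assuming the segmentation has already been done. The token list is hypothetical and stands in for the output of whatever word segmentation software is used; the source does not name one.

```python
from collections import Counter


def count_words(tokens):
    """Return a mapping from each token to its number of occurrences."""
    return Counter(tokens)


# Hypothetical, already-segmented tokens (an assumption for illustration;
# a real run would take these from a word segmenter applied to the entry).
tokens = [
    "Zhongbo Media Co., Ltd.",
    "Zhongbo Media",
    "Hero",
    "Zhongbo Media",
    "House of Flying Daggers",
]

word_freq = count_words(tokens)
```

The resulting `word_freq` table is what a later step could combine with edit distance when ranking candidate aliases.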


Abstract

The invention relates to an entity relation recognition method based on edit distance, word frequency and word vectors, comprising the following steps: 1) acquiring text data; 2) segmenting the text with word segmentation software and counting the occurrences of each word; 3) adjusting the word vector dimension and window size and training the word vectors; 4) clustering words with the trained word vectors; and 5) calculating the edit distance for the entities obtained in step 4 and combining it with the word frequencies counted during segmentation to obtain the alias or abbreviation of a given entity. Specifically, based on the edit distance and step 4, the d[i, j] value between the given entity word and each other entity is calculated and combined with the occurrence count of each entity word; G(X) is then obtained by weighted averaging, which yields the first n possible abbreviations of the given entity word, and the alias or abbreviation is finally decided by the strength of the relationship indicated by their weights.
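
The final step can be illustrated with a short sketch. The `edit_distance` function below is the standard Levenshtein dynamic-programming table, which is one common reading of the d[i, j] value. The source excerpt does not give the formula for G(X), so `rank_aliases`, its equal weights `w_dist` and `w_freq`, and the normalisation it uses are assumptions chosen only to show how a weighted average of distance and frequency might rank candidates. The candidate list and counts are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b via the d[i][j] table."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters of a
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[m][n]


def rank_aliases(entity, candidates, freq, n=3, w_dist=0.5, w_freq=0.5):
    """Hypothetical G(X): weighted average of a distance-based similarity
    and relative frequency. The weights and formula are assumptions,
    not taken from the patent."""
    total = sum(freq.get(c, 0) for c in candidates) or 1
    scored = []
    for c in candidates:
        dist = edit_distance(entity, c)
        similarity = 1 - dist / max(len(entity), len(c), 1)
        relative_freq = freq.get(c, 0) / total
        scored.append((w_dist * similarity + w_freq * relative_freq, c))
    scored.sort(reverse=True)
    return [c for _, c in scored[:n]]


# Hypothetical candidates and counts for illustration only.
candidates = ["Zhongbo Media", "Hero", "Yunshi Investment"]
freq = {"Zhongbo Media": 5, "Hero": 2, "Yunshi Investment": 1}
top = rank_aliases("Zhongbo Media Co., Ltd.", candidates, freq)
```

With these made-up counts, "Zhongbo Media" ranks first because it is both close to the full company name in edit distance and the most frequent candidate. In the method described above, the candidate list would come from the words grouped by the trained word vectors in step 4 rather than being written by hand.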

Description

Technical field

[0001] The invention relates to an entity relationship recognition method based on edit distance, word frequency and word vectors. It is applied to Web data mining, entity recognition, search engines, etc., and belongs to the technical field of data mining.

Background

[0002] With the rapid development of science and technology, especially Internet technology, and the improvement of people's living standards, more and more people use the Internet, and search engines have followed. The search experience so far, however, has not been very good. Especially for non-professionals and people who take part in few entertainment activities, what they search for in search engines is often not what they want, even though this kind of usage is widely circulated among most groups. The problem we face is that the purpose of a search engine's identification of entities and establishment of relationships is to let th...


Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/951, G06F40/216, G06F40/289
Inventor: 段大高, 赵宁, 韩忠明
Owner: 湖南中科优信科技有限公司