Word vector training method and device

A word vector and training corpus technology in the Internet field that addresses problems such as the difficulty of applying word vectors to business scenarios, poor training effectiveness, and a low training rate.

Active Publication Date: 2016-07-20
BEIJING SOGOU INFORMATION SERVICE
8 Cites, 54 Cited by

AI Technical Summary

Problems solved by technology

The existing word vector training method runs in stand-alone mode, so its training rate is low, and it is especially difficult to apply to business scenarios with very large amounts of data.
In addition, this word vector training method is a general-purpose method that does not account for the particular characteristics of specific business scenarios, so its training effect in those scenarios is poor.




Embodiment Construction

[0079] To make the purpose, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0080] The present invention provides a word vector training method and device. The present invention analyzes the factors specific to the application background of word vectors and proposes the technical idea of constructing a specific vocabulary from user query logs, so that the word vectors obtained through training can be quickly and well applied to the search ...
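The idea of constructing a vocabulary from user query logs can be illustrated with a minimal sketch. It assumes the query logs are already word-segmented into space-separated tokens and keeps words that occur at least `min_count` times; the function name `build_vocab_from_queries`, the frequency threshold, and the sample queries are hypothetical, since the description does not state how the word list is filtered.

```python
from collections import Counter
from typing import Iterable


def build_vocab_from_queries(queries: Iterable[str], min_count: int = 2) -> list[str]:
    """Build a word list from user queries (hypothetical helper).

    Assumes each query is already word-segmented with spaces; real Chinese
    queries would first pass through a word segmenter, not shown here.
    """
    counts: Counter = Counter()
    for query in queries:
        counts.update(query.split())
    # Keep words seen at least `min_count` times, most frequent first;
    # ties are broken alphabetically so the output is deterministic.
    kept = [word for word, count in counts.items() if count >= min_count]
    return sorted(kept, key=lambda word: (-counts[word], word))


if __name__ == "__main__":
    logs = ["cheap flights beijing", "beijing weather", "cheap hotels beijing", "weather"]
    print(build_vocab_from_queries(logs))  # prints ['beijing', 'cheap', 'weather']
```

Sorting by descending frequency with an alphabetical tie-break keeps the word list stable across runs, which is convenient if it is rebuilt periodically from fresh logs.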



Abstract

The invention provides a word vector training method and device. The method comprises the following steps: Internet webpages are crawled, and training texts are obtained and stored in a corpus; each training text in the corpus is word-segmented to obtain its corresponding ordered word set; a word list is built from pre-collected user query logs; the training texts stored in the corpus are distributed to the nodes of a distributed word vector learning model; and the distributed word vector learning model is configured to periodically train a word vector for each word in the word list, yielding the word vectors corresponding to the words in the word list. With this method and device, the trained word vectors can be applied well to search business, and fast, iterative training of high-quality word vectors can be achieved.
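The data-preparation steps in the abstract can be sketched as follows, under stated assumptions: a whitespace split stands in for a real Chinese word segmenter, documents are assigned to nodes round-robin, and each node turns its ordered word sets into word2vec-style skip-gram (center, context) pairs restricted to the word list. The function names, the sharding policy, the window size, and the skip-gram choice are all hypothetical; the abstract does not specify the learning model run on each node.

```python
from typing import Iterator


def segment(doc: str) -> list[str]:
    """Stand-in segmenter (assumption): whitespace split that preserves word order."""
    return doc.split()


def shard_corpus(docs: list[str], num_nodes: int) -> list[list[list[str]]]:
    """Distribute segmented documents across nodes round-robin (hypothetical policy)."""
    shards: list[list[list[str]]] = [[] for _ in range(num_nodes)]
    for i, doc in enumerate(docs):
        shards[i % num_nodes].append(segment(doc))
    return shards


def skipgram_pairs(words: list[str], vocab: set[str], window: int = 2) -> Iterator[tuple[str, str]]:
    """Yield (center, context) pairs in which both words belong to the word list."""
    for i, center in enumerate(words):
        if center not in vocab:
            continue  # only words in the word list receive training updates
        start, end = max(0, i - window), min(len(words), i + window + 1)
        for j in range(start, end):
            if j != i and words[j] in vocab:
                yield center, words[j]


if __name__ == "__main__":
    docs = ["beijing weather today", "cheap flights to beijing", "beijing hotels"]
    vocab = {"beijing", "weather", "cheap", "flights"}
    for node_id, shard in enumerate(shard_corpus(docs, num_nodes=2)):
        pairs = [pair for words in shard for pair in skipgram_pairs(words, vocab)]
        print(node_id, pairs)
```

For periodic training, the same preparation would be rerun on the refreshed corpus and word list before each round; hashing documents to nodes instead of round-robin would be an equally plausible policy.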

Description

Technical field
[0001] The present invention relates to the technical field of the Internet, and in particular to a word vector training method and device.
Background art
[0002] In Internet applications, a very important issue is how to convert natural language into a data representation that computers can understand. The most important step in solving this problem is finding a way to digitize natural language symbols. At present, deep learning (Deep Learning, DL) methods are commonly used. DL uses the distributed representation method, in which each word is represented as a low-dimensional real-valued vector, called the word vector of that word; this is how word vectors came about. In other words, word vectors are vectors used to represent words of natural language, which makes them suitable for Internet applications. For example, word vectors can be used in many natural language learning processing (Natural Learning ...
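To make the idea of a low-dimensional real-valued word vector concrete, the sketch below stores a few toy vectors and compares them with cosine similarity, a common way to measure how close two word vectors are. The 4-dimensional values are invented for illustration, and cosine similarity is a standard technique chosen here, not one the description specifies.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two word vectors of equal dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 4-dimensional word vectors; the values are invented for illustration only.
word_vectors = {
    "hotel": [0.8, 0.1, 0.3, 0.0],
    "inn": [0.7, 0.2, 0.4, 0.1],
    "weather": [0.0, 0.9, 0.1, 0.6],
}

if __name__ == "__main__":
    for other in ("inn", "weather"):
        score = cosine_similarity(word_vectors["hotel"], word_vectors[other])
        print(f"hotel vs {other}: {score:.3f}")
```

With these toy values, "hotel" scores much closer to "inn" than to "weather", which is the kind of relationship trained word vectors are expected to capture.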


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/22, G06F17/27, G06F17/30
CPC: G06F16/95, G06F40/12, G06F40/20
Inventors: 邢宁 (Xing Ning), 刘明荣 (Liu Mingrong), 许静芳 (Xu Jingfang), 常晓夫 (Chang Xiaofu), 王晓伟 (Wang Xiaowei)
Owner: BEIJING SOGOU INFORMATION SERVICE