
Method for performing approximate search based on word vectors to quickly extract advertisement text themes

A word-vector technology applied in the field of advertisement text topic extraction. It solves the problem of slow topic extraction from advertisement text and achieves high extraction accuracy, convenient operation, and improved extraction speed.

Active Publication Date: 2020-01-21
上海开域信息科技有限公司 (Shanghai Kaiyu Information Technology Co., Ltd.)
Cites: 8; Cited by: 1

AI Technical Summary

Problems solved by technology

[0013] In view of the deficiencies in the background art, the present invention provides a method for quickly extracting advertisement text topics by performing approximate search based on word vectors, with the aim of solving the prior-art problem of slow topic extraction from advertisement text.



Detailed Description of the Embodiments

[0025] As shown in Figure 1, the present invention discloses a method for quickly extracting advertisement text topics based on word-vector approximate search, comprising the following steps. In the first step, the jieba word segmentation tool and an existing stop-word list are used to find the words in the advertisement title that are identical to stop words and remove them. The Chinese words in the corpus are extracted to form a dictionary; using this dictionary, the sentence is scanned against a prefix-tree word graph to generate a directed acyclic graph (DAG) of all possible words formed by the Chinese characters in the sentence. Dynamic programming then finds the maximum-probability path, i.e. the segmentation combination with the highest word-frequency-based probability, and the advertisement text topic is segmented accordingly;
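
The first step can be illustrated with a minimal Python sketch of the prefix-dictionary, DAG, and maximum-probability-path procedure that jieba follows. The word frequencies, the stop-word list, and the function names (`build_prefix_dict`, `build_dag`, `max_prob_path`, `segment`) are hypothetical; a real system would load the dictionary built from its own corpus, or simply call `jieba.lcut`. Here stop words are filtered after segmentation, on the assumption that the title must be split into words before they can be matched against the stop-word list.

```python
import math

# Hypothetical toy dictionary (word -> frequency). The patent builds its
# dictionary from the Chinese words in its corpus; these counts are made up.
# 夏季 summer, 新款 new style, 的 (particle), 连衣裙 dress, 限时 limited-time, 折扣 discount
WORD_FREQ = {
    "夏季": 30, "夏": 5, "季": 5, "新款": 40, "新": 10, "款": 8,
    "连衣裙": 50, "连衣": 2, "衣裙": 3, "连": 4, "衣": 6, "裙": 6,
    "的": 100, "限时": 20, "限": 3, "时": 9, "折扣": 25, "折": 4, "扣": 2,
}
STOP_WORDS = {"的"}  # hypothetical stop-word list


def build_prefix_dict(freq):
    """Map every word and every prefix of a word to a frequency (0 for
    prefixes that are not words). This flat dict stands in for the prefix tree."""
    prefixes = dict(freq)
    for word in freq:
        for i in range(1, len(word)):
            prefixes.setdefault(word[:i], 0)
    return prefixes


def build_dag(sentence, prefixes):
    """For each start index k, list every end index i such that
    sentence[k:i+1] is a dictionary word."""
    dag = {}
    n = len(sentence)
    for k in range(n):
        ends, i = [], k
        frag = sentence[k]
        while i < n and frag in prefixes:
            if prefixes[frag] > 0:
                ends.append(i)
            i += 1
            frag = sentence[k:i + 1]
        dag[k] = ends or [k]  # no dictionary word starts here: keep the single character
    return dag


def max_prob_path(sentence, dag, freq):
    """Dynamic programming from right to left: route[k] holds the best total
    log-probability of sentence[k:] and the end index of its first word."""
    log_total = math.log(sum(freq.values()))
    n = len(sentence)
    route = {n: (0.0, 0)}
    for k in range(n - 1, -1, -1):
        route[k] = max(
            (math.log(freq.get(sentence[k:x + 1]) or 1) - log_total + route[x + 1][0], x)
            for x in dag[k]
        )
    return route


def segment(sentence, freq, stop_words):
    """Segment along the maximum-probability path, then drop stop words."""
    dag = build_dag(sentence, build_prefix_dict(freq))
    route = max_prob_path(sentence, dag, freq)
    words, k = [], 0
    while k < len(sentence):
        end = route[k][1]
        words.append(sentence[k:end + 1])
        k = end + 1
    return [w for w in words if w not in stop_words]


print(segment("夏季新款的连衣裙限时折扣", WORD_FREQ, STOP_WORDS))
# ['夏季', '新款', '连衣裙', '限时', '折扣']
```

Each word scores log(frequency / total), so the dynamic program prefers a long, frequent word such as 连衣裙 ("dress") over splitting it into rarer fragments.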

[0026] In the second step, a word-vector index is built from the word vectors in the corpus using a random projection algorithm;
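
The patent does not say which random-projection index it uses. One common realization, assumed here, is a random-projection tree in the style of the Annoy library: each node picks two random word vectors, splits the vectors by the hyperplane equidistant from them, and recurses until leaves are small. A query then descends a single root-to-leaf path, which is where a roughly logarithmic search cost comes from. The class name `RPTree`, its methods, and the `leaf_size` parameter are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


class RPTree:
    """Random-projection tree over word vectors (hypothetical sketch).

    Each internal node stores a hyperplane equidistant from two randomly
    sampled vectors; vectors on one side go left, the rest go right."""

    def __init__(self, vectors, leaf_size=2, seed=0):
        self.vectors = vectors          # word -> list[float]
        self.leaf_size = leaf_size      # max words per leaf
        self.rng = random.Random(seed)  # fixed seed for a reproducible index
        self.root = self._build(list(vectors))

    def _side(self, node, vec):
        return "left" if dot(node["normal"], vec) >= node["offset"] else "right"

    def _build(self, words):
        if len(words) <= self.leaf_size:
            return {"leaf": words}
        a, b = self.rng.sample(words, 2)
        va, vb = self.vectors[a], self.vectors[b]
        normal = [x - y for x, y in zip(va, vb)]
        midpoint = [(x + y) / 2 for x, y in zip(va, vb)]
        # hyperplane with this normal passing through the midpoint of va and vb
        node = {"normal": normal, "offset": dot(normal, midpoint)}
        left = [w for w in words if self._side(node, self.vectors[w]) == "left"]
        right = [w for w in words if self._side(node, self.vectors[w]) == "right"]
        if not left or not right:  # identical vectors: cannot split further
            return {"leaf": words}
        node["left"], node["right"] = self._build(left), self._build(right)
        return node

    def query(self, vec, k=1):
        """Descend one root-to-leaf path (about log N hyperplane tests for a
        balanced tree), then rank only that leaf's words by cosine similarity."""
        node = self.root
        while "leaf" not in node:
            node = node[self._side(node, vec)]
        return sorted(node["leaf"], key=lambda w: cosine(self.vectors[w], vec),
                      reverse=True)[:k]


vectors = {
    "dress": [1.0, 0.1], "skirt": [0.9, 0.2], "coat": [0.7, 0.6],
    "phone": [-1.0, 0.1], "laptop": [-0.9, 0.3], "tablet": [-0.6, 0.7],
}
index = RPTree(vectors, leaf_size=2)
print(index.query(vectors["dress"]))
# ['dress']
```

A single tree can miss true neighbors that fall on the other side of a split; Annoy-style indexes build a forest of such trees and inspect several leaves per query to raise recall.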

[0027] The third st...


Abstract

The invention discloses a method for rapidly extracting advertisement text topics by performing approximate search based on word vectors. The method comprises the following steps: step 1, using a word segmentation tool and an existing stop-word list, search the advertisement title for words identical to those in the stop-word list and remove them (these are the stop words in the advertisement title); extract the Chinese words in a corpus to form a dictionary; and use the dictionary to segment the advertisement text topic into words. The method is convenient to operate. The search complexity for a single query word in the GPU-DMM generative model can be reduced from O(N) to O(log N), which accelerates the whole advertisement text topic extraction process and greatly increases extraction speed. The entire process, including offline processing and unsupervised training, can be completed within several hours, meeting the Internet advertising industry's requirements for large-scale data volumes and near-real-time performance, so that user interest tags can be updated daily or hourly.
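
For context, the O(N) figure corresponds to a linear scan: finding the words related to a single query word means comparing it with every one of the N words in the vocabulary. The sketch below shows that baseline, assuming (as GPU-DMM is commonly described) that related words are those whose word-vector cosine similarity exceeds a threshold; the 0.5 threshold and the function names are hypothetical. The word-vector index built in the second step is what replaces this scan.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def related_words_linear(word, vectors, threshold=0.5):
    """O(N) baseline: compare the query word with every vocabulary word and
    keep those whose cosine similarity exceeds the (hypothetical) threshold."""
    query = vectors[word]
    return sorted(
        other for other, vec in vectors.items()
        if other != word and cosine(query, vec) > threshold
    )


vectors = {"dress": [1.0, 0.1], "skirt": [0.9, 0.2], "phone": [-1.0, 0.1]}
print(related_words_linear("dress", vectors))
# ['skirt']
```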

Description

Technical field [0001] The invention relates to a method for extracting advertisement text topics, in particular to a method for quickly extracting advertisement text topics by performing approximate search based on word vectors. Background art [0002] In the Internet advertisement recommendation business, the topics of the advertisement texts that a user has clicked or browsed are extracted first, and the user's interest tags are then determined from them. Common topic models for this are LDA and GPU-DMM. [0003] LDA is a document topic generation model with a three-layer structure of words, topics, and documents. Being a generative model means that it assumes each word in an article is obtained through the process of "selecting a topic with a certain probability, then selecting a word from that topic with a certain probability". Documents-to-topics follow a multinomial distribution, and topics-to-words follow a multinomia...
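
The generative story in [0003] can be made concrete with a short sampling sketch. It takes the document-topic distribution `theta` and the topic-word distributions `phi` as given toy inputs (full LDA also draws them from Dirichlet priors, which this sketch omits), and for each word position first samples a topic, then a word from that topic. The function name, topics, and words are hypothetical.

```python
import random


def generate_document(theta, phi, length, seed=0):
    """Sample (topic, word) pairs following the LDA generative story.

    theta: topic -> probability (this document's topic multinomial)
    phi:   topic -> {word -> probability} (each topic's word multinomial)
    """
    rng = random.Random(seed)
    topics = list(theta)
    doc = []
    for _ in range(length):
        # choose a topic with probability theta[topic]
        z = rng.choices(topics, weights=[theta[t] for t in topics])[0]
        # then choose a word from that topic with probability phi[z][word]
        vocab = list(phi[z])
        w = rng.choices(vocab, weights=[phi[z][v] for v in vocab])[0]
        doc.append((z, w))
    return doc


theta = {"fashion": 0.7, "electronics": 0.3}
phi = {
    "fashion": {"dress": 0.6, "skirt": 0.4},
    "electronics": {"phone": 0.5, "laptop": 0.5},
}
print(generate_document(theta, phi, length=5))
```

Topic-model training such as LDA or GPU-DMM inverts this process, inferring `theta` and `phi` from observed words.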


Application Information

IPC(8): G06F40/289; G06F40/247; G06F40/242; G06F16/31; G06F16/33; G06K9/62; G06Q30/02
CPC: G06F16/322; G06F16/33; G06Q30/0277; G06F18/22; Y02D10/00
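Inventors: 李新 (Li Xin), 李征宇 (Li Zhengyu), 邵品贤 (Shao Pinxian), 吴小刚 (Wu Xiaogang)
Owner: 上海开域信息科技有限公司 (Shanghai Kaiyu Information Technology Co., Ltd.)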