Method for identifying financial advertisements in text advertisements

A financial advertisement identification technique, applied in the field of advertisement recognition, that addresses the problem that existing advertisement analysis models cannot effectively identify financial advertisements. Its stated effects are preventing over-fitting, improving accuracy, and achieving good classification results.

Pending Publication Date: 2020-08-14
HARBIN INST OF TECH AT WEIHAI +1

AI Technical Summary

Problems solved by technology

[0004] In order to solve the technical problem that the existing advertisement analysis model cannot effectively identify financial advertisements, the present invention provides a financial advertisement judgme


Examples


Example Embodiment

[0022] Example 1: Figure 1 shows a schematic diagram of the overall functional structure of this embodiment. The method for identifying financial advertisements in text advertisements disclosed in this embodiment includes the following steps:

[0023] (1) Obtain the crawled advertisement text data from the database; the advertisement text data mainly comes from search engines, Baidu Post Bar, financial portals, news portals and other sites.

[0024] (2) Preprocess the text data, perform word segmentation and remove useless information, so that the text can better represent semantic information. The data preprocessing mainly includes the following steps:

[0025] i. Word segmentation: In Chinese, the character is the smallest unit of the written language, while the word is the smallest unit that carries meaning; a single character does not represent semantic information well. It is therefore necessary to segment the unbroken text into a sequence of words;

[0026] i...
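
The word segmentation described in [0025] can be illustrated with a minimal sketch. This is not the tool used in the embodiments; it is a simple forward maximum matching segmenter, swapped in here only to show what turning unbroken Chinese text into words means. The function name, the `max_word_len` parameter, and the toy dictionary are all hypothetical.

```python
def forward_max_match(text, vocabulary, max_word_len=4):
    """Greedily take the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as a fallback.
            if length == 1 or candidate in vocabulary:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical toy dictionary: "low interest", "loan", "same day", "payout".
toy_vocabulary = {"低息", "贷款", "当天", "放款"}

# "Low-interest loan, paid out the same day", written without spaces.
print(forward_max_match("低息贷款当天放款", toy_vocabulary))
# ['低息', '贷款', '当天', '放款']
```

A production segmenter relies on a much larger dictionary and statistical models, so this sketch only conveys the idea of the step.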

Example Embodiment

[0040] Example 2:

[0041] This embodiment uses the identification of financial advertisements among text advertisements from the Baidu search engine as an example to describe the technical solution and its steps. The method includes the following steps:

[0042] Step 1: Retrieve from the database 1000 previously crawled advertisement texts from the Baidu search engine, and split them into a training set and a test set at a ratio of 3:1;
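
The 3:1 split in Step 1 could look like the following sketch. The source does not say whether the data is shuffled or how the split is drawn, so the shuffling, the fixed seed, the function name, and the placeholder advertisement strings are assumptions made for illustration.

```python
import random


def split_train_test(samples, train_parts=3, test_parts=1, seed=0):
    """Shuffle a copy of the samples and split it train_parts:test_parts."""
    shuffled = list(samples)
    # Assumption: a seeded shuffle, so the split is reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


# Placeholder data standing in for the 1000 crawled advertisement texts.
ads = [f"ad text {i}" for i in range(1000)]
train_set, test_set = split_train_test(ads)
print(len(train_set), len(test_set))  # 750 250
```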

[0043] Step 2: Use the jieba word segmentation tool to segment the text content of the training set:

[0044] The jieba word segmentation tool is a Python package for natural language processing that can be installed directly with pip.

[0045] Step 3: Filter the words obtained from the segmentation in Step 2 against the stop word list published by the Harbin Engineering Natural Language Processing Laboratory, removing every word that appears in the list. https://github.com/goto456/st...
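
The stop word filtering in Step 3 can be sketched as below. The one-word-per-line list format, the function names, and the tiny stop word subset are assumptions for illustration; they are not taken from the published list.

```python
def load_stop_words(lines):
    """Build a stop word set from an iterable of lines, one word per line (assumed format)."""
    return {line.strip() for line in lines if line.strip()}


def remove_stop_words(tokens, stop_words):
    """Drop whitespace-only tokens and any token found in the stop word set."""
    return [t for t in tokens if t.strip() and t not in stop_words]


# Hypothetical tiny subset of a stop word list (particles and punctuation).
stop_words = load_stop_words(["的", "了", "，", "！"])

# Segmented text: "users who apply for a loan, paid out the same day".
tokens = ["申请", "贷款", "的", "用户", "，", "当天", "放款", "了"]
print(remove_stop_words(tokens, stop_words))
# ['申请', '贷款', '用户', '当天', '放款']
```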


Abstract

The invention relates to a method for identifying financial advertisements in text advertisements. The method addresses the technical problem that existing advertisement analysis models cannot effectively identify financial advertisements, and comprises the following steps: (1) obtaining crawled advertisement text data from a database; (2) preprocessing the text data from step (1) by performing word segmentation and removing useless information; (3) converting the preprocessed text from step (2), in several different ways, into forms that a computer can process, namely text representations; (4) selecting an appropriate classification algorithm for each text representation from step (3), so that the semantic information carried by the representation is mapped to category information; and (5) integrating the classification models built on the different text representations in step (4) to obtain the final financial advertisement recognition model. The method can be widely applied wherever financial advertisements need to be identified among text advertisements.
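
Steps (3) to (5) of the abstract can be sketched as follows. The excerpt does not name the text representations, the classification algorithms, or the integration rule, so everything here is an assumption chosen for illustration: word features and character bigram features as the two representations, a small Naive Bayes classifier for each, and averaging of predicted probabilities as the integration. All names and the toy data are hypothetical.

```python
import math
from collections import Counter


def word_features(tokens):
    """Representation 1 (assumed): the segmented words themselves."""
    return list(tokens)


def char_bigram_features(tokens):
    """Representation 2 (assumed): character bigrams of the joined text."""
    text = "".join(tokens)
    return [text[i:i + 2] for i in range(len(text) - 1)]


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing; labels are 0 or 1."""

    def __init__(self, featurize):
        self.featurize = featurize

    def fit(self, docs, labels):
        # Assumes both labels occur at least once in the training data.
        self.counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.counts[label].update(self.featurize(doc))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def prob_financial(self, doc):
        n_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in (0, 1):
            total = sum(self.counts[label].values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / n_docs)
            for feat in self.featurize(doc):
                if feat in self.vocab:  # features unseen in training are skipped
                    score += math.log((self.counts[label][feat] + 1) / total)
            log_scores[label] = score
        # Normalise the two log scores into a probability for label 1.
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        return exp[1] / (exp[0] + exp[1])


def ensemble_predict(models, doc):
    """Integration (assumed): average the per-model probabilities."""
    avg = sum(m.prob_financial(doc) for m in models) / len(models)
    return 1 if avg >= 0.5 else 0


# Hypothetical toy training data: 1 = financial ad, 0 = ordinary ad.
train_docs = [
    ["低息", "贷款", "当天", "放款"],
    ["信用", "贷款", "快速", "审批"],
    ["夏季", "新款", "连衣裙", "促销"],
    ["手机", "优惠", "包邮"],
]
train_labels = [1, 1, 0, 0]

models = [
    NaiveBayes(word_features).fit(train_docs, train_labels),
    NaiveBayes(char_bigram_features).fit(train_docs, train_labels),
]
print(ensemble_predict(models, ["贷款", "快速", "放款"]))  # 1
print(ensemble_predict(models, ["新款", "手机", "促销"]))  # 0
```

Averaging probabilities rather than counting votes avoids ties when only two models are combined; any other integration rule would slot into `ensemble_predict` in the same way.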

Description

Technical field

[0001] The invention relates to the field of advertisement identification, in particular to a method for identifying financial advertisements in text advertisements.

Background

[0002] With the rapid development of the Internet, the Internet financial industry is also flourishing. This growth, however, has brought many security problems, such as online fraud typified by "naked loans" and "campus loans", illegal fund-raising, and other illegal and criminal activities, and these activities usually appear in financial advertisements.

[0003] Websites today carry a large number of text advertisements, which include ordinary advertisements as well as financial ones. Text advertisements obtained through crawler technology include advertisements of all kinds, whereas only the financial advertisements need to be analyzed, but the model we...


Application Information

IPC(8): G06F16/35, G06F16/951, G06F40/284, G06K9/62
CPC: G06F16/35, G06F16/951, G06F40/284, G06F18/24323
Inventor: 江颖硕施力张兆心唐积强吴震卢卫杨菁林董群郭长勇王伟
Owner: HARBIN INST OF TECH AT WEIHAI