Headline clickbait identification method and device, server and storage medium

A clickbait headline identification method, applied in the Internet field, that addresses the frequent false positives, poor generalization ability, and low recognition accuracy of existing methods, and achieves high accuracy and high recall.

Inactive Publication Date: 2017-12-19
BEIJING BAIDU NETCOM SCI & TECH CO LTD
5 Cites, 24 Cited by

AI Technical Summary

Problems solved by technology

[0004] Embodiments of the present invention provide a clickbait headline identification method and device, a server, and a storage medium, to solve the problems of frequent false positives, poor generalization ability, and low recognition accuracy in existing clickbait headline identification methods.

Examples

Embodiment 1

[0025] Figure 1 is a flow chart of the clickbait headline identification method provided by Embodiment 1 of the present invention. This embodiment is applicable to situations where clickbait headlines need to be identified. The method can be executed by a clickbait headline identification device, which can be implemented in software and/or hardware. As shown in Figure 1, the method specifically includes:

[0026] Step 110: Extract the text statistical features and semantic features of the title.

[0027] A clickbait headline (commonly called a "title party") is a title written as click bait. Such titles usually rely on prominent textual devices, such as exaggeration or phrases and short sentences that diverge sharply from reality, to attract readers' attention. They also have distinctive semantic features. Therefore, textual features, semantic features, or a combination of the two can be used to judge whether a title is clickbait.
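
The text statistical features mentioned for Step 110 can be illustrated with a minimal sketch. It assumes English titles and simple regex tokenization; the word lists and the function name `text_statistical_features` are hypothetical placeholders, since the source does not give the actual lexicons or the tokenization used.

```python
import re
import string

# Hypothetical lexicons for illustration only; the source does not list them.
STOP_WORDS = {"the", "a", "an", "of", "to", "is"}
PRONOUNS = {"you", "he", "she", "it", "they", "this", "that"}
LURE_WORDS = {"shocking", "unbelievable", "secret", "never"}
REGIONAL_WORDS = {"beijing", "shanghai"}


def text_statistical_features(title: str) -> dict:
    """Count simple surface features of a title (assumed English, ASCII punctuation)."""
    tokens = re.findall(r"[a-z']+", title.lower())
    return {
        "num_punctuation": sum(ch in string.punctuation for ch in title),
        "num_stop_words": sum(t in STOP_WORDS for t in tokens),
        "num_regional_words": sum(t in REGIONAL_WORDS for t in tokens),
        "num_lure_words": sum(t in LURE_WORDS for t in tokens),
        "num_pronouns": sum(t in PRONOUNS for t in tokens),
    }


features = text_statistical_features(
    "Shocking! You will never guess the secret of this town..."
)
```

Counts like these would form only part of the model input; the semantic features are extracted separately and are not covered by this sketch.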

[0028] In this embodiment, in order to accurat...

Embodiment 2

[0045] This embodiment provides a preferred implementation of Step 110 on the basis of Embodiment 1. The text statistical features in Embodiment 1 include at least one of: the number of punctuation marks, the number of stop words, the number of regional words, the number of lure words, the number of pronouns, or the number of lure segments. In this embodiment, only the number of lure segments is used as an example for illustration. Figure 2 is a flow chart of the clickbait headline identification method provided by Embodiment 2 of the present invention. As shown in Figure 2, the method includes:

[0046] Step 210: Segment the title according to the punctuation marks in the title to obtain at least one segmented phrase.

[0047] The title usually contains punctuation marks. In this embodiment, the title can be divided into at least one short sentence by using the punctuation marks in the title. Exemplarily, the titl...
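
Step 210 can be sketched as follows. The split on punctuation follows the text above; how a segment is judged to be a lure segment is not visible in this excerpt, so the pattern list `LURE_PATTERNS` and the matching rule are assumptions made purely for illustration.

```python
import re

# Hypothetical lure patterns; the source does not specify how lure segments are detected.
LURE_PATTERNS = [r"you won't believe", r"what happened next"]

# ASCII and full-width punctuation used as segment boundaries.
SPLIT_RE = re.compile(r"[,.!?;:，。！？；：、]+")


def split_title(title: str) -> list:
    """Split a title into short segments at punctuation marks."""
    return [part.strip() for part in SPLIT_RE.split(title) if part.strip()]


def count_lure_segments(title: str) -> int:
    """Count segments matching at least one (assumed) lure pattern."""
    return sum(
        any(re.search(pattern, segment.lower()) for pattern in LURE_PATTERNS)
        for segment in split_title(title)
    )


segments = split_title("He opened the door, and you won't believe what happened next!")
```

A segment that matches several patterns is still counted once, so the result is a count of segments rather than of pattern hits.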

Embodiment 3

[0071] Figure 3 is a structural schematic diagram of the clickbait headline identification device in Embodiment 3 of the present invention. As shown in Figure 3, the clickbait headline identification device includes:

[0072] The feature extraction module 310 is used to extract the text statistical features and semantic features of the title, wherein the text statistical features may preferably include at least one of: the number of punctuation marks, the number of stop words, the number of regional words, the number of lure words, the number of pronouns, or the number of lure segments.

[0073] The decision-making scoring module 320 is configured to use a pre-trained decision-making model, take text statistical features and semantic features as input to the decision-making model, and output the decision-making score of the title.

[0074] The score comparison module 330 is configured to compare the decision score with a first preset threshold, and determine whether...
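
The interplay of the scoring and comparison modules can be sketched as below. The source only says that a pre-trained decision model outputs a score that is compared with a first preset threshold; the logistic scoring function, the weights, the feature name `semantic_score`, the threshold value, and the "score at or above threshold means clickbait" rule are all assumptions standing in for details not shown in this excerpt.

```python
import math


def decision_score(features: dict, weights: dict, bias: float) -> float:
    """Stand-in for the pre-trained decision model: a logistic score in (0, 1)."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def is_clickbait(score: float, first_threshold: float = 0.5) -> bool:
    """Assumed comparison rule: flag the title when the score reaches the threshold."""
    return score >= first_threshold


# Hypothetical weights and features for illustration only.
weights = {"num_lure_segments": 1.0, "num_punctuation": 0.2, "semantic_score": 1.5}
features = {"num_lure_segments": 2, "num_punctuation": 3, "semantic_score": 0.8}

score = decision_score(features, weights, bias=-2.0)
flagged = is_clickbait(score)
```

Keeping the threshold as a separate parameter mirrors the split between modules 320 and 330: the model can be retrained or the threshold retuned independently.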

Abstract

The embodiments of the invention disclose a clickbait headline identification method and device, a server, and a storage medium. The method comprises: extracting text statistical features and semantic features of a headline; using a pre-trained decision model, with the extracted text statistical features and semantic features as its input, to output a decision score for the headline; and comparing the decision score with a first preset threshold and determining from the comparison result whether the headline is clickbait. Because the text statistical features and semantic features of the headline are extracted at multiple levels, at multiple granularities, and from multiple angles, and the decision model scores them before the final determination is made, the invention solves the problems of existing clickbait headline identification methods, namely frequent false positives, poor generalization ability, and low identification accuracy, and achieves high accuracy and a high recall rate.

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of the Internet, and in particular to a clickbait headline identification method and device, a server, and a storage medium.

Background

[0002] With the development of the Internet, many online news media (content producers, including professional media, self-media, etc.) have emerged on Internet platforms. The income of such news media is directly proportional to the number of reader clicks on their content. Therefore, to obtain high click counts, competitive advantage, influence, and high profits, such news media often manipulate the titles of the content they produce, writing titles that are completely inconsistent with the content in order to attract readers' attention. Such a title is a clickbait title, commonly known as a "title party".

[0003] The existing clickbait headline identification methods are mainly based ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30, G06N3/08
CPC: G06F16/35, G06F40/289, G06F40/30, G06N3/08
Inventor: 朱曼瑜, 董大祥, 李大任
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD