Text information identification method and identification device

A text information recognition method, applied in the field of information processing, which addresses problems such as character substitutions that change both meaning and pinyin, making it difficult for neural network models to accurately identify bad text information.

Pending Publication Date: 2019-06-07
BEIJING QIYI CENTURY SCI & TECH CO LTD
4 Cites, 19 Cited by

AI Technical Summary

Problems solved by technology

[0004] However, in the process of realizing the present invention, the inventors found at least the following problem in the prior art: existing machine learning models usually perform recognition based on the character features and pinyin features of Chinese characters, but bad text information often substitutes similar characters to evade recognition. For example, "WeChat" (Weixin) is replaced with "Huixin": the first character changes from Wei to Hui, so both its meaning and its pinyin change, making it difficult for current neural network models to accurately identify the bad text information.
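The point of [0004] can be made concrete with a small comparison. The sketch below is an illustration, not the patented method: it assumes the common five-class stroke coding (1 horizontal, 2 vertical, 3 left-falling, 4 dot or right-falling, 5 turning), uses a hypothetical two-character lookup table, and measures similarity as the Jaccard overlap of bigrams. The pair 大 (da) and 太 (tai) stands in for Wei/Hui because their stroke orders are short enough to list by hand.

```python
# Hypothetical toy tables for illustration only.
# Stroke codes (assumed five-class scheme): 1 = horizontal, 2 = vertical,
# 3 = left-falling, 4 = dot/right-falling, 5 = turning.
STROKES = {
    "大": [1, 3, 4],     # da: horizontal, left-falling, right-falling
    "太": [1, 3, 4, 4],  # tai: the strokes of 大 plus a final dot
}
PINYIN = {"大": "da", "太": "tai"}


def ngrams(seq, n=2):
    """Return the set of contiguous n-grams of a sequence, as tuples."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def stroke_similarity(c1, c2):
    """Bigram overlap of the two characters' stroke-order sequences."""
    return jaccard(ngrams(STROKES[c1]), ngrams(STROKES[c2]))


def pinyin_similarity(c1, c2):
    """Bigram overlap of the two characters' toneless pinyin spellings."""
    return jaccard(ngrams(PINYIN[c1]), ngrams(PINYIN[c2]))


print(round(stroke_similarity("大", "太"), 3))  # 0.667
print(round(pinyin_similarity("大", "太"), 3))  # 0.0
```

Under these assumptions the stroke bigrams still overlap heavily while the pinyin bigrams share nothing, which suggests why stroke-order features can survive a shape-based substitution that defeats pinyin features.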

Embodiment Construction

[0063] The technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings.

[0064] Current methods for identifying bad text information use not only machine learning models but also keyword matching. However, these methods still have the following problems: adding keywords usually requires a great deal of manual work, and keyword matching frequently flags legitimate words by mistake; traditional machine learning models generalize poorly; and, to evade keywords, illicit operators often alter the shape and sound of characters, so the maintenance cost of traditional recognition methods is very high. Therefore, how to reduce the feature-extraction effort and identify bad text information more accurately has become an urgent problem to be solved.
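The two keyword-matching weaknesses named in [0064], evasion through altered characters and false hits on legitimate words, appear even in a minimal substring matcher. The keyword list and the function `keyword_hits` below are hypothetical, for illustration only.

```python
# Hypothetical blocklist; real keyword lists are much larger and maintained by hand.
KEYWORDS = ["wechat", "sex"]


def keyword_hits(text, keywords=KEYWORDS):
    """Return the keywords that occur as case-insensitive substrings of text."""
    lowered = text.lower()
    return [kw for kw in keywords if kw in lowered]


# Evasion: a substituted spelling slips past the exact keyword.
print(keyword_hits("Add me on Huixin for cheap deals"))  # []
# False positive: an innocent word happens to contain a blocked substring.
print(keyword_hits("Greetings from Sussex"))  # ['sex']
```

Each new evasion spelling needs another manual entry, and each entry risks more false positives, which is the maintenance cost the paragraph describes.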

[0065] In view of this, the embodiment of the present invention provides a text information recognition me...

Abstract

The embodiments of the invention provide a text information identification method and identification device. The method comprises: obtaining a UGC text to be processed; preprocessing the UGC text to obtain a stroke-order feature vector and a pinyin feature vector of the UGC text; and inputting the stroke-order feature vector and the pinyin feature vector into a pre-trained text recognition model to obtain a target-type recognition result for the UGC text. With these embodiments, the target type of a UGC text can be identified accurately, so that bad UGC text is identified more reliably.
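A minimal sketch of the preprocessing step in the abstract, under loud assumptions: the stroke and pinyin tables are tiny hand-written excerpts, strokes use the common five-class coding, pinyin letters are mapped to 1..26 without tones, and vectors are truncated or zero-padded to a fixed length. The abstract does not specify these encodings, and `recognize` simply forwards both vectors to whatever pre-trained model is supplied; `dummy_model` is a hypothetical placeholder, not a real classifier.

```python
from typing import Callable, List, Sequence

# Hypothetical lookup tables (assumption). Stroke codes: 1 = horizontal,
# 2 = vertical, 3 = left-falling, 4 = dot/right-falling, 5 = turning.
STROKES = {"大": [1, 3, 4], "太": [1, 3, 4, 4], "口": [2, 5, 1], "十": [1, 2]}
PINYIN = {"大": "da", "太": "tai", "口": "kou", "十": "shi"}


def pad(seq: Sequence[int], length: int) -> List[int]:
    """Truncate or zero-pad seq to exactly `length` items."""
    items = list(seq)[:length]
    return items + [0] * (length - len(items))


def stroke_order_vector(text: str, length: int = 16) -> List[int]:
    """Concatenate per-character stroke codes; unknown characters are skipped."""
    codes = [code for ch in text for code in STROKES.get(ch, [])]
    return pad(codes, length)


def pinyin_vector(text: str, length: int = 16) -> List[int]:
    """Map toneless pinyin letters a..z to 1..26, concatenated across characters."""
    letters = "".join(PINYIN.get(ch, "") for ch in text)
    return pad([ord(c) - ord("a") + 1 for c in letters], length)


def recognize(text: str, model: Callable[[List[int], List[int]], str]) -> str:
    """Preprocess a UGC text and pass both feature vectors to a trained model."""
    return model(stroke_order_vector(text), pinyin_vector(text))


def dummy_model(strokes: List[int], pinyin: List[int]) -> str:
    """Hypothetical stand-in for the pre-trained text recognition model."""
    return "normal"


print(stroke_order_vector("大口", 8))  # [1, 3, 4, 2, 5, 1, 0, 0]
print(pinyin_vector("十", 4))  # [19, 8, 9, 0]
print(recognize("大口", dummy_model))  # normal
```

Fixed-length vectors are chosen here only because most neural classifiers expect a fixed input size; a real system might instead embed strokes and pinyin per character.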

Description

Technical field

[0001] The present invention relates to the field of information processing technology, and in particular to a text information recognition method and recognition device.

Background art

[0002] With the rapid growth in the number of users of video playback platforms, UGC (User Generated Content) text produced by users has also grown explosively, for example bullet comments (danmaku), comments, and live chat room content on video playback platforms. Along with this massive volume of UGC text comes bad text information such as advertisements, abuse, and pornography. Such text degrades the user experience and may cause users to suffer property loss. Therefore, the bad text information within large volumes of UGC text needs to be dealt with.

[0003] Current methods for identifying bad text information are generally based on machine learning models. Specifically, a machine learning model for ...

Application Information

IPC(8): G06F17/27; G06N3/04
CPC: Y02D10/00
Inventors: 唐颢诚, 都金涛
Owner BEIJING QIYI CENTURY SCI & TECH CO LTD