Text content sorting method based on mobile internet access

A classification technology for mobile Internet access, applied to text content classification in massive data processing, which addresses problems such as updating data cleaning knowledge and content classification knowledge.

Active Publication Date: 2014-07-02
北京中鼎易信科技有限公司
3 Cites, 17 Cited by

AI Technical Summary

Problems solved by technology

[0006] In view of the above problems, the object of the present invention is to provide a text content classification method, system and device based on mobile Internet access, built on an artificial intelligence expert system. It aims to solve three problems: how to clean the "garbage" out of mass access content (page URLs) at the scale of tens of billions of records, how to classify the "effective" content accurately and efficiently, and how to update the data cleaning knowledge and the content classification knowledge.

Examples

Embodiment Construction

[0180] The present invention classifies the text content of mobile Internet access over distributed massive data, based on the Hadoop architecture on a cloud computing platform. It is described in detail below in conjunction with the accompanying drawings:

[0181] In Figure 1, the text content classification process based on mobile Internet customer behavior is as follows:

[0182] Data source description: "Mobile Internet Access Records" 102 comes from the operator's daily DPI mobile Internet optical data. For a provincial-level telecom operator, the mobile Internet access records number from hundreds of millions to billions or even tens of billions. The space occupied by an access record depends on the number of fields it contains. Generally, about 5TB of hard disk space is required for 10 billion access records.
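The storage figure in [0182] implies an average record size, which can be checked with a short calculation. This is a back-of-envelope sketch only: it assumes "TB" means decimal terabytes (10**12 bytes), and the helper names `bytes_per_record` and `storage_tb` are illustrative, not taken from the patent.

```python
# Back-of-envelope check of the storage figure quoted in [0182].
# Assumption: "TB" is read as decimal terabytes (10**12 bytes).
RECORDS = 10_000_000_000      # 10 billion access records (from the text)
TOTAL_BYTES = 5 * 10**12      # about 5 TB of disk space (from the text)


def bytes_per_record(total_bytes: int, records: int) -> float:
    """Average size of one access record, in bytes."""
    return total_bytes / records


def storage_tb(records: int, per_record: float) -> float:
    """Disk space in decimal TB for a given record count and record size."""
    return records * per_record / 10**12


per_record = bytes_per_record(TOTAL_BYTES, RECORDS)
print(per_record)                      # 500.0 bytes per record
print(storage_tb(10**9, per_record))   # 0.5 TB for one billion records
```

Under that assumption, a record averages about 500 bytes, so a day with one billion access records would need roughly 0.5 TB. The real figure varies with the number of fields per record, as the paragraph notes.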

[0183] System architecture description: Every day, on the cloud computing platform, based on the H...

Abstract

The invention discloses a text content sorting method based on mobile Internet access and belongs to the field of massive big data processing and content sorting. The method adopts an artificial intelligence expert system approach and includes two steps. First, a URL (uniform resource locator) cleaning knowledge base is established, and a cleaning inference engine filters out the content that accessors did not actually browse, namely the "garbage". Second, the "effective" content accessed via the mobile Internet is sorted according to a URL sorting knowledge base, a keyword sorting knowledge base and the related inference engines. By updating the URL cleaning knowledge base, the URL content sorting knowledge base and the keyword content sorting knowledge base, the system becomes smarter and content sorting efficiency improves; more importantly, the coverage and accuracy of content sorting improve.
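The two-step flow in the abstract (clean first, then sort by URL knowledge and keyword knowledge) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the knowledge base contents, the suffix-based "garbage" rule, the host-based URL lookup, and the names `URL_CLEANING_KB`, `URL_SORTING_KB`, `KEYWORD_SORTING_KB`, `is_garbage` and `classify` are all hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical knowledge bases; every entry is illustrative only.
# Assumption: "garbage" is approximated by resource suffixes that a user
# does not browse directly (scripts, stylesheets, images).
URL_CLEANING_KB = (".js", ".css", ".png", ".jpg", ".gif")
URL_SORTING_KB = {"news.example.com": "news", "shop.example.com": "shopping"}
KEYWORD_SORTING_KB = {"football": "sports", "recipe": "food"}


def is_garbage(url: str) -> bool:
    """Cleaning step: flag URLs whose path matches a cleaning rule."""
    path = urlparse(url).path.lower()
    return path.endswith(URL_CLEANING_KB)


def classify(url: str, text: str = "") -> Optional[str]:
    """Return a category, or None when the URL is filtered as garbage."""
    if is_garbage(url):
        return None
    # Sorting step 1: look the host up in the URL sorting knowledge base.
    host = urlparse(url).netloc.lower()
    if host in URL_SORTING_KB:
        return URL_SORTING_KB[host]
    # Sorting step 2: fall back to keyword matching on the page text.
    lowered = text.lower()
    for keyword, category in KEYWORD_SORTING_KB.items():
        if keyword in lowered:
            return category
    return "unclassified"


print(classify("http://news.example.com/static/app.js"))
print(classify("http://news.example.com/a.html"))
print(classify("http://blog.example.org/post/1", "A simple recipe for noodles"))
```

In this sketch, updating a knowledge base amounts to adding entries to one of the three structures, which is how coverage would grow over time. A production system at the scale described would run such rules as distributed jobs rather than in a single process.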

Description

Technical field

[0001] The invention belongs to the field of massive big data processing and content classification, and in particular relates to a text content classification method based on mobile Internet access and the processing of massive data on the order of tens of billions of visits.

Technical background

[0002] At present, as a provincial-level telecom operator is in the process of business transformation from "traffic management" to "traffic management", the number of page URLs accessed by its users on the mobile Internet every day ranges from hundreds of millions to billions, or even tens of billions, involving hundreds of thousands of websites, and the text content is ever-changing. Therefore, how to accurately and efficiently classify the text content accessed by users, so as to analyze users' access behavior and accurately characterize customers' access interests, is the core issue of intelligent marketing that the three operators urgently need to sol...

Application Information

IPC(8): G06F17/30
CPC: G06F16/35, G06F16/955
Inventor: 孙宏, 赵晓波, 季海东, 董童霖, 赵宇龙
Owner: 北京中鼎易信科技有限公司