Internet behavior markup engine and behavior markup method corresponding to same

An Internet behavior markup technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the problems that existing text classification cannot handle video, audio and other non-text data, cannot reflect differences in Internet behavior, and has low classification accuracy.

Active Publication Date: 2013-06-05
北京宽连十方数字技术有限公司

AI Technical Summary

Problems solved by technology

[0006] 2. Difficulties in adding new classifications: each classification requires a large number of training set files for training, and obtaining this type of thesaurus
[0007] 3. The classification system is difficult to change: if the classification system is adjusted, the lexicon needs to be retrained.
[0008] 4. The accuracy of the classification results depends on the length of the text to be classified. When the text is shorter than a threshold (for example, shorter than 100 characters), classification accuracy drops sharply.
[0009] 5. It can only handle text classification on the Internet and cannot handle video, audio and other data.
[0010] 6. Across the various algorithms used in text classification technology, classification accuracy based on text similarity is generally below 90%.
[0011] 7. A text classification system cannot fully describe behavior; a static text classification system is only one part of a user behavior markup language.
Labeling methods based on text content cannot reflect differences in the online behavior of different users toward the same content.
[0012] 8. A text classification system struggles to meet the descriptive needs of mobile Internet user behavior analysis, which takes the individual user as its core.
Current text classification systems are mostly used in website analysis, where the text classification serves as the independent (descriptive) variable and the frequency of user group behavior as the dependent variable. User behavior analysis, especially for the mobile Internet, often needs to take the individual user as the independent variable; when the text classification becomes the dependent variable, it is difficult for it to provide accurate, multi-dimensional combined descriptions.
In fact, whether for website analysis or user analysis, the pairs {user, text}, {website, user} and {website, text} have proven insufficient to meet analysis needs in practice.



Examples


Example 1

[0100] Data preparation: URL strings accessed by users

[0101] http://read.10086.cn/booksort?nodeId=6893664

[0102] Process:

[0103] Step 1: Look up the site's matching rules by top-level domain name (10086.cn).

[0104] Step 2: Match against the domain name prefix (read) and obtain the rule object that matches the URL: {"id":2627,"topDomain":"10086.cn","ars":{"id":28},"domainReg":{"^read$"},"genre":"ts_","matchingType":1,"prod":{"id":1376,"prodMod":"546","prodType":"702"},"resIdType":0,"resIdVal":"","resIdVarReg":null,"resIdVarSet":null,"resIdVarValReg":null,"userAct":"104003","validCode":"901",…}

[0105] Step 3: The URL does not refer to a content object, so the process ends. The result data is: {"ars":{"id":28,"name":"China Mobile Communications Co., Ltd."},"prod":{"id":1376,"prodMod":"546","prodName":"Mobile Reading","prodType":"702"},"resObj":null,"topDomain":"10086.cn","userAct":"104003","validityCode":"901"}
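Steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the in-memory `RULES` dictionary, the helper names `split_host` and `find_rule`, and the choice to store `domainReg` as a plain regular-expression string are all assumptions made for the example. Only the field names and values are taken from the rule object above.

```python
import re
from urllib.parse import urlsplit

# Hypothetical in-memory rule base, keyed by top-level domain (Step 1).
# Field names follow the rule object quoted above; the storage layout is assumed.
RULES = {
    "10086.cn": [
        {
            "id": 2627,
            "domainReg": r"^read$",  # assumed to be a regex over the domain prefix
            "resIdType": 0,
            "prod": {"id": 1376, "prodMod": "546", "prodType": "702"},
            "userAct": "104003",
        },
    ],
}


def split_host(host, known_domains):
    """Split a host name into (prefix, top_domain) using the known rule-base domains."""
    for domain in known_domains:
        if host == domain:
            return "", domain
        if host.endswith("." + domain):
            return host[: -len(domain) - 1], domain
    return None, None


def find_rule(url, rules=RULES):
    """Return the first rule whose domainReg matches the URL's domain prefix, or None."""
    host = urlsplit(url).hostname or ""
    prefix, top_domain = split_host(host, rules)
    if top_domain is None:
        return None  # Step 1 found no rule set for this site
    for rule in rules[top_domain]:
        if re.search(rule["domainReg"], prefix):  # Step 2: prefix match
            return rule
    return None
```

Calling `find_rule("http://read.10086.cn/booksort?nodeId=6893664")` returns the rule with `id` 2627 under these assumptions, while a host with no entry in the rule base returns `None`.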

Example 2

[0107] Data preparation: URL strings accessed by users

[0108] http://read.10086.cn/www/readView?bid=377517448&cid=377517451

[0109] Process:

[0110] Step 1: Look up the site's matching rules, that is, the second-level collection, by top-level domain name (10086.cn).

[0111] Step 2: Match against the domain name prefix (read) and obtain the rule object that matches the URL: {"id":2627,"topDomain":"10086.cn","ars":{"id":28},"domainReg":{"^read$"},"genre":"ts_","matchingType":1,"prod":{"id":1376,"prodMod":"546","prodType":"702"},"resIdType":11,"resIdVal":"","resIdVarReg":null,"resIdVarSet":["bid"],"resIdVarValReg":null,"userAct":"104003","validCode":"901",...}

[0112] Step 3: The URL refers to a content object; read the content object information: bid=377517448

[0113] Step 4: Classification results: {"ars":{"id":28,"name":"China Mobile Communications Co., Ltd."},"kwTaskIds":[],"prod":{"id":1376,"topDomain":"10086.cn","prodMod":"546",...
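Step 3 of this example, reading the content object identifier from the URL, might look like the sketch below. It rests on two assumptions that the excerpt does not state outright: that `resIdType` 0 marks a non-content rule (as in Example 1) while a non-zero value marks a content rule, and that `resIdVarSet` lists the query parameter names that carry the resource ID. The function name `extract_resource_id` is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def extract_resource_id(url, rule):
    """Return (parameter_name, value) for the first resIdVarSet entry found in the URL.

    Assumption: resIdType == 0 means the rule describes a non-content object,
    so there is nothing to extract and None is returned.
    """
    if rule.get("resIdType", 0) == 0:
        return None
    query = parse_qs(urlsplit(url).query)
    for name in rule.get("resIdVarSet") or []:
        values = query.get(name)
        if values:
            return name, values[0]
    return None


# Rule fields copied from Example 2; all other fields are omitted for brevity.
content_rule = {"id": 2627, "resIdType": 11, "resIdVarSet": ["bid"]}
url = "http://read.10086.cn/www/readView?bid=377517448&cid=377517451"
resource = extract_resource_id(url, content_rule)
```

Here `resource` holds the pair `("bid", "377517448")`. The `cid` parameter is ignored because it is not listed in `resIdVarSet`.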


Abstract

The invention discloses an Internet behavior markup engine and a corresponding behavior markup method, and belongs to the technical field of user Internet behavior data collection and analysis. The markup engine comprises a classification system module, a word segmentation base module, a semantic analysis module, a crawling program module, a rule base module, a knowledge base module, a rule parser module and a self-learning program module. The Internet behavior markup method provides the basic logical structure: user behavior = behavior agent + behavior identification + behavior state. The engine and the method improve classification efficiency and accuracy, refine the descriptive granularity of Internet user behavior data, recognize the action, object and environmental conditions of a single user behavior as a whole, and thereby reconstruct Internet user behavior in full. User behavior data output according to IUBML (Internet Universal Behavior Markup Language) rules can directly support accurate advertising services based on user behavior and an understanding of user needs, meeting the marketing requirements of corporate clients.

Description

Technical field

[0001] The invention relates to a collection and analysis technology for user Internet behavior data, and specifically discloses an Internet behavior tagging engine and a behavior tagging method corresponding to the engine.

Background technique

[0002] For a long time, the biggest problem that has plagued enterprises is "how to understand their customers better". On the Internet, any behavior has precursors. To buy products, you must first browse, compare, and inquire; to engage in activities, you must first collect, discuss, and plan. Through the collection and analysis of user Internet behavior data, enterprises gain the ability to predict the future behavior of customers in the physical world.

[0003] Internet user behavior big data mining must have the ability to manage different data types and data structures. Variety is one of the basic characteristics of big data. Big data is usually a mixture of structured, semi-structured and unstructured data...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 唐波, 李骄阳, 张祺, 薛忠军, 高福强, 褚秀良, 庞岩
Owner: 北京宽连十方数字技术有限公司