A method for extracting minority theme data in a new media environment

A technology for ethnic minority data in new media, applied in unstructured text data retrieval, text database browsing/visualization, semantic tool creation, etc. It addresses efficiency bottlenecks caused by rare word origins, highly specialized vocabulary, and similar problems.

Active Publication Date: 2019-01-18
YUNNAN UNIV
11 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0009] To overcome the efficiency bottlenecks caused by rare word origins, highly specialized terminology, and heterogeneous vocabulary in the ethnic minority domain, the present invention provides a method for obtaining data from new media platforms and extracting ethnic minority theme data based on the LDA model and a knowledge graph (KG).

Examples

Embodiment

[0050] Embodiment: An example of extracting Tibetan data from "Sina Weibo".

[0051] Step 1: Preprocessing

[0052] First, microblog data is obtained from the "Sina Weibo" platform; a single microblog record is shown in Table 1.

[0053] Table 1 Weibo data example

[0054]

[0055] For convenience of description, the additional information items A_i are hidden in the following description of data extraction, so the obtained Sina Weibo data contains 5 Weibo records a_1 to a_5, as shown in Table 2.

[0056] Table 2 Sina Weibo Data

[0057]

[0058] Then, word segmentation is performed on the text part T_i of each Weibo record. A word segmentation tool that supports custom dictionaries and stop words is selected, Tibetan domain knowledge Z ={, , , , , } is introduced, vocabulary from the Tibetan domain is added to the tool's dictionary, and the segmentation results are recorded as Seg_T_i, as shown in Table 3.
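The segmentation step above can be illustrated with a minimal sketch. The source does not name the segmentation tool, so this example assumes a simple dictionary-based forward maximum matching segmenter written in plain Python. The function `segment`, the sample sentence, and every dictionary entry (including the stand-in contents of Z, which are elided in the source) are hypothetical and chosen only for illustration.

```python
from typing import List, Set


def segment(text: str, dictionary: Set[str], stop_words: Set[str],
            max_len: int = 6) -> List[str]:
    """Forward maximum matching: at each position take the longest
    dictionary word; fall back to a single character if none matches."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        match = text[i]  # single-character fallback
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                match = candidate
                break
        i += len(match)
        # Drop stop words and tokens made only of punctuation or whitespace.
        if match not in stop_words and any(ch.isalnum() for ch in match):
            tokens.append(match)
    return tokens


# Hypothetical values: the real contents of Z are not given in the source.
base_dictionary = {"今天", "我们", "参观"}   # today, we, visit
domain_terms_z = {"布达拉宫", "酥油茶"}      # Potala Palace, butter tea (assumed)
stop_words = {"的", "了", "和"}

# Add the domain vocabulary to the segmenter's dictionary.
dictionary = base_dictionary | domain_terms_z

# T_i: "Today we visited the Potala Palace"
seg_t = segment("今天我们参观了布达拉宫", dictionary, stop_words)
print(seg_t)
```

Without the domain terms in the dictionary, the place name would be split into single characters, which is the kind of loss the custom dictionary is meant to prevent.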

[0059] Table 3 Segmentation res...

Abstract

A method for obtaining data from a new media platform and extracting thematic data of ethnic minorities is disclosed. Given the massive, unstructured, and multi-topic nature of new media data, an LDA model is used to extract features, analyze topics, and mine hidden themes from preprocessed new media data. A KG is then constructed from domain knowledge of ethnic minorities, and the domain KG is used to guide the extraction of ethnic minority thematic data. In both the LDA modeling and the KG-guided extraction, the invention sets parameters according to the data scale, thereby optimizing the algorithm and achieving accurate, efficient, and scalable new media data extraction.
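One way to picture the KG-guided step is the sketch below. It is not the patented algorithm, whose scoring rules are not given in this excerpt. It assumes the KG is a plain adjacency dictionary, that LDA has already produced per-topic word probabilities and per-document topic distributions (hard-coded here as hypothetical stand-ins), and that a topic counts as minority-related when enough of its probability mass falls on KG terms. The names `expand_kg`, `topic_relevance`, `extract_documents`, the thresholds, and `max_hops` are all illustrative assumptions.

```python
from collections import deque
from typing import Dict, List, Set


def expand_kg(kg: Dict[str, List[str]], seed: str, max_hops: int) -> Set[str]:
    """Breadth-first expansion: entities within max_hops edges of the seed."""
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbor in kg.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen


def topic_relevance(topic_words: Dict[str, float], kg_terms: Set[str]) -> float:
    """Probability mass of one topic that falls on KG terms."""
    return sum(p for word, p in topic_words.items() if word in kg_terms)


def extract_documents(doc_topics: Dict[str, List[float]],
                      topics: List[Dict[str, float]],
                      kg_terms: Set[str],
                      topic_threshold: float,
                      doc_threshold: float) -> List[str]:
    """Keep documents whose mass on KG-relevant topics reaches doc_threshold."""
    relevant = [k for k, words in enumerate(topics)
                if topic_relevance(words, kg_terms) >= topic_threshold]
    return [doc_id for doc_id, dist in doc_topics.items()
            if sum(dist[k] for k in relevant) >= doc_threshold]


# Hypothetical domain KG (adjacency lists), not taken from the source.
kg = {
    "Tibetan": ["Potala Palace", "butter tea", "thangka"],
    "Potala Palace": ["Lhasa"],
    "Lhasa": ["Tibet"],
}
# Hypothetical stand-ins for LDA output: top words per topic, topics per document.
topics = [
    {"Potala Palace": 0.30, "Lhasa": 0.25, "travel": 0.20},
    {"phone": 0.40, "price": 0.30, "travel": 0.10},
]
doc_topics = {"a1": [0.9, 0.1], "a2": [0.2, 0.8], "a3": [0.6, 0.4]}

kg_terms = expand_kg(kg, "Tibetan", max_hops=2)
selected = extract_documents(doc_topics, topics, kg_terms,
                             topic_threshold=0.5, doc_threshold=0.5)
print(selected)
```

The hop limit and the two thresholds are the kind of parameters that could be tuned to the data scale, although the specific settings the invention uses are not stated here.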

Description

Technical field

[0001] The invention discloses a method for acquiring data from a new media platform and extracting minority theme data. It involves latent topic analysis and feature extraction for new media data based on Latent Dirichlet Allocation (LDA), and the use of a domain knowledge graph (KG) to extract minority topic data. It belongs to the field of data processing and knowledge discovery.

Background technique

[0002] New media is a form of media defined relative to traditional media such as newspapers, radio, and television; it includes network media, mobile media, and digital TV. Its features include interactivity and immediacy, massive scale and shareability, multimedia and hypertext, and personalization and socialization. As new media plays an increasingly important role in information dissemination, the processing and analysis of network media data has attracted great attention from scholars at home and abroad. The data is divided according to the difference...

Application Information

IPC(8): G06F16/34, G06F16/36
Inventor: 岳昆, 麻友, 李维华, 王笑一, 郭建斌
Owner: YUNNAN UNIV