Method for extracting conceptual words from video subtitles

A technology relating to concept words and subtitles, applied in the fields of electronic digital data processing, digital data information retrieval, and special data processing applications. It addresses three problems: existing methods struggle to reach satisfactory performance, concept word extraction from subtitles is difficult, and keywords cannot simply be equated with concept words. It achieves high prediction accuracy and good performance while reducing the manual labeling workload.

Active Publication Date: 2019-08-27
SHANDONG UNIV OF SCI & TECH
6 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

[0003] At present, there are many methods for extracting concept words from text, including supervised and unsupervised methods based on machine learning algorithms such as support vector machines and neural networks. Supervised methods require a large manually labeled corpus. Unsupervised methods need no manual annotation, but they rarely achieve satisfactory performance. Moreover, these methods target keyword extraction in general text mining scenarios, and applying them directly to course video subtitle text usually gives unsatisfactory results. This is because video subtitles differ from general text mining sources such as academic papers and news texts, and because keywords in the usual sense cannot be fully equated with concept words.
These factors make concept word extraction from video subtitles difficult, so the existing keyword extraction methods need to be improved.




Embodiment Construction

[0043] Explanation of terms: concept words

[0044] Concept words are words or phrases that express knowledge points in course learning.

[0045] Formally, a concept word c can be represented as a k-gram in the course corpus that satisfies the following two properties: a) c is a semantically and syntactically correct phrase; b) c represents a piece of scientific or technical knowledge.
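The k-gram view in [0045] can be illustrated with a minimal sketch that lists every contiguous k-gram of a token sequence as a candidate. The function name `kgram_candidates` and the cap `max_k` are hypothetical and do not come from the patent. The sketch only enumerates candidates; it does not check properties a) or b).

```python
from typing import List, Tuple


def kgram_candidates(tokens: List[str], max_k: int = 3) -> List[Tuple[str, ...]]:
    """Return every contiguous k-gram with 1 <= k <= max_k, shortest first."""
    candidates = []
    for k in range(1, max_k + 1):
        # A window of length k can start at positions 0 .. len(tokens) - k.
        for start in range(len(tokens) - k + 1):
            candidates.append(tuple(tokens[start:start + k]))
    return candidates


# Illustrative input only; max_k = 3 is an assumed upper bound on phrase length.
tokens = ["support", "vector", "machine"]
candidates = kgram_candidates(tokens, max_k=3)
```

Under this view, a phrase such as "support vector machine" is one of the 3-gram candidates, and the later steps decide which candidates are concept words.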

[0046] The present invention is described in further detail below with reference to the accompanying drawing and a specific embodiment:

[0047] As shown in Figure 1, a method for extracting concept words from video subtitles includes the following steps:

[0048] s1. Word segmentation is performed on the subtitle text, and punctuation marks are deleted.

[0049] In this embodiment, the open-source NLTK word segmentation package is selected to perform word segmentation processing on the subtitle text and delete punctuation marks.
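Step s1 might look roughly like the sketch below. The patent names NLTK but not a specific tokenizer, so the use of `wordpunct_tokenize` is an assumption, chosen because it is regex-based and needs no downloaded model data. The helper name `segment_subtitle` and the sample sentence are hypothetical. The fallback mirrors the same pattern and is used only if NLTK is not installed.

```python
import re
from typing import List

try:
    # Assumed tokenizer choice: regex-based, works without extra NLTK data.
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    # Fallback using the same word/punctuation pattern when NLTK is absent.
    def wordpunct_tokenize(text: str) -> List[str]:
        return re.findall(r"\w+|[^\w\s]+", text)


def segment_subtitle(text: str) -> List[str]:
    """Split subtitle text into tokens and drop punctuation-only tokens."""
    tokens = wordpunct_tokenize(text)
    # Keep a token only if it contains at least one letter or digit.
    return [tok for tok in tokens if any(ch.isalnum() for ch in tok)]


# Illustrative subtitle line, not taken from the patent.
subtitle = "Today, we introduce the support vector machine."
words = segment_subtitle(subtitle)
```

Filtering after tokenization keeps the two operations of s1 (segmentation, then punctuation deletion) separate, which makes it easy to swap in a different NLTK tokenizer.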

[0050] s2. Process the ...


Abstract

The invention discloses a method for extracting concept words from video subtitles, comprising the following steps: performing word segmentation on the subtitle text and deleting punctuation marks; performing stop-word processing and part-of-speech tagging on the segmented subtitle text; calculating co-occurrence features of the target word and its adjacent words; calculating the semantic similarity between the target word and its adjacent words; labeling concept words in a small number of segmented subtitle texts to serve as a training set; and training a pre-established semi-supervised learning framework based on a conditional random field on the training set to obtain a concept word prediction model, which outputs the concept word prediction result for the subtitle text. The method reduces the workload of manually labeling corpora, improves the accuracy of concept word extraction from MOOC video subtitles, and meets practical requirements.
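The two feature steps in the abstract (co-occurrence and semantic similarity between a target word and its neighbor) can be illustrated with a minimal sketch. The abstract does not say which measures are used, so pointwise mutual information (PMI) for co-occurrence and cosine similarity over word vectors are assumptions made here for illustration. The function names, the toy token list, and the toy vectors are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def adjacent_pmi(tokens: List[str]) -> Dict[Tuple[str, str], float]:
    """PMI of each adjacent (target, next word) pair, estimated from one token list."""
    unigram = Counter(tokens)
    bigram = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigram.items():
        p_pair = count / n_bi
        p_w1 = unigram[w1] / n_uni
        p_w2 = unigram[w2] / n_uni
        scores[(w1, w2)] = math.log(p_pair / (p_w1 * p_w2))
    return scores


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy data for illustration only.
tokens = ["neural", "network", "and", "neural", "network", "training"]
pmi = adjacent_pmi(tokens)

# Hypothetical 2-dimensional word vectors; real ones would come from an embedding model.
vectors = {"neural": [1.0, 0.0], "network": [1.0, 0.0], "and": [0.0, 1.0]}
sim_same = cosine_similarity(vectors["neural"], vectors["network"])
sim_diff = cosine_similarity(vectors["neural"], vectors["and"])
```

Scores like these could then be attached to each token as features for the conditional random field, alongside the part-of-speech tags from the earlier step.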

Description

Technical field

[0001] The invention relates to a method for extracting concept words, in particular to a method for extracting concept words from video subtitles.

Background technique

[0002] Massive Open Online Courses (MOOCs) have promoted knowledge sharing around the world through their high-quality course resources, creating many opportunities for teaching and learning in different disciplines. One of the basic steps in mining and analyzing MOOC platform data is to extract the concept words in video subtitles.

[0003] At present, although there are many methods to extract concept words from text, such as supervised and unsupervised methods based on various machine learning algorithms such as support vector machines and neural networks, supervised methods require a large amount of manually labeled corpus, while unsupervised methods need no manual annotation of the corpus, but it is difficult to achieve s...


Application Information

IPC(8): G06F16/483, G06F17/27
CPC: G06F16/483, G06F40/284, G06F40/30
Inventors: 赵中英, 杨永浩, 周慧, 李超
Owner: SHANDONG UNIV OF SCI & TECH