Method, device, equipment and storage medium for determining course video

By segmenting the initial course videos into knowledge point granularities and splicing them together with knowledge point video sets matched to target students, the problem of low matching accuracy in longer course videos is solved, which improves students' learning interest and learning outcomes and enhances the user experience of the education platform.

CN117253386B · Active · Publication Date: 2026-06-23 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date: 2022-10-20
Publication Date: 2026-06-23

IPC: G09B5/06

AI Tagging

Application Domain

Electrical appliances

Technology Topics

Engineering Knowledge management

AI Technical Summary

Technical Problem

In existing technologies, some of the longer course videos recorded by teachers have a low degree of relevance to students, which reduces students' learning interest and effectiveness, and consequently lowers the interaction rate with the education platform.

Method used

The initial course videos are segmented at knowledge point granularity to obtain candidate knowledge point videos. Based on the learning needs and attribute characteristics of the target student, a matching set of target knowledge point videos is determined and spliced into candidate course videos, from which a target course video that meets the selection condition is selected.

Benefits of technology

This improves the matching degree between course videos and students, enhances students' learning interest and effectiveness, and increases the interaction rate between students and the education platform.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses a method, device and equipment for determining a course video and a storage medium, and belongs to the technical field of computers. The method comprises the following steps: dividing an initial course video recorded by a teacher according to knowledge point granularity to obtain a candidate knowledge point video; acquiring a knowledge point learning requirement and attribute characteristics of a target student; determining at least one target knowledge point matched with the target student based on the knowledge point learning requirement of the target student; determining a target knowledge point video set matched with the target student from the candidate knowledge point video based on the at least one target knowledge point and the attribute characteristics of the target student; splicing the target knowledge point videos in the target knowledge point video set to obtain at least one candidate course video; and selecting a target course video meeting a selection condition from the at least one candidate course video. The target course video determined in this way is beneficial to improving the learning interest and learning effect of students and improving the interaction rate of students and an education platform.

Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a method, apparatus, device, and storage medium for determining course videos.

Background Technology

[0002] With the development of computer technology, more and more educational platforms are offering online course videos, allowing students to expand their knowledge by learning from these videos. These platforms can provide students with a selection of course videos to choose from.

[0003] In related technologies, one or more course videos are directly selected from multiple course videos recorded by teachers as the course videos for students to learn. Teacher-recorded course videos are typically long videos, often lasting an hour. These long videos may contain parts that are not highly relevant to students. Directly using teacher-recorded course videos as the course videos for students can easily reduce students' learning interest and effectiveness, thereby decreasing the interaction rate between students and the educational platform.

Summary of the Invention

[0004] This application provides a method, apparatus, device, and storage medium for determining course videos, which can be used to improve students' learning interest and learning outcomes. The technical solution is as follows:

[0005] On one hand, embodiments of this application provide a method for determining course videos, the method comprising:

[0006] Obtain the initial course video recorded by the teacher, and segment the initial course video according to the knowledge point granularity to obtain candidate knowledge point videos;

[0007] The knowledge point learning needs and attribute characteristics of the target student are obtained. The knowledge point learning needs are used to indicate the knowledge points that the target student expects to learn, and the attribute characteristics are used to identify the target student.

[0008] Based on the learning needs of the target students, at least one target knowledge point is identified that matches the target students; based on the at least one target knowledge point and the attribute characteristics of the target students, a set of target knowledge point videos matching the target students is identified from the candidate knowledge point videos, and the set of target knowledge point videos includes at least one target knowledge point video corresponding to each target knowledge point;

[0009] The target knowledge point videos in the target knowledge point video set are spliced together to obtain at least one candidate course video. Each candidate course video is obtained by splicing together one target knowledge point video corresponding to each target knowledge point.

[0010] Select a target course video that meets the selection criteria from the at least one candidate course video, wherein the target course video is the course video that the target student is to learn.
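The splicing step in [0009] can be read as taking one target knowledge point video per target knowledge point. The following Python sketch illustrates that reading under stated assumptions: the video identifiers and the function name `candidate_course_videos` are hypothetical, and enumerating every combination is only one possible strategy that the text does not mandate.

```python
from itertools import product


def candidate_course_videos(videos_by_point, point_order):
    """Enumerate candidate course videos.

    videos_by_point maps each target knowledge point to its target
    knowledge point videos (hypothetical string IDs here). Each candidate
    takes exactly one video per knowledge point, in point_order.
    """
    pools = [videos_by_point[point] for point in point_order]
    # Cartesian product: one choice from every pool per candidate.
    return [list(combo) for combo in product(*pools)]


# Hypothetical target knowledge point video set.
video_set = {
    "operators": ["op_a", "op_b"],
    "classes": ["cls_a"],
}
candidates = candidate_course_videos(video_set, ["operators", "classes"])
```

With two videos for one knowledge point and one for the other, the sketch yields two candidate course videos. In practice the number of combinations grows multiplicatively, so a real system would likely prune before or during enumeration.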

[0011] On the other hand, an apparatus for determining course videos is provided, the apparatus comprising:

[0012] The first acquisition module is used to acquire the initial course video recorded by the teacher, and to segment the initial course video according to the knowledge point granularity to obtain candidate knowledge point videos;

[0013] The second acquisition module is used to acquire the knowledge point learning needs and attribute characteristics of the target student. The knowledge point learning needs are used to indicate the knowledge points that the target student expects to learn, and the attribute characteristics are used to identify the target student.

[0014] The determination module is used to determine at least one target knowledge point that matches the target student based on the target student's knowledge point learning needs; and to determine a set of target knowledge point videos that matches the target student from the candidate knowledge point videos based on the at least one target knowledge point and the attribute characteristics of the target student, wherein the set of target knowledge point videos includes at least one target knowledge point video corresponding to each target knowledge point.

[0015] The splicing module is used to splice the target knowledge point videos in the target knowledge point video set to obtain at least one candidate course video. Each candidate course video is obtained by splicing one target knowledge point video corresponding to each target knowledge point.

[0016] The selection module is used to select a target course video that meets the selection criteria from the at least one candidate course video, wherein the target course video is the course video to be studied by the target student.

[0017] In one possible implementation, the determining module is configured to: determine at least one first knowledge point video corresponding to any target knowledge point from the candidate knowledge point videos; determine at least one second knowledge point video that meets the filtering conditions from the at least one first knowledge point video corresponding to any target knowledge point; determine the matching degree between the at least one second knowledge point video and the target student based on the attribute characteristics of the target student and the video characteristics of the at least one second knowledge point video; designate the at least one second knowledge point video whose matching degree meets the matching conditions as at least one target knowledge point video corresponding to any target knowledge point; and determine the target knowledge point video set based on the at least one target knowledge point video corresponding to each target knowledge point.
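The two-stage selection in [0017] (filter first, then keep videos whose matching degree meets the matching condition) might be sketched as follows. This is a minimal illustration, not the patented implementation: the field names `quality` and `difficulty`, the thresholds, and the toy matching function are all assumptions standing in for the filtering conditions and the learned matching model.

```python
def select_target_videos(first_videos, passes_filter, matching_degree, threshold):
    """Filter first knowledge point videos, then keep those that match the student."""
    # Stage 1: second knowledge point videos are those that pass the filter.
    second_videos = [v for v in first_videos if passes_filter(v)]
    # Stage 2: target videos are those whose matching degree meets the threshold.
    return [v for v in second_videos if matching_degree(v) >= threshold]


# Hypothetical student attributes and video features.
student = {"level": 1}
first_videos = [
    {"id": "v1", "quality": 0.9, "difficulty": 1},
    {"id": "v2", "quality": 0.4, "difficulty": 1},
    {"id": "v3", "quality": 0.8, "difficulty": 4},
]


def toy_matching_degree(video):
    # Assumption: closer difficulty to the student's level means a better match.
    return 1.0 / (1 + abs(video["difficulty"] - student["level"]))


targets = select_target_videos(
    first_videos,
    passes_filter=lambda v: v["quality"] >= 0.5,
    matching_degree=toy_matching_degree,
    threshold=0.5,
)
```

Here `v2` is dropped by the filter and `v3` by the matching threshold, leaving `v1` as the only target knowledge point video.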

[0018] In one possible implementation, the selection module is configured to acquire at least one of a first evaluation indicator, a second evaluation indicator, and a third evaluation indicator for any candidate course video; determine the evaluation result of any candidate course video based on at least one of the first evaluation indicator, the second evaluation indicator, and the third evaluation indicator; and select candidate course videos whose evaluation results satisfy the first evaluation condition as the target course videos that satisfy the selection condition; wherein, the first evaluation indicator is obtained based on the text corresponding to each target knowledge point video in any candidate course video, and the first evaluation indicator is used to evaluate the rationality of the content summary of any candidate course video; the second evaluation indicator is obtained based on each video frame in any candidate course video, and the second evaluation indicator is used to evaluate the viewing smoothness of any candidate course video; the third evaluation indicator is obtained based on the text corresponding to each target knowledge point video in any candidate course video and each video frame in any candidate course video, and the third evaluation indicator is used to evaluate the overall quality of any candidate course video.
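The text in [0018] does not say how the evaluation indicators are combined into an evaluation result. One simple possibility, shown here purely as an assumption, is a weighted average over whichever indicators are available, followed by a threshold as the first evaluation condition. The indicator names, weights, and threshold below are hypothetical.

```python
def evaluation_result(indicators, weights):
    """Weighted average over the indicators that are present (assumed scheme)."""
    used = [name for name in indicators if name in weights]
    total_weight = sum(weights[name] for name in used)
    if total_weight == 0:
        raise ValueError("no usable evaluation indicator")
    return sum(indicators[name] * weights[name] for name in used) / total_weight


# Hypothetical weights for the first, second, and third evaluation indicators.
weights = {"summary": 0.4, "smoothness": 0.3, "overall": 0.3}

# Hypothetical candidate course videos; "a" lacks the third indicator.
candidates = {
    "a": {"summary": 0.8, "smoothness": 0.6},
    "b": {"summary": 0.5, "smoothness": 0.5, "overall": 0.5},
}

scores = {name: evaluation_result(ind, weights) for name, ind in candidates.items()}
selected = [name for name, score in scores.items() if score >= 0.6]
```

Normalizing by the total weight of the indicators actually used lets the same function handle the "at least one of" wording, where some indicators may be missing.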

[0019] In one possible implementation, the determining module is used to acquire an educational knowledge graph, which includes nodes corresponding to knowledge points and edges between nodes corresponding to knowledge points; and to determine at least one target knowledge point that matches the target student based on the target student's knowledge point learning needs and the educational knowledge graph.

[0020] In one possible implementation, the determining module is configured to: obtain an initial knowledge graph; remove nodes and edges in the initial knowledge graph whose relevance to the educational scenario does not meet the relevance requirements, to obtain a first knowledge graph; based on at least one of the books corresponding to the educational scenario and the multimedia resources corresponding to the educational scenario, add nodes and edges in the first knowledge graph whose relevance to the educational scenario meets the relevance requirements and which are missing in the first knowledge graph, to obtain a second knowledge graph; and modify the second knowledge graph to obtain the educational knowledge graph.
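The graph construction in [0020] (prune irrelevant nodes and edges, then add missing relevant ones) can be illustrated with a small edge-set sketch. The relevance scores, the threshold, and the function name are assumptions; the final manual modification step is not modeled.

```python
def build_educational_graph(initial_edges, relevance, min_relevance, extra_edges):
    """Prune, then extend, a knowledge graph given as a collection of (node, node) edges."""

    def relevant(node):
        # Assumption: relevance to the educational scenario is a precomputed score.
        return relevance.get(node, 0.0) >= min_relevance

    # First knowledge graph: drop edges that touch an irrelevant node.
    first = {(a, b) for a, b in initial_edges if relevant(a) and relevant(b)}
    # Second knowledge graph: add relevant edges found in books or multimedia.
    second = first | {(a, b) for a, b in extra_edges if relevant(a) and relevant(b)}
    return second


relevance = {"java": 0.9, "operators": 0.8, "classes": 0.9, "coffee": 0.1}
graph = build_educational_graph(
    initial_edges=[("java", "operators"), ("java", "coffee")],
    relevance=relevance,
    min_relevance=0.5,
    extra_edges=[("java", "classes")],
)
```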

[0021] In one possible implementation, the splicing module is used to determine the arrangement order of the at least one target knowledge point; and to splice the target knowledge point videos in the target knowledge point video set based on the arrangement order to obtain at least one candidate course video, wherein any candidate course video is obtained by splicing a target knowledge point video corresponding to each target knowledge point according to the arrangement order.

[0022] In one possible implementation, the first acquisition module is configured to acquire audio data corresponding to the initial course video, convert the audio data into text, extract knowledge points from the text, determine the classification result of the video frame based on the video frame of the initial course video and the text corresponding to the video frame, the classification result of the video frame being used to indicate the knowledge point corresponding to the video frame in the extracted knowledge points, segment the initial course video according to the knowledge point granularity based on the classification result of each video frame in the initial course video to obtain segmented videos, and acquire the candidate knowledge point videos based on the segmented videos.
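Once every video frame has a classification result, segmenting by knowledge point amounts to grouping consecutive frames that share a label. The sketch below assumes the per-frame labels are already available as a list and uses zero-based frame indices; both are illustrative choices rather than details from the text.

```python
from itertools import groupby


def segment_by_knowledge_point(frame_labels):
    """Return (knowledge_point, first_frame, last_frame) for each run of equal labels."""
    segments = []
    index = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((label, index, index + length - 1))
        index += length
    return segments


# Hypothetical classification results for six consecutive frames.
labels = ["operators", "operators", "classes", "classes", "classes", "operators"]
segments = segment_by_knowledge_point(labels)
```

A knowledge point that recurs later in the video produces a separate segment, which matches the idea that each segmented video is a contiguous stretch of the initial course video.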

[0023] In one possible implementation, the first acquisition module is used to acquire the evaluation result of the segmented video based on the video features of the segmented video; and to remove videos in the segmented video whose evaluation results do not meet the second evaluation condition to obtain the candidate knowledge point videos.

[0024] In one possible implementation, the video features of the segmented video include at least one of the following: the features of the teacher who recorded the initial course video corresponding to the segmented video, the features of the institution to which the teacher who recorded the initial course video corresponding to the segmented video belongs, and the features of the content of the segmented video.

[0025] In one possible implementation, the device further includes:

[0026] The editing module is used to edit the original playback speed of each target knowledge point video in the target course video based on the attribute characteristics of the target student, so as to obtain the target playback speed of each target knowledge point video in the target course video.

[0027] The playback module is used to respond to the learning instruction of the target student for the target course video and play the target knowledge point videos in the target course video according to the target playback speed of each target knowledge point video.

[0028] In one possible implementation, the device further includes:

[0029] An update module is used to obtain feedback information from the target student regarding the target course video, the feedback information being used to indicate the effectiveness of the target course video; and based on the feedback information, to update the course video to be studied by the target student.

[0030] On the other hand, a computer device is provided, the computer device including a processor and a memory, the memory storing at least one computer program, the at least one computer program being loaded and executed by the processor to enable the computer device to implement any of the methods for determining course videos described above.

[0031] On the other hand, a computer-readable storage medium is also provided, wherein at least one computer program is stored therein, the at least one computer program being loaded and executed by a processor to enable a computer to implement any of the methods for determining course videos described above.

[0032] On the other hand, a computer program product is also provided, the computer program product including a computer program or computer instructions, the computer program or computer instructions being loaded and executed by a processor to enable a computer to implement any of the methods for determining course videos described above.

[0033] The technical solution provided in this application has at least the following beneficial effects:

[0034] The technical solution provided in this application determines the target course videos to be learned by the target students based on their knowledge point learning needs and attribute characteristics. Each part of the target course videos is a target knowledge point video that matches the target students from the candidate knowledge point videos. Based on this method, each part of the target course videos has a high degree of matching with the students. Using the target course videos as the course videos to be learned by the students is conducive to improving students' learning interest and learning effect, thereby increasing the interaction rate between students and the education platform.

Attached Figure Description

[0035] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0036] Figure 1 is a schematic diagram illustrating the implementation environment of a method for determining course videos provided in an embodiment of this application;

[0037] Figure 2 is a flowchart illustrating a method for determining course videos provided in an embodiment of this application;

[0038] Figure 3 is a schematic diagram illustrating a process for obtaining candidate knowledge point videos according to an embodiment of this application;

[0039] Figure 4 is a schematic diagram illustrating the visualization result of a knowledge graph provided in an embodiment of this application;

[0040] Figure 5 is a schematic diagram illustrating a process for obtaining a measurement index for a first knowledge point video based on a dual-tower model, as provided in an embodiment of this application;

[0041] Figure 6 is a schematic diagram illustrating a process for obtaining matching degree based on the DeepFM model, as provided in an embodiment of this application;

[0042] Figure 7 is a schematic diagram illustrating a process for determining a target course video according to an embodiment of this application;

[0043] Figure 8 is a schematic diagram of a framework for determining target course videos provided in an embodiment of this application;

[0044] Figure 9 is a schematic diagram of a device for determining course videos provided in an embodiment of this application;

[0045] Figure 10 is a schematic diagram of the structure of a server provided in an embodiment of this application;

[0046] Figure 11 is a schematic diagram of the structure of a terminal provided in an embodiment of this application.

Detailed Implementation

[0047] To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application will be described in further detail below with reference to the accompanying drawings.

[0048] For example, the method for determining course videos provided in this application embodiment can be applied to cloud education scenarios. Cloud Computing Education (CCEDU) refers to an education platform service based on a cloud computing business model. On the cloud platform, all educational institutions, training institutions, enrollment service agencies, publicity agencies, industry associations, management agencies, industry media, legal structures, etc., are centrally integrated into a resource pool. These resources can be displayed and interacted with each other, communicate on demand, and reach agreements, thereby reducing education costs and improving efficiency.

[0049] This application provides a method for determining course videos. Please refer to Figure 1, which illustrates the implementation environment of the method for determining course videos provided in this application embodiment. This implementation environment may include: computer device 11.

[0050] The computer device 11 can be a computer device corresponding to the education platform. This computer device 11 can store the initial course videos recorded by the teacher. When it is necessary to determine the target course videos to be learned for the target students, the method provided in this application embodiment can be applied based on the initial course videos recorded by the teacher. For example, the target students can log in to the education platform, view the initial videos recorded by the teacher and the target course videos determined by the computer device 11 on the platform's display page, and select course videos of interest to learn.

[0051] Optionally, the computer device 11 can be a terminal or a server, and this application embodiment does not limit this. The terminal can be any electronic product capable of human-computer interaction with a user through one or more methods such as a keyboard, touchpad, touchscreen, remote control, voice interaction, or handwriting device, such as a PC (Personal Computer, e.g., laptop, desktop computer), mobile phone, smartphone, PDA (Personal Digital Assistant), wearable device, PPC (Pocket PC), tablet computer, smart car system, smart TV, smart speaker, smartwatch, in-vehicle terminal, etc., but is not limited thereto. The server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services. The terminal and server can be directly or indirectly connected through wired or wireless communication, and this application does not impose any restrictions.

[0052] Those skilled in the art should understand that the computer device 11 described above is merely an example, and other existing or future computer devices that are applicable to this application should also be included within the scope of protection of this application, and are hereby incorporated by reference.

[0053] This application provides a method for determining course videos, which can be applied to the implementation environment shown in Figure 1. Taking the application of this method to computer device 11 as an example, as shown in Figure 2, the method for determining course videos provided in this application embodiment may include the following steps 201 to 205.

[0054] In step 201, the initial course video recorded by the teacher is obtained, and the initial course video is segmented according to the knowledge point granularity to obtain candidate knowledge point videos.

[0055] The initial course video refers to the original video recorded by the teacher for students to learn from. The initial course video can be recorded by the teacher through other platforms and uploaded to the educational platform, or it can be recorded directly by the teacher on the educational platform. The computer devices corresponding to the educational platform can access the initial course video recorded by the teacher. For example, there may be one or more initial course videos, each of which is a relatively long course video composed of multiple fine-grained knowledge point videos. The content of the initial course video recorded by the teacher (e.g., the knowledge points involved, the explanation method for each knowledge point, and the explanation duration) can be set by the teacher themselves, and this application embodiment does not limit this.

[0056] For example, multiple initial course videos recorded by teachers are obtained in step 201. These can include multiple initial course videos recorded by the same teacher, or initial course videos recorded by multiple teachers respectively, thereby providing rich course video resources for the subsequent process of determining course videos and ensuring the reliability of the determined course videos.

[0057] For example, the initial course videos recorded by teachers obtained in step 201 can refer to all initial course videos recorded by teachers existing on the education platform, or they can refer to initial course videos recorded by teachers on the education platform whose recording time is within the reference time period. This application embodiment does not limit this. The reference time period is set based on experience or can be flexibly adjusted according to the application scenario. For example, the reference time period can refer to the time period from a historical timestamp one month away from the current timestamp to the current timestamp, or it can refer to the time period from a historical timestamp seven days away from the current timestamp to the current timestamp.
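The reference time period in [0057] is a simple window that ends at the current timestamp. A minimal sketch, assuming each video carries a hypothetical `recorded_at` field and approximating "one month" as 30 days:

```python
from datetime import datetime, timedelta


def within_reference_period(videos, now, period):
    """Keep videos whose recording time falls in [now - period, now]."""
    start = now - period
    return [v for v in videos if start <= v["recorded_at"] <= now]


# Hypothetical initial course videos and current timestamp.
now = datetime(2022, 10, 20)
videos = [
    {"id": "v1", "recorded_at": datetime(2022, 10, 15)},
    {"id": "v2", "recorded_at": datetime(2022, 9, 25)},
    {"id": "v3", "recorded_at": datetime(2022, 8, 1)},
]

last_week = within_reference_period(videos, now, timedelta(days=7))
last_month = within_reference_period(videos, now, timedelta(days=30))
```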

[0058] On online education platforms, teachers from numerous educational institutions can provide online classes to a vast number of students with internet access through pre-recorded or live streams. However, the diversity of course videos is far less than the diversity of student needs. The core function of these platforms is to match student learning needs with the course videos provided by teachers. The more refined the matching, the better the user experience. For example, an educational platform might offer a "Java Introduction" technical series that contains multiple lesson-level videos (e.g., environment configuration, basic characters, operators, classes, etc.). Each lesson-level video is relatively long, often lasting an hour, and is composed of multiple videos covering fine-grained knowledge points. Furthermore, multiple teachers or institutions may teach the "Java Introduction" series on the platform. Here, "Java" refers to a programming language. These lesson-level videos constitute the initial course videos in this embodiment. Determining suitable learning videos for students based on these initial course videos is crucial for improving student learning interest and user experience.

[0059] In this embodiment, since the initial course video is a relatively long video consisting of multiple fine-grained knowledge point videos, after obtaining the initial course video, it is segmented according to the knowledge point granularity to obtain candidate knowledge point videos. These candidate knowledge point videos are the materials directly used to determine the course videos. Compared to the initial course video, the candidate knowledge point videos have a finer granularity. Using these candidate knowledge point videos as the basis for determining the course videos allows for the identification of course videos that match students' knowledge point learning needs, improving the reliability of the determined course videos and thus increasing students' interest and learning effectiveness.

[0060] It should be noted that there may be one or more initial course videos. This application embodiment takes one initial course video as an example to introduce the implementation process of segmenting the initial course video according to the granularity of knowledge points. For the case where there are multiple initial course videos, the process of segmenting according to the granularity of knowledge points is implemented for each initial course video according to the same principle.

[0061] In one possible implementation, the process of segmenting the initial course video according to the granularity of knowledge points to obtain candidate knowledge point videos includes the following steps 2011 to 2014.

[0062] Step 2011: Obtain the audio data corresponding to the initial course video and convert the audio data into text.

[0063] The audio data corresponding to the initial course video refers to the audio recording of the teacher explaining the content of the initial course video during its recording process. After acquiring the initial course video, the audio data can be extracted. After obtaining the audio data corresponding to the initial course video, the audio data is converted into text to facilitate the extraction of knowledge points related to the initial course video.

[0064] This application does not limit how the audio data is converted into text. For example, a speech recognition model may be used. This application does not limit the type of speech recognition model, as long as it can convert the input audio data into text and output that text. The speech recognition model can be an off-the-shelf model provided by a technology platform; for example, it can be a seq2seq (Sequence to Sequence) model. The speech recognition model can also be trained end to end, in a supervised manner, using sample audio data and the corresponding text labels.

[0065] Step 2012: Extract knowledge points from the text.

[0066] A knowledge point is the smallest unit of knowledge. By extracting knowledge points from text, the knowledge points involved in the initial course videos can be determined, facilitating the division of the initial course videos according to the granularity of knowledge points. In an exemplary embodiment, the extraction of knowledge points from text is implemented by using a NER (Named Entity Recognition) model. The NER model aims to identify entities of interest in text. In this embodiment, knowledge points are taken as entities of interest, thereby utilizing the NER model to extract knowledge points from the text.

[0067] This application does not limit the type of NER model, as long as it can extract knowledge points from the input text and output them. For example, the NER model can be a BERT-CRF (Bidirectional Encoder Representations from Transformers with Conditional Random Fields) model, or a Lattice LSTM (Long Short-Term Memory) model, which has a lattice structure.

[0068] For example, the NER model can be trained using an end-to-end supervised training method based on sample text and its corresponding knowledge point labels. The sample text and its corresponding knowledge point labels constitute the training data for the NER model. This training data can be obtained by matching a corpus with a summarized knowledge point dictionary and then manually checking and proofreading it. The amount of training data can be flexibly set according to the actual application scenario; training data can also be referred to as supervised data.
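The dictionary-matching step that produces draft training labels might look like the sketch below. The BIO tagging scheme, the `KP` tag name, and whitespace tokenization are assumptions made for illustration; the text only states that a corpus is matched against a knowledge point dictionary and then checked manually.

```python
def bio_labels(tokens, dictionary):
    """Tag tokens with B-KP / I-KP / O by longest dictionary match (assumed scheme)."""
    labels = ["O"] * len(tokens)
    # Try longer dictionary entries first so "ternary operator" beats "operator".
    entries = sorted((entry.split() for entry in dictionary), key=len, reverse=True)
    i = 0
    while i < len(tokens):
        for entry in entries:
            n = len(entry)
            if tokens[i:i + n] == entry:
                labels[i] = "B-KP"
                for j in range(i + 1, i + n):
                    labels[j] = "I-KP"
                i += n
                break
        else:
            # No dictionary entry starts at this token.
            i += 1
    return labels


tokens = "the ternary operator is an operator".split()
labels = bio_labels(tokens, ["operator", "ternary operator"])
```

Labels produced this way are only a starting point, which is why the text follows the matching with manual checking and proofreading.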

[0069] For example, after extracting knowledge points from the text, the extracted knowledge points can be arranged chronologically as a feature of the initial course video. For example, chronologically arranging the extracted knowledge points can mean arranging the extracted knowledge points according to the order in which they appear in the text.
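Arranging knowledge points by their order of appearance can be sketched with a plain substring search. The sample sentence and knowledge point names are hypothetical, and ordering by first occurrence is one reasonable interpretation of "the order in which they appear in the text".

```python
def order_by_first_appearance(text, knowledge_points):
    """Sort knowledge points by the position of their first occurrence in text."""
    positions = {kp: text.find(kp) for kp in set(knowledge_points)}
    # str.find returns -1 when the knowledge point does not occur at all.
    found = [kp for kp, pos in positions.items() if pos >= 0]
    return sorted(found, key=positions.get)


text = "First we configure the environment, then we cover operators and finally classes."
ordered = order_by_first_appearance(text, ["classes", "operators", "environment"])
```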

[0070] Step 2013: Based on the video frames and corresponding text of the initial course video, determine the classification result of the video frames. The classification result of the video frames is used to indicate the knowledge points corresponding to the video frames in the extracted knowledge points.

[0071] The initial course video consists of multiple sequentially arranged video frames. The frequency used to divide the initial course video into video frames can be set empirically, for example, 24 frames per second. It should be noted that since the initial course video is composed of multiple sequentially arranged video frames, each video frame corresponds to timing information. Based on the timing information corresponding to different video frames, the chronological order of different video frames can be determined. For example, the timing information corresponding to any video frame could refer to the playback time of that video frame; or it could refer to the sequence number of that video frame, which is a number obtained by sequentially numbering starting from the first video frame.

[0072] For example, the text corresponding to any video frame may include the text corresponding to the teacher's speech explaining that video frame, which can be extracted from the text converted in step 2011. For example, the text corresponding to any video frame may include the text contained in the image content of that video frame, which can be obtained by performing text recognition (e.g., OCR (Optical Character Recognition)) on that video frame.

[0073] Each video frame refers to information in the image modality, and the text corresponding to each video frame refers to information in the text modality. By comprehensively considering both the video frame and the text corresponding to the video frame, the classification result of the video frame can be determined. This enables the acquisition of classification results by considering multimodal information, which helps to improve the reliability of the classification results.

[0074] The initial course video contains multiple video frames. Determining the classification results of the video frames based on the video frames and their corresponding text means determining the classification result of each video frame based on that video frame and its corresponding text. The principle for determining the classification result is the same for every video frame, so the implementation is explained below taking any one video frame as an example.

[0075] In an exemplary embodiment, determining the classification result of any video frame based on that video frame and its corresponding text includes: extracting video features of the video frame; extracting text features of the text corresponding to the video frame; and determining the classification result of the video frame based on the video features and the text features. For example, a video feature extraction model is invoked to extract the video features of the video frame. The video feature extraction model refers to a model used to extract features from video frames; for example, it can be a Vision Transformer model. For example, a text feature extraction model is invoked to extract the text features of the text corresponding to the video frame. The text feature extraction model refers to a model used to extract features from text; for example, it can be a language model, such as the BERT model.

[0076] For example, the process of determining the classification result of any video frame based on video features and text features includes: fusing the video features and text features to obtain fused features; and classifying the fused features to obtain the classification result of any video frame. For example, this process can be implemented by calling a feature processing model. For instance, the feature processing model can refer to an MLP (Multilayer Perceptron).
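The fuse-then-classify step can be illustrated with a small pure-Python sketch. It assumes fusion by concatenation, a single ReLU hidden layer, and a softmax output; the application does not specify any of these, and the weights below are hand-picked toy values, not trained parameters.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, bias):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def classify_frame(video_feat, text_feat, w1, b1, w2, b2):
    fused = list(video_feat) + list(text_feat)             # fusion by concatenation (assumption)
    hidden = [max(0.0, h) for h in dense(fused, w1, b1)]   # ReLU hidden layer
    return softmax(dense(hidden, w2, b2))                  # probability per candidate category


# Toy inputs: 2-d video features, 2-d text features, 3 candidate categories.
w1 = [[1, 0, 0, 0], [0, 0, 0, 1]]
b1 = [0, 0]
w2 = [[1, 0], [0, 1], [-1, -1]]
b2 = [0, 0, 0]
probs = classify_frame([1.0, 0.0], [0.0, 1.0], w1, b1, w2, b2)
```

In practice the video and text features would come from models such as a Vision Transformer and BERT, and the MLP weights would be learned under supervision.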

[0077] For example, a video feature extraction model, a text feature extraction model, and a feature processing model can collectively constitute a classification model, which can be considered a type of image-text multimodal model. The classification model can be obtained under supervision based on training data, which includes sample video frames, the corresponding text for each sample video frame, and the corresponding category labels for each sample video frame. The category labels for each sample video frame are obtained through manual annotation.

[0078] The classification result of any video frame is used to indicate the knowledge point corresponding to that video frame among the extracted knowledge points. This application embodiment does not limit the representation of the classification result of any video frame. For example, the classification result of any video frame includes the probability corresponding to each candidate knowledge point category. The candidate knowledge point categories are pre-set knowledge point categories, and different knowledge point categories are used to identify different knowledge points. The candidate knowledge point categories include categories used to identify the knowledge points extracted from the text in step 2012. Based on the probabilities corresponding to each candidate knowledge point category, the probability corresponding to the category of the knowledge point extracted from the text can be determined. Then, the knowledge point corresponding to the probability that satisfies the probability condition among the probabilities corresponding to the category of the knowledge point extracted from the text can be used as the knowledge point corresponding to that video frame among the extracted knowledge points. For example, the probability that satisfies the probability condition can refer to the maximum probability, or it can refer to a probability not less than a probability threshold, etc. This application embodiment does not limit this. The knowledge point corresponding to any video frame may be one or multiple, depending on the classification result of any video frame and the setting of the probability conditions.
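As a minimal sketch of the probability condition, the function below restricts the candidate categories to the knowledge points extracted from the text and then applies either the maximum-probability rule or a threshold rule. The function name, the dictionary representation of the classification result, and the example values are assumptions made for illustration.

```python
def select_knowledge_points(probs, extracted, threshold=None):
    """Return the knowledge points of one frame that satisfy the probability condition.

    probs:     mapping from candidate knowledge point category to probability
    extracted: knowledge points extracted from the text
    threshold: if None, use the maximum-probability rule; otherwise keep
               every extracted knowledge point with probability >= threshold
    """
    # Only categories that identify extracted knowledge points are considered.
    relevant = {kp: p for kp, p in probs.items() if kp in extracted}
    if not relevant:
        return []
    if threshold is None:
        best = max(relevant.values())
        return [kp for kp, p in relevant.items() if p == best]
    return [kp for kp, p in relevant.items() if p >= threshold]


# Hypothetical classification result for one video frame.
frame_probs = {"recursion": 0.5, "loops": 0.3, "pointers": 0.2}
extracted = {"loops", "pointers"}
by_max = select_knowledge_points(frame_probs, extracted)
by_threshold = select_knowledge_points(frame_probs, extracted, threshold=0.2)
```

With the maximum rule a frame usually maps to a single knowledge point, whereas the threshold rule can yield several, which matches the remark that the number depends on how the probability condition is set.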

[0079] For example, the above description illustrates the process of obtaining the classification result of any video frame by using both the video frame and its corresponding text as input information for the classification model. This application is not limited to this. In some embodiments, the classification result of any video frame can also be obtained by using only the video frame as input information for the classification model. In this case, the classification model may include only a video feature extraction model and a feature processing model. In some embodiments, the classification result of any video frame can also be obtained by using only the text corresponding to the video frame as input information for the classification model. In this case, the classification model may include only a text feature extraction model and a feature processing model. In some embodiments, the classification result of any video frame can also be obtained by using the video frame, the text corresponding to the video frame, and information from other modalities corresponding to the video frame (e.g., information from a speech modality) as input information for the classification model. In this case, in addition to the video feature extraction model, the text feature extraction model, and the feature processing model, the classification model may also include a feature extraction model for extracting features from the other modalities.

[0080] Step 2014: Based on the classification results of each video frame in the initial course video, the initial course video is segmented according to the knowledge point granularity to obtain segmented videos; based on the segmented videos, candidate knowledge point videos are obtained.

[0081] Based on the classification result of any video frame, the knowledge points corresponding to each video frame in the extracted knowledge points can be determined. Then, starting from the first video frame in the initial course video, the system iterates sequentially according to the knowledge points corresponding to each video frame. During the traversal, consecutive video frames with the same corresponding knowledge points are treated as a segmented video. After traversing all video frames, the resulting segmented videos are the segmented videos obtained by segmenting the initial course video according to the granularity of knowledge points. It should be noted that different segmented videos may consist of different numbers of video frames, which is related to the actual classification result of the video frames. This application embodiment does not limit this. It should also be noted that the number of knowledge points corresponding to any video frame may be one or more. The same corresponding knowledge points can mean that any one of the corresponding knowledge points is the same, or it can mean that all the corresponding knowledge points are the same. This application embodiment does not limit this.
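The traversal described above can be sketched with `itertools.groupby`, which groups consecutive equal items. The sketch adopts the interpretation in which two frames have "the same corresponding knowledge points" only when all of their knowledge points are the same; representing each frame's knowledge points as a `frozenset` and each segment as a `(start, end, labels)` tuple is an assumption.

```python
from itertools import groupby


def segment_by_knowledge_point(frame_labels):
    """Split a frame sequence into runs of consecutive frames with identical labels.

    frame_labels: one frozenset of knowledge points per video frame, in order
    Returns a list of (first_frame_index, last_frame_index, labels), inclusive.
    """
    segments = []
    start = 0
    for labels, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((start, start + length - 1, labels))
        start += length
    return segments


# Hypothetical per-frame knowledge points for a five-frame video.
frames = [
    frozenset({"A"}),
    frozenset({"A"}),
    frozenset({"A", "B"}),
    frozenset({"B"}),
    frozenset({"B"}),
]
segments = segment_by_knowledge_point(frames)
```

Under the alternative interpretation, where sharing any one knowledge point is enough, the equality test would be replaced by a non-empty intersection check between neighbouring frames.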

[0082] For example, after acquiring the segmented video, each segmented video can be assigned a knowledge point tag based on the knowledge points corresponding to the video frames in each segmented video. This allows the knowledge point tags to intuitively describe the knowledge points involved in the segmented video. For example, any segmented video may involve one or more knowledge points. This application does not limit the form of the knowledge point tags, as long as they can describe the knowledge points. For example, knowledge point tags may include the name of the knowledge point, its number in the teaching syllabus, etc.

[0083] For example, taking the case where the classification result of any video frame is obtained by using the video frame and its corresponding text as input information for the classification model, segmenting the initial course video based on the classification results amounts to segmenting the initial course video based on multimodal information, which gives high segmentation reliability.

[0084] In one possible implementation, obtaining candidate knowledge point videos based on segmented videos can mean using segmented videos as candidate knowledge point videos. This method is relatively efficient in obtaining candidate knowledge point videos.

[0085] In another possible implementation, the process of obtaining candidate knowledge point videos based on the segmented videos includes selecting videos that meet reliability criteria from the segmented videos as candidate knowledge point videos. This method helps ensure the reliability of the obtained candidate knowledge point videos.

[0086] For example, a segmented video containing no less than a first threshold number of video frames can be considered as a video that meets the reliability criteria. The first threshold is set empirically, for example, the first threshold is 2 or the first threshold is 3.
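A minimal sketch of this frame-count criterion follows. It assumes each segmented video is represented by a hypothetical `(first_frame_index, last_frame_index)` pair with inclusive bounds; the default threshold of 3 is one of the example values given above.

```python
def filter_by_frame_count(segments, first_threshold=3):
    """Keep segmented videos that contain at least first_threshold frames."""
    return [
        (start, end)
        for start, end in segments
        if end - start + 1 >= first_threshold
    ]


# Hypothetical segments: 1 frame, 4 frames, and 2 frames long.
segments = [(0, 0), (1, 4), (5, 6)]
kept_at_3 = filter_by_frame_count(segments)
kept_at_2 = filter_by_frame_count(segments, first_threshold=2)
```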

[0087] For example, the process of selecting videos that meet the reliability criteria from the segmented videos includes: obtaining the evaluation results of the segmented videos based on the video features of the segmented videos; and removing videos whose evaluation results do not meet the second evaluation condition to obtain candidate knowledge point videos. The video features of a segmented video describe the characteristics of that segmented video. In an exemplary embodiment, the video features of a segmented video include at least one of the following: the characteristics of the teacher who recorded the initial course video corresponding to the segmented video, the characteristics of the institution to which that teacher belongs, and the characteristics of the content of the segmented video.

[0088] The characteristics of the recording teachers of the initial course videos corresponding to the segmented videos can be extracted from their historical recorded course videos, student evaluations, and teacher introduction information. For example, these characteristics include, but are not limited to, the teacher's gender, age, speaking speed, Mandarin proficiency, approachability, positive review rate, understanding of the knowledge points, teaching style, and voice timbre. For example, gender, age, and speaking speed are basic features; Mandarin proficiency, approachability, positive review rate, and understanding of the knowledge points are business rule features; and teaching style and voice timbre are deep learning vector representation features.

[0089] The characteristics of the institution to which the teacher who recorded the initial course video corresponding to the segmented video belongs can be obtained by extracting the evaluation information, promotional information, etc. of that institution. For example, these characteristics include the institution level (e.g., whether the institution is a gold-level institution), the number of teachers in the institution, and the institution's positive review rate.

[0090] The characteristics of the teachers who recorded the initial course videos corresponding to the segmented videos, and the characteristics of the institutions to which the teachers belong, can be maintained using the platform's creator profile (institution or individual) feature maintenance module. This module can bind the characteristics of teachers and institutions to the initial course videos. Since student learning preferences take these characteristics into account, they need to be considered when determining personalized course videos for students.

[0091] The characteristics of the recording teacher of the initial course video corresponding to the segmented video and the characteristics of the institution to which the recording teacher of the initial course video corresponding to the segmented video belongs can be regarded as features that indirectly describe the segmented video. The characteristics of the content of the segmented video can be regarded as features that directly describe the segmented video, which can intuitively assist the algorithm for personalized course video customization.

[0092] The types of features for segmented video content can be flexibly set according to needs. For example, based on user needs analysis, certain business rules can be specified, and features of the segmented video content can be extracted according to these rules. For instance, the features of the segmented video content may include the difficulty of the knowledge points involved, the teacher's speaking speed, the level of detail in the explanation, and the number of video frames included in the segmented video. For example, some features can be obtained directly, such as the number of video frames included in the segmented video; some features can be processed through rules, such as the teacher's speaking speed and the level of detail in the explanation; and some features can be obtained through model processing, such as the difficulty of the knowledge points involved in the segmented video content, which can be classified using a video processing model (such as an LSTM model or a BERT model), and then the difficulty of the knowledge points involved in the segmented video content can be determined based on the classification results. The video processing model is trained using supervised training data, which includes sample videos and difficulty labels for the knowledge points involved in the sample videos. The difficulty labels for the knowledge points involved in the sample videos need to be manually summarized using rules.

[0093] Taking as an example the case where the video features of the segmented video include the characteristics of the teacher who recorded the initial course video, the characteristics of the institution to which the teacher belongs, and the characteristics of the content of the segmented video, comprehensively considering multiple features yields rich multi-dimensional features of the segmented video and thus a more comprehensive description of it. In some embodiments, since the segmented video is cut from the initial course video, the duration of the segmented video is shorter than that of the initial course video. Therefore, the initial course video can also be called a long video, and the segmented video can also be called a short video.

[0094] In one possible implementation, the method for obtaining the evaluation result of the segmented video based on its video features includes: calling a short video evaluation model to perform a quality assessment on the video features of the segmented video, thereby obtaining the evaluation result of the segmented video. The evaluation result of the segmented video is used to indicate the reliability of the segmented video. This application embodiment does not limit the representation of the evaluation result of the segmented video. For example, the evaluation result of the segmented video can be represented as a score, which is positively correlated with reliability; another example is that the evaluation result of the segmented video can also be represented as a quality level, which may include excellent, good, poor, etc.

[0095] For example, using a short video evaluation model to assess the quality of video features in a segmented video can mean inputting only the video features of the segmented video into the short video evaluation model, or it can mean inputting both the video features and auxiliary features of the segmented video into the short video evaluation model together. For example, auxiliary features can be manually summarized features, such as student viewing time from feedback information already collected on the segmented video; for example, auxiliary features can also be embedded features of each video frame in the segmented video (e.g., features extracted using a video feature extraction model), etc.

[0096] The model structure of a short video evaluation model can be flexibly set according to experience or application needs. The model can be trained using supervised training data, which can be constructed from data accumulated through student learning feedback. For example, a short video evaluation model can be an XGB (Extreme Gradient Boosting) model or a DeepFM (Deep Factorization Machine) model, etc.

[0097] After obtaining the evaluation results of the segmented videos, videos whose evaluation results do not meet the second evaluation condition are removed, and the retained segmented videos are used as candidate knowledge point videos. The criteria for an evaluation result not meeting the second evaluation condition are set based on experience or flexibly adjusted according to the representation of the evaluation results; this embodiment does not limit this. For example, if the evaluation result of the segmented video is represented as a score positively correlated with reliability, then the evaluation result not meeting the second evaluation condition may mean that the evaluation result is less than a score threshold, which is set based on experience. For example, if the evaluation result of the segmented video is represented as a quality level, including excellent, good, and poor, then the evaluation result not meeting the second evaluation condition may mean that the evaluation result is poor.
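The removal step can be sketched as below for both representations of the evaluation result. The score threshold of 0.6, the set of rejected quality levels, and the video identifiers are hypothetical values chosen for illustration; the application leaves them to be set based on experience.

```python
def fails_second_condition(result, score_threshold=0.6, rejected_levels=("poor",)):
    """Return True if an evaluation result does not meet the second evaluation condition."""
    if isinstance(result, (int, float)):
        # Score representation: higher means more reliable.
        return result < score_threshold
    # Quality-level representation, e.g. "excellent", "good", "poor".
    return result in rejected_levels


def select_candidates(evaluated):
    """Keep the segmented videos whose evaluation results meet the condition."""
    return [
        video_id
        for video_id, result in evaluated.items()
        if not fails_second_condition(result)
    ]


# Hypothetical evaluation results, mixing both representations.
evaluated = {"a11": 0.9, "a12": 0.4, "b11": "good", "b12": "poor"}
candidates = select_candidates(evaluated)
```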

[0098] Since candidate knowledge point videos are selected from segmented videos, and these segmented videos have knowledge point tags, the candidate knowledge point videos also have knowledge point tags. Based on the knowledge point tags of the candidate knowledge point videos, the corresponding knowledge point can be determined. It should be noted that one candidate video may correspond to one or more knowledge points. In an exemplary embodiment, because different creators may have locally similar competing courses, the relationship between knowledge points and candidate knowledge point videos may be one-to-many. After obtaining the candidate knowledge point videos, they can be categorized and stored according to knowledge points; that is, candidate knowledge point videos corresponding to a specific knowledge point are stored as associated with that knowledge point.
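Storing candidate knowledge point videos by knowledge point can be pictured as building an index from each knowledge point tag to the videos that carry it. The sketch below uses an in-memory dictionary as a stand-in for the database; the function name and the sample identifiers and tags are assumptions.

```python
from collections import defaultdict


def index_by_knowledge_point(candidates):
    """Map each knowledge point to the candidate videos tagged with it.

    candidates: iterable of (video_id, knowledge_point_tags)
    """
    index = defaultdict(list)
    for video_id, tags in candidates:
        for tag in tags:
            index[tag].append(video_id)
    return dict(index)


# Hypothetical candidates from two creators covering overlapping topics.
candidates = [
    ("a11", ["loops"]),
    ("b11", ["loops", "recursion"]),
]
index = index_by_knowledge_point(candidates)
```

The entry for "loops" holds two videos, which reflects the one-to-many relationship between a knowledge point and candidate knowledge point videos.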

[0099] For example, the process of obtaining candidate knowledge point videos can be as shown in Figure 3. The initial course videos recorded by the teachers are obtained. For example, Teacher 1's initial course videos are from Series A, including the first A1 introductory course video and the second A2 advanced course video; Teacher 2's initial course videos are from Series A, including the first A1 introductory course video and the second A2 advanced course video; Teacher 3's initial course videos are from Series B, including the first B1 introductory course video and the second B2 advanced course video; Teacher 4's initial course videos are from Series C, including the first C1 introductory course video and the second C2 advanced course video. Through long video processing, the initial course videos are segmented at the knowledge point level to obtain candidate knowledge point videos. The segmentation process involves speech-to-text conversion, NER extraction of knowledge points, multimodal segmentation into segmented videos based on knowledge points, segmented video feature extraction, fusion of teacher features and segmented video features, segmented video quality evaluation, and other processing logic. The segmentation process can be viewed as using AI (Artificial Intelligence) deep learning models to preprocess sub-courses within a course series at the knowledge point level. After acquiring candidate knowledge point videos, these videos are stored in a database to provide material for the subsequent determination of course videos. For example, the database includes short video libraries for course series A (including candidate knowledge point videos such as a11 and a12), short video libraries for course series B (including candidate knowledge point videos such as b11 and b12), short video libraries for course series C (including candidate knowledge point videos such as c11 and c12), and other skill or learning need knowledge bases.

[0100] In step 202, the learning needs and attribute characteristics of the target student are obtained. The learning needs are used to indicate the knowledge points that the target student expects to learn, and the attribute characteristics are used to identify the target student.

[0101] The target student refers to any student who needs to learn knowledge on the education platform. For example, the target student may be a student who has registered and logged in on the education platform. The target student may have already studied the course videos on the education platform or may not have studied the course videos on the education platform. This application embodiment does not limit this.

[0102] The target student's learning needs are used to indicate the knowledge points the target student expects to learn. These needs can be filled in by the target student themselves, obtained by analyzing the course videos the target student has previously studied, or obtained by analyzing other information provided by the target student (e.g., knowledge point mastery, grade level). Of course, the target student's learning needs may also be obtained through other methods, which are not limited in this embodiment. It should be noted that different students may have the same or different learning needs.

[0103] The attribute characteristics of the target student are used to identify the target student and represent their personalized information. The types of attribute characteristics can be set based on experience or flexibly adjusted according to the actual application scenario. In an exemplary embodiment, the attribute characteristics of the target student may include low-frequency basic information characteristics and high-frequency dynamic characteristics. Low-frequency basic information characteristics are those that change little over time, such as the target student's age, gender, interests, and tag preferences. High-frequency dynamic characteristics are those that change significantly over time, such as the target student's historical search requests, enrolled courses, browsing history, knowledge point mastery, and assessment results. The attribute characteristics of the target student can be obtained by the target student filling them in themselves or by analyzing the target student's behavior on the education platform; this embodiment does not limit this. By considering the attribute characteristics of the target student, the algorithm for determining course videos can provide different personalized course video results for the same knowledge point learning needs.

[0104] In an exemplary embodiment, the target student's knowledge point learning needs and attribute characteristics are obtained only with the target student's full authorization. For example, before obtaining the target student's knowledge point learning needs and attribute characteristics, a prompt pop-up window is displayed, allowing the target student to choose whether to allow the acquisition of these needs and characteristics. Only when the target student selects the "allow" option is the target student considered to have granted authorization, and only then are these needs and characteristics obtained.

[0105] In step 203, based on the learning needs of the target students, at least one target knowledge point matching the target students is determined; based on the at least one target knowledge point and the attribute characteristics of the target students, a set of target knowledge point videos matching the target students is determined from the candidate knowledge point videos, and the set of target knowledge point videos includes at least one target knowledge point video corresponding to each target knowledge point.

[0106] At least one target knowledge point matching the target student refers to a knowledge point suitable for the target student to learn. In an exemplary embodiment, determining at least one target knowledge point matching the target student based on the target student's knowledge point learning needs is achieved by using the knowledge point that the target student expects to learn, as indicated by the target student's knowledge point learning needs, as the target knowledge point.

[0107] In an exemplary embodiment, determining at least one target knowledge point matching the target student based on the target student's learning needs involves: acquiring an educational knowledge graph, which includes nodes corresponding to knowledge points and edges between those nodes; and determining at least one target knowledge point matching the target student based on the target student's learning needs and the educational knowledge graph. In this implementation, the at least one target knowledge point matching the target student includes the knowledge points the target student expects to learn, as indicated by the student's learning needs, and the knowledge points that, according to the educational knowledge graph, have a target relationship with the expected knowledge points. For example, the knowledge points that have a target relationship with the expected knowledge points can refer to prerequisite knowledge points, that is, knowledge points that need to be mastered in advance in order to learn the expected knowledge points.

[0108] An educational knowledge graph refers to a knowledge graph applicable to educational scenarios. An educational knowledge graph includes nodes corresponding to knowledge points and edges between these nodes. These edges indicate the relationships between knowledge points. In an exemplary embodiment, the process of determining at least one target knowledge point matching a target student based on the student's learning needs and the educational knowledge graph includes: determining the knowledge points the student expects to learn based on their learning needs; identifying the nodes corresponding to the expected knowledge points in the educational knowledge graph; determining, as associated knowledge points, the knowledge points corresponding to nodes that are connected to the nodes of the expected knowledge points by edges indicating a target relationship; and using the expected knowledge points and their associated knowledge points as the at least one target knowledge point matching the student. For example, the target relationship can be set based on experience; for instance, the target relationship can refer to a hierarchical relationship.
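The lookup of associated knowledge points can be sketched as follows. The graph is represented here as a plain list of `(source, relation, target)` triples, and an edge is followed in either direction; both choices, along with the relation name `"hierarchical"` and the sample nodes, are assumptions rather than details fixed by the application.

```python
def target_knowledge_points(expected, edges, target_relation="hierarchical"):
    """Return the expected knowledge points plus their associated knowledge points.

    expected: knowledge points the student expects to learn
    edges:    iterable of (source, relation, target) triples
    """
    targets = list(expected)
    for source, relation, target in edges:
        if relation != target_relation:
            continue  # only edges indicating the target relationship count
        # Follow the edge in both directions from an expected knowledge point.
        for near, far in ((source, target), (target, source)):
            if near in expected and far not in targets:
                targets.append(far)
    return targets


# Hypothetical fragment of an educational knowledge graph.
edges = [
    ("Java basics", "hierarchical", "Java generics"),
    ("Java generics", "hierarchical", "wildcards"),
    ("Java generics", "related", "C++ templates"),
]
targets = target_knowledge_points(["Java generics"], edges)
```

Only direct neighbours are collected; whether to continue to more distant nodes is a design choice the text leaves open.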

[0109] In one possible implementation, the process of acquiring an educational knowledge graph includes the following steps (1) to (3).

[0110] Step (1): Obtain the initial knowledge graph; remove nodes and edges in the initial knowledge graph that do not meet the association requirements with the education scenario to obtain the first knowledge graph.

[0111] The initial knowledge graph can be any knowledge graph, such as an open-source general-purpose knowledge graph. Based on this initial knowledge graph, modifications are made to meet the real-world needs of the educational scenarios involved in the education platform, thus upgrading it into an educational knowledge graph. For example, an educational knowledge graph can also be called an education industry knowledge graph.

[0112] For example, several cutting-edge open-source commercial general-purpose knowledge graphs exist, such as OpenKG, an open knowledge graph community alliance project initiated and advocated by the Language and Knowledge Computing Professional Committee of the Chinese Information Processing Society of China in 2015. In addition, the OwnThink platform (an open platform) has open-sourced a 140 million-word Chinese knowledge graph. A knowledge graph from any source, or a merged knowledge graph of multiple open-source knowledge graphs, can be used as the initial knowledge graph. The main form of the data structure of a knowledge graph is that nodes and edges form directed and undirected relationships. Nodes and edges have various types, and each node and edge can define its own attributes. For example, the visualization result of a knowledge graph can be as shown in Figure 4. In Figure 4, there are multiple nodes with edges between them, and the text on each edge indicates the relationship between the nodes it connects.

[0113] For example, knowledge graphs can be stored using storage tools. Commonly used open-source knowledge graph data storage tools include Neo4j and Nebula. Suitable initial knowledge graph data and storage technology can be selected according to actual conditions, and the initial knowledge graph is then edited according to the following steps to obtain an educational knowledge graph.

[0114] After obtaining the initial knowledge graph, nodes and edges whose relevance to the educational scenario does not meet the association requirement are removed, resulting in the first knowledge graph. Relevance that does not meet the association requirement means that the association with the educational scenario is weak. The association requirement can be set based on experience or adjusted flexibly according to actual circumstances; this embodiment does not limit this. It should be noted that the educational scenario here can refer to a general educational scenario or an educational scenario related to a specific domain; this embodiment does not limit this either.

[0115] Removing nodes and edges with weak relevance to the educational context can reduce the size of the knowledge graph, decrease performance bottlenecks and hardware resource requirements, and reduce noise in subsequent algorithmic processes. For example, the process of removing nodes and edges with weak relevance to the educational context can involve human experience. For instance, assuming the educational context is about teaching IT (Information Technology), nodes with weak relevance to IT and the edges connecting them can be removed.
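A minimal sketch of this pruning step is given below. The relevance test is passed in as a function because the application leaves the association requirement open; here it is a membership check against a hand-curated set of IT terms, which is an assumption, as are the sample nodes and edges.

```python
def prune_graph(nodes, edges, is_relevant):
    """Drop weakly relevant nodes and every edge that touches a dropped node.

    nodes:       set of node names
    edges:       list of (source, relation, target) triples
    is_relevant: function deciding whether a node meets the association requirement
    """
    kept_nodes = {node for node in nodes if is_relevant(node)}
    kept_edges = [
        (source, relation, target)
        for source, relation, target in edges
        if source in kept_nodes and target in kept_nodes
    ]
    return kept_nodes, kept_edges


# Hypothetical general-purpose graph and a hand-curated list of IT terms.
nodes = {"Java", "Python", "cooking", "recursion"}
edges = [
    ("Java", "related", "Python"),
    ("Python", "related", "cooking"),
    ("Java", "covers", "recursion"),
]
it_terms = {"Java", "Python", "recursion"}
kept_nodes, kept_edges = prune_graph(nodes, edges, lambda node: node in it_terms)
```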

[0116] Step (2): Based on at least one of the books corresponding to the educational scenario and the multimedia resources corresponding to the educational scenario, add nodes and edges that meet the association requirements with the educational scenario and are missing in the first knowledge graph to the first knowledge graph to obtain the second knowledge graph.

[0117] The first knowledge graph may lack some nodes and edges that meet the association requirements for educational scenarios. Therefore, after obtaining the first knowledge graph, it is necessary to add the missing nodes and edges that meet the association requirements for educational scenarios to the first knowledge graph to obtain the second knowledge graph. In the process of adding nodes and edges that meet the association requirements for educational scenarios to the first knowledge graph, at least one of the following can be considered: books corresponding to educational scenarios and multimedia resources corresponding to educational scenarios. The multimedia resources corresponding to educational scenarios may include, but are not limited to, videos, news, PowerPoint presentations, teaching outlines, etc., related to educational scenarios.

[0118] For example, when nodes and edges that meet the association requirements with the educational scenario and are missing from the first knowledge graph are added based on books corresponding to the educational scenario, the knowledge points involved in the main text of those books can be extracted using information extraction technology. For instance, when a comprehensive knowledge graph of Java-related learning skills needs to be built, that is, when the educational scenario is teaching Java, the corresponding book could be "Thinking in Java, 4th Edition". The book's table of contents is obtained, the knowledge points corresponding to the table of contents are set as parent nodes, and the knowledge points involved in each chapter are extracted using an NER (Named Entity Recognition) model as child nodes under the parent nodes. The edges between the child nodes and the parent nodes are defined as hierarchical relationships. The parent nodes, child nodes, and edges between them that are missing from the first knowledge graph are then added to the first knowledge graph.
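
A minimal sketch of this book-based addition is given below. The NER model is replaced by a hypothetical keyword lookup (`extract_points`), and the table of contents, chapter text, and relation label are made-up illustrations rather than content taken from the book.

```python
def extract_points(text, vocabulary=("composition", "inheritance",
                                     "type parameters", "wildcards")):
    """Hypothetical stand-in for an NER model: return known terms found in text."""
    lowered = text.lower()
    return [term for term in vocabulary if term in lowered]


def add_book_nodes(nodes, edges, toc):
    """Add chapter titles as parent nodes and extracted knowledge points as
    child nodes; sets ensure only missing nodes and edges are added."""
    for chapter_title, chapter_text in toc:
        nodes.add(chapter_title)
        for point in extract_points(chapter_text):
            nodes.add(point)
            edges.add((chapter_title, "hierarchical", point))
    return nodes, edges


# Toy first knowledge graph and a toy table of contents (assumed data).
nodes = {"Java", "inheritance"}
edges = {("Java", "has_topic", "inheritance")}
toc = [
    ("Reusing Classes", "Composition and inheritance let classes be reused."),
    ("Generics", "Generics use type parameters and wildcards."),
]
nodes, edges = add_book_nodes(nodes, edges, toc)
```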

[0119] For example, nodes and edges that meet the association requirements with the educational scenario and are missing from the first knowledge graph can be added based on multimedia resources corresponding to the educational scenario, where the multimedia resources are course videos. When a teacher uploads a course video corresponding to the educational scenario, a course outline and introduction are required. The audio in the course video can be converted into text. Knowledge points are extracted from the course outline and introduction uploaded by the teacher and used as parent nodes. An NER (Named Entity Recognition) model is used to extract knowledge points from the text converted from the audio in the course video, and these extracted knowledge points are used as child nodes. The relationship between the child nodes and parent nodes is defined as a membership relationship. The missing parent nodes, child nodes, and edges between them are then added to the first knowledge graph.

[0120] Of course, in some embodiments, nodes and edges that meet the association requirements for the educational scenario and are missing in the first knowledge graph can also be added to the first knowledge graph based on the books and multimedia resources corresponding to the educational scenario.

[0121] Step (3): Correct the second knowledge graph to obtain the educational knowledge graph.

[0122] The removal and addition operations in steps (1) and (2) may not be fully accurate. Therefore, after obtaining the second knowledge graph, the second knowledge graph is corrected and the corrected knowledge graph is used as the educational knowledge graph.

[0123] Obtaining an educational knowledge graph through addition and deletion requires verifying the accuracy of nodes and edges. Furthermore, since extracting data from books and multimedia resources relevant to educational scenarios may have significant limitations, missing nodes and edges need to be added. Both verifying the accuracy of nodes and edges and adding missing nodes and edges fall under the category of correcting the second knowledge graph.

[0124] In an exemplary embodiment, modifying the second knowledge graph may refer to the computer device modifying the second knowledge graph itself. The computer device stores rules for modifying the knowledge graph, thereby enabling it to modify the second knowledge graph according to these rules.

[0125] In an exemplary embodiment, modifying the second knowledge graph can also refer to feeding the second knowledge graph back to the reviewer, who then modifies it. The modified educational knowledge graph is then fed back to the computer device, which acquires the educational knowledge graph. Exemplarily, the reviewer may include, but is not limited to, the operation and maintenance personnel of the educational platform, the teachers and institutions teaching courses on the educational platform, and the students learning knowledge on the educational platform.

[0126] Each target knowledge point corresponds to at least one target knowledge point video, which refers to a knowledge point video that matches the target student. The number of target knowledge point videos corresponding to different target knowledge points may be the same or different. In an exemplary embodiment, the process of determining the set of target knowledge point videos that match the target student from candidate knowledge point videos based on at least one target knowledge point and the attribute characteristics of the target student includes the following steps A to C.

[0127] Step A: Determine at least one first knowledge point video corresponding to any target knowledge point from the candidate knowledge point videos, and determine at least one second knowledge point video that meets the screening criteria from the at least one first knowledge point video corresponding to any target knowledge point.

[0128] The first knowledge point video corresponding to any target knowledge point refers to a candidate knowledge point video whose knowledge point tag indicates that target knowledge point. For example, since educational platforms typically hold a large amount of data, the number of knowledge point videos determined by knowledge point alone is still quite large. Therefore, it is necessary to first perform a preliminary screening of the at least one first knowledge point video corresponding to any target knowledge point to determine at least one second knowledge point video that meets the screening criteria. The number of second knowledge point videos is less than the number of first knowledge point videos, which improves the efficiency of the subsequent determination of target knowledge point videos.

[0129] For example, the process of determining at least one second knowledge point video that meets the screening criteria from at least one first knowledge point video corresponding to any target knowledge point can be called a recall process.

[0130] The implementation method for determining at least one second knowledge point video that meets the screening criteria from at least one first knowledge point video corresponding to any target knowledge point can be set based on experience, and this application embodiment does not limit it in this way. For example, the first knowledge point video whose evaluation result meets the first condition among at least one first knowledge point video corresponding to any target knowledge point is regarded as the second knowledge point video that meets the screening criteria. The first knowledge point video meeting the first condition may mean that the score of the first knowledge point video ranks in the top reference quantity, or it may mean that the quality level of the first knowledge point video is excellent.

[0131] For example, the implementation of determining at least one second knowledge point video that meets the screening criteria from at least one first knowledge point video corresponding to any target knowledge point can also be as follows: inputting the attribute features of the target student into the first processing branch of the dual-tower model to obtain the first representation vector extracted by the first processing branch; inputting the video features of any first knowledge point video into the second processing branch of the dual-tower model to obtain the second representation vector extracted by the second processing branch; using the similarity between the first representation vector and the second representation vector as the measurement index of that first knowledge point video; and selecting, from the at least one first knowledge point video, each first knowledge point video whose measurement index meets the second condition as a second knowledge point video that meets the screening criteria.

[0132] For example, the measurement index meeting the second condition can mean that the measurement index is not less than a measurement index threshold, or it can mean that the measurement index is among the top T (T is an integer not less than 1) largest measurement indexes of the first knowledge point videos.

[0133] For example, the dual-tower model can refer to the DSSM model (a dual-tower model). For example, the process of obtaining the measurement index of a first knowledge point video based on the dual-tower model can be as shown in Figure 5. Both processing branches of the dual-tower model include a DNN (Deep Neural Network) model. The DNN model converts the embedded features corresponding to the input features into representation vectors. The attribute features of the target student are input into one processing branch of the dual-tower model. This branch first obtains the embedded features of the target student's attribute features, and then inputs the embedded features into the DNN model for processing to obtain the first representation vector. The video features of the first knowledge point video are input into the other processing branch of the dual-tower model. This branch first obtains the embedded features of the video features of the first knowledge point video, and then inputs the embedded features into the DNN model for processing to obtain the second representation vector. The similarity between the first and second representation vectors is calculated, and the calculated similarity is used as the measurement index of the first knowledge point video.
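
The recall process can be sketched as follows, purely for illustration. Each tower is reduced to a single dense layer with ReLU and fixed weights, cosine similarity is assumed as the similarity measure, and the top-T form of the second condition is used; the names `tower`, `cosine`, and `recall_top_t`, and all numeric values, are assumptions of this sketch, not a trained DSSM model.

```python
import math


def tower(features, weights):
    """One dense layer with ReLU, standing in for a DNN processing branch."""
    return [max(0.0, sum(w * x for w, x in zip(row, features)))
            for row in weights]


def cosine(u, v):
    """Cosine similarity between two representation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_top_t(student_features, videos, student_w, video_w, t):
    """Return the ids of the T first knowledge point videos whose
    representation vectors are most similar to the student's vector."""
    student_vec = tower(student_features, student_w)
    scored = [(cosine(student_vec, tower(feats, video_w)), vid)
              for vid, feats in videos.items()]
    scored.sort(reverse=True)
    return [vid for _, vid in scored[:t]]


identity = [[1.0, 0.0], [0.0, 1.0]]  # toy weights for both towers
videos = {"a11": [1.0, 0.0], "a22": [0.0, 1.0], "a33": [1.0, 1.0]}
second_videos = recall_top_t([1.0, 0.0], videos, identity, identity, t=2)
```

Because the video tower does not depend on the student, the second representation vectors can be computed ahead of time, which is one reason this structure suits coarse screening over a large video library.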

[0134] Step B: Based on the attribute characteristics of the target student and the video characteristics of at least one second knowledge point video, determine the matching degree between at least one second knowledge point video and the target student; take at least one second knowledge point video whose matching degree meets the matching condition as at least one target knowledge point video corresponding to any target knowledge point.

[0135] Step A can be understood as a coarse screening. After step A, second knowledge point videos are preliminarily selected for any target knowledge point. During this preliminary screening, because the number of first knowledge point videos is too large, it is difficult to use a deep learning model for screening, so the screening is not sufficiently personalized. Therefore, in step B, the selected second knowledge point videos are further screened to determine the target knowledge point videos that match the target student for any target knowledge point.

[0136] For example, the second knowledge point videos may still contain competing videos of the same type, but their number is much smaller than that of the first knowledge point videos. Therefore, a deep learning model can be used for further filtering. In an exemplary embodiment, the process of determining the matching degree between at least one second knowledge point video and the target student based on the target student's attribute characteristics and the video characteristics of at least one second knowledge point video includes: inputting the target student's attribute characteristics and the video characteristics of any second knowledge point video into a deep learning model, obtaining the matching degree output by the deep learning model, and using that output as the matching degree between the second knowledge point video and the target student. The higher the matching degree between a second knowledge point video and the target student, the better the match between that video and the target student, and the greater the target student's interest in learning from that video. Determining the matching degree by comprehensively considering the target student's attribute characteristics and the video characteristics of the knowledge point videos helps to ensure the reliability of the matching degree.

[0137] This application does not limit the type of deep learning model used; for example, the deep learning model can refer to the DeepFM model. After the attribute features of the target student and the video features of any second knowledge point video are input into the DeepFM model, the DeepFM model processes them to obtain shallow cross features and deep features. Shallow cross features (e.g., difficulty level, student's foundation, etc.) can be placed in the FM layer of the DeepFM model, and deep features (e.g., text features) can be placed in the deep layer of the DeepFM model. Through the processing of the FM layer and the deep layer, the matching degree is output. For example, the last layer of the deep learning model uses the sigmoid function, which keeps the output value between 0 and 1, with a higher value representing a greater matching degree. This personalized matching is more refined than similarity calculation.

[0138] For example, the process of obtaining the matching degree based on the DeepFM model can be as shown in Figure 6. The target student's attribute features and the video features of the second knowledge point video are used as input information for the DeepFM model. The target student's attribute features and the video features of the second knowledge point video are sparse features, including features related to multiple domains (e.g., domain i, domain j, domain m, etc.). Features related to multiple domains can also be called features related to multiple different aspects. In the DeepFM model, the sparse features are first converted into dense embedding features; then, shallow cross features are obtained and placed in the FM layer, and deep features are obtained and placed in the deep layer; the results obtained after processing by the FM layer and the deep layer are input into the output unit, and the output unit outputs the matching degree between the second knowledge point video and the target student.
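
The forward computation described above can be sketched in simplified form as follows. This is an assumption-laden illustration, not the embodiment's model: the field embeddings are given directly instead of being looked up from sparse features, the deep layer is a single hidden layer, all weights are arbitrary toy values, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fm_second_order(embeddings):
    """Sum of pairwise dot products between field embeddings, computed with
    the identity 0.5 * ((sum of vectors)^2 - sum of squared vectors)."""
    total = 0.0
    for k in range(len(embeddings[0])):
        s = sum(e[k] for e in embeddings)
        sq = sum(e[k] ** 2 for e in embeddings)
        total += 0.5 * (s * s - sq)
    return total


def deep_part(embeddings, hidden_w, out_w):
    """Concatenate embeddings, apply one ReLU hidden layer, project to a scalar."""
    flat = [x for e in embeddings for x in e]
    hidden = [max(0.0, sum(w * x for w, x in zip(row, flat)))
              for row in hidden_w]
    return sum(w * h for w, h in zip(out_w, hidden))


def deepfm_match(embeddings, first_order_w, hidden_w, out_w):
    """Combine the FM part and the deep part, then squash to (0, 1)."""
    logit = (sum(first_order_w)
             + fm_second_order(embeddings)
             + deep_part(embeddings, hidden_w, out_w))
    return sigmoid(logit)


# Two fields (e.g. student foundation and video difficulty), toy values.
embeddings = [[1.0, 0.0], [0.5, 0.5]]
first_order_w = [0.1, -0.2]  # weights of the active features (assumed)
hidden_w = [[0.5, 0.5, 0.5, 0.5], [-1.0, 0.0, 0.0, 0.0]]
out_w = [1.0, 1.0]
match = deepfm_match(embeddings, first_order_w, hidden_w, out_w)
```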

[0139] For example, the deep learning model can be obtained through supervised training based on training data. The training data can be manually configured or summarized based on student learning feedback. The training data includes the attribute features of sample students, the video features of sample videos, and the matching degree labels between sample videos and sample students.

[0140] After obtaining the matching degree between each second knowledge point video and the target student, at least one second knowledge point video whose matching degree meets the matching condition is selected as at least one target knowledge point video for any given target knowledge point. The matching degree condition is set based on experience or can be flexibly adjusted according to the application scenario. For example, the matching degree condition could mean that the matching degree is among the top K (K is an integer not less than 1) largest matching degrees between each second knowledge point video and the target student. Another example is that the matching degree condition could mean that the matching degree is not less than a matching degree threshold, which is set based on experience.
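
The two example matching conditions can be illustrated with the following sketch. The function name `select_targets` and the sample matching degrees are assumptions; either condition, or both, can be applied.

```python
def select_targets(match_degrees, k=None, threshold=None):
    """Select second knowledge point videos whose matching degree is not less
    than the threshold and/or ranks among the top K largest."""
    ranked = sorted(match_degrees.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [kv for kv in ranked if kv[1] >= threshold]
    if k is not None:
        ranked = ranked[:k]
    return [vid for vid, _ in ranked]


degrees = {"a11": 0.9, "a22": 0.4, "a33": 0.7}  # made-up matching degrees
top_two = select_targets(degrees, k=2)
above_half = select_targets(degrees, threshold=0.5)
```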

[0141] Step C: Determine the target knowledge point video set based on at least one target knowledge point video corresponding to each target knowledge point.

[0142] Based on steps A and B, at least one target knowledge point video corresponding to each target knowledge point can be obtained, and then the set of at least one target knowledge point video corresponding to each target knowledge point is taken as the target knowledge point video set.

[0143] In step 204, the target knowledge point videos in the target knowledge point video set are spliced together to obtain at least one candidate course video. Each candidate course video is obtained by splicing together one target knowledge point video corresponding to each target knowledge point.

[0144] At least one candidate course video is an alternative video corresponding to the target student. Each candidate course video is obtained by splicing together videos of target knowledge points that match the target student. Selecting the target course video based on at least one candidate course video ensures the degree of matching between the target course video and the target student, and guarantees the reliability of the target course video.

[0145] In one possible implementation, the method of splicing the target knowledge point videos in the target knowledge point video set to obtain at least one candidate course video includes: determining the arrangement order of at least one target knowledge point, and splicing the target knowledge point videos in the target knowledge point video set based on the arrangement order to obtain at least one candidate course video, wherein any candidate course video is obtained by splicing one target knowledge point video corresponding to each target knowledge point in the arrangement order.

[0146] For example, the order of at least one target knowledge point can be determined based on the node corresponding to each target knowledge point in the educational knowledge graph. For example, each target knowledge point corresponds to at least one target knowledge point video. The node corresponding to each target knowledge point and the edges involving that node are separately decomposed into subgraphs. The video features of the at least one target knowledge point video corresponding to each target knowledge point are bound to the corresponding subgraph. A GAT (Graph Attention Network) model is used to fuse the video features and the relationships between knowledge points. Then, the ranking position of each node is predicted, which can be treated as a multi-class classification task. Based on the predicted positions, the order of at least one target knowledge point can be determined. For example, the order prediction task utilizes the educational knowledge graph and graph algorithms (such as GraphSAGE, a graph neural network algorithm, and GAT), combined with temporal models (such as LSTM, Transformer, etc.).

[0147] After determining the arrangement order, the target knowledge point videos in the target knowledge point video set are spliced together based on the arrangement order to obtain at least one candidate course video. This application embodiment does not limit the specific splicing method, as long as it ensures that any candidate course video is obtained by splicing together one target knowledge point video corresponding to each target knowledge point in the arrangement order. If at least one target knowledge point corresponds to multiple target knowledge point videos, there will be multiple combination results, i.e., multiple candidate course videos.
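
The combination step can be sketched as a Cartesian product over the target knowledge point videos of each target knowledge point, taken in the arrangement order. This is one possible realization; the function name `candidate_courses` and the sample identifiers are assumptions, and each candidate is represented here as an ordered list of video identifiers rather than an actual spliced video file.

```python
from itertools import product


def candidate_courses(order, target_videos):
    """Build every candidate course: one target knowledge point video per
    target knowledge point, following the arrangement order."""
    per_point = [target_videos[point] for point in order]
    return [list(combo) for combo in product(*per_point)]


order = ["A", "B"]                                  # arrangement order
target_videos = {"A": ["a11", "a22"], "B": ["b11"]}  # made-up video sets
candidates = candidate_courses(order, target_videos)
```

The number of candidates is the product of the set sizes, so limiting the number of target knowledge point videos per knowledge point in step B also bounds the number of candidate course videos.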

[0148] In step 205, a target course video that meets the selection criteria is selected from at least one candidate course video. The target course video is the course video that the target student needs to learn.

[0149] In this embodiment, candidate knowledge point videos are used as materials. The target course videos to be learned by the target students are determined by comprehensively considering the learning needs and attribute characteristics of the target students. These target course videos are personalized course videos suitable for the target students. Each part of the target course video is matched with the target student, which helps to improve the target students' learning interest and learning effect on the target course videos, thereby increasing the number of times and duration of learning on the education platform, and improving the interaction rate between the target students and the education platform.

[0150] In an exemplary embodiment, the method for selecting a target course video that meets the selection criteria from at least one candidate course video can be as follows: the average matching degree between each target knowledge point video in at least one candidate course video and the target student is used as the measurement index of at least one candidate course video, and the candidate course video with the largest measurement index is selected as the target course video that meets the selection criteria.
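
A minimal sketch of this selection follows, assuming each candidate course video is represented as a list of target knowledge point video identifiers and that the matching degrees from step B are available in a dictionary; the names and values are illustrative assumptions.

```python
def pick_course(candidates, match_degrees):
    """Return the candidate whose videos have the largest average
    matching degree with the target student."""
    def average_match(course):
        return sum(match_degrees[v] for v in course) / len(course)
    return max(candidates, key=average_match)


candidates = [["a11", "b11"], ["a22", "b11"]]
match_degrees = {"a11": 0.9, "a22": 0.4, "b11": 0.6}  # made-up values
target_course = pick_course(candidates, match_degrees)
```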

[0151] In an exemplary embodiment, selecting a target course video that meets the selection criteria from at least one candidate course video can be achieved by: evaluating at least one candidate course video to obtain evaluation results for at least one candidate course video; and selecting the candidate course video whose evaluation results meet the first evaluation criteria as the target course video that meets the selection criteria. Evaluating at least one candidate course video allows for filtering of the candidate course videos based on the evaluation results, ensuring that the selected target course video is one with a better evaluation result, thus guaranteeing the reliability of the target course video.

[0152] The principle for evaluating each candidate course video is the same. Taking any candidate course video as an example, the process of evaluating any candidate course video and obtaining the evaluation result of any candidate course video is introduced. In an exemplary embodiment, the implementation process of evaluating any candidate course video and obtaining the evaluation result of any candidate course video includes: obtaining at least one of a first evaluation indicator, a second evaluation indicator, and a third evaluation indicator for any candidate course video; and determining the evaluation result of any candidate course video based on at least one of the first evaluation indicator, the second evaluation indicator, and the third evaluation indicator.

[0153] The first evaluation metric is derived from the text corresponding to each target knowledge point in any candidate course video, and is used to evaluate the rationality of the content summary in any candidate course video. The second evaluation metric is derived from each video frame in any candidate course video, and is used to evaluate the viewing smoothness of any candidate course video. The third evaluation metric is derived from the text corresponding to each target knowledge point in any candidate course video and each video frame in any candidate course video, and is used to evaluate the overall quality of any candidate course video.

[0154] For example, the text corresponding to each target knowledge point video in any candidate course video is input into a language model (e.g., BERT model) for encoding to obtain the encoding features of the text corresponding to each target knowledge point video. The encoding features of the text corresponding to each target knowledge point video are input into a feature processing model (e.g., LSTM model) in temporal order for processing to obtain text representation features. The text representation features are input into an evaluation task layer (e.g., softmax (a function) layer) for evaluation to obtain the first evaluation index.

[0155] For example, each video frame in any candidate course video is input into a video processing model (e.g., Vision Transformer) in chronological order to obtain video representation features. The video representation features are then input into the evaluation task layer for evaluation to obtain a second evaluation index.

[0156] For example, text representation features are obtained based on the text corresponding to each target knowledge point video in any candidate course video; video representation features are obtained based on each video frame in any candidate course video; the text representation features and video representation features are fused to obtain target representation features; the target representation features are input into the task layer for evaluation to obtain the third evaluation index.

[0157] For example, the model used to evaluate candidate course videos can be obtained through supervised training on training data. The training data can include positive and negative sample data; positive sample data can refer to data related to course videos with high reliability, while negative sample data can refer to data related to course videos with low reliability. For example, the model used to evaluate candidate course videos can also be fine-tuned using accumulated feedback data.

[0158] For example, if the evaluation result of any candidate course video is determined based on any one of the first, second, and third evaluation indicators, then any one of the first, second, and third evaluation indicators can be directly used as the evaluation result of any candidate course video. If the evaluation result of any candidate course video is determined based on any two of the first, second, and third evaluation indicators, or based on all three of the first, second, and third evaluation indicators, then the weighted sum of the two or three indicators can be used to obtain the evaluation result of any candidate course video. The weights of the two or three indicators are set based on experience.

[0159] For example, when the evaluation result of any candidate course video is determined based on the first evaluation indicator, the second evaluation indicator, and the third evaluation indicator, the three indicators can be weighted and summed to obtain the evaluation result of that candidate course video. The weights of the first evaluation indicator, the second evaluation indicator, and the third evaluation indicator can be set based on experience. In this way, the candidate course video is evaluated with a multimodal model covering both video and text, and the evaluation result comprehensively considers all three indicators. The candidate course video is thus evaluated in terms of the rationality of the content presentation, the smoothness of viewing, and the overall quality, which helps to further ensure the reliability of the evaluation results of the candidate course video.
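
The combination of indicators can be illustrated as follows. The indicator names, values, and weights are assumptions chosen for the sketch; in practice the indicators would come from the evaluation models described above.

```python
def evaluation_result(indicators, weights):
    """Use a single indicator directly; otherwise return the weighted sum
    of the two or three available indicators."""
    if len(indicators) == 1:
        return next(iter(indicators.values()))
    return sum(value * weights[name] for name, value in indicators.items())


weights = {"first": 0.5, "second": 0.2, "third": 0.3}     # set by experience
indicators = {"first": 0.8, "second": 0.6, "third": 0.7}  # made-up scores
score = evaluation_result(indicators, weights)
```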

[0160] The evaluation result of any candidate course video is used to indicate the reliability of using any candidate course video as a course video for target students to learn. This application embodiment does not limit the representation of the evaluation result of any candidate course video. For example, the evaluation result of any candidate course video can be represented as a score, with a higher score indicating higher reliability. Alternatively, the evaluation result of any candidate course video can also be represented as a quality level, with a higher quality level indicating higher reliability.

[0161] A candidate course video whose evaluation result meets the first evaluation condition refers to the candidate course video that is most suitable for the target student among at least one candidate course video. The evaluation result meeting the first evaluation condition can be set based on experience or flexibly adjusted according to the application scenario; this embodiment does not limit this. For example, if the evaluation result of a candidate course video is represented by a score positively correlated with reliability, meeting the first evaluation condition could mean the evaluation result is the highest or greater than a score threshold, which is set based on experience. For example, if the evaluation result of a candidate course video is represented by a quality level, including excellent, good, and poor, meeting the first evaluation condition could mean the evaluation result is excellent.

[0162] In an exemplary embodiment, after determining the target course video, the method further includes: playing the target course video in response to the target student's learning instruction for the target course video.

[0163] In an exemplary embodiment, playing the target course video may refer to playing the target knowledge point videos of the target course video based on the target playback speed of each target knowledge point video in the target course video.

[0164] In an exemplary embodiment, before playing the target course video, the original playback speed of each target knowledge point video in the target course video can be edited based on the attribute characteristics of the target students to obtain the target playback speed for each target knowledge point video in the target course video. In this case, playing the target course video means playing each target knowledge point video in the target course video based on the target playback speed of each target knowledge point video. This playback method has a high degree of matching with the target students and is more conducive to improving the target students' learning interest and learning effect of the target course video.

[0165] For example, the original playback speed of each target knowledge point video in the target course video can be edited based on the attribute characteristics of the target student according to certain rules. For example, short videos of knowledge points that the student has already mastered can be skipped or played faster, videos of knowledge points for which the student has a relatively good foundation can be played faster, and videos of knowledge points for which the student has a relatively weak foundation can be played more slowly.
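
Such rules could be sketched as below. The mastery score in the range 0 to 1, the cut-off values, and the speed multipliers are all assumptions of this sketch; the embodiment only states the direction of each adjustment.

```python
def target_speed(mastery, original_speed=1.0):
    """Map an assumed mastery score (0 to 1) for a knowledge point to a
    target playback speed; None means the video is skipped."""
    if mastery >= 0.9:    # already mastered: skip
        return None
    if mastery >= 0.6:    # relatively good foundation: play faster
        return original_speed * 1.5
    if mastery < 0.3:     # relatively weak foundation: play slower
        return original_speed * 0.75
    return original_speed  # otherwise keep the original speed


speeds = {point: target_speed(m)
          for point, m in {"A": 0.95, "B": 0.7, "C": 0.2, "D": 0.5}.items()}
```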

[0166] In an exemplary embodiment, after determining the target course video, the method further includes: obtaining feedback information from the target student regarding the target course video, the feedback information being used to indicate the determination effect of the target course video; and updating the course video to be studied by the target student based on the feedback information.

[0167] For example, obtaining feedback from target students on target course videos can be achieved through questionnaires to collect feedback, or by assessing students' learning outcomes and obtaining feedback based on the assessment results. Obtaining feedback is to record students' learning progress on the target course videos, such as viewing time, feedback on each knowledge point, overall course experience feedback, and simple assessments to understand their mastery of the course videos. This feedback information accumulates relevant data to facilitate the iterative development of the AI course creation algorithm, which in turn updates the course videos available for target students. For example, based on the feedback information, the target students' attribute characteristics and knowledge point learning needs can be updated. Then, based on the new attribute characteristics and new knowledge point learning needs, new course videos can be determined for the target students, ensuring the real-time nature of the course video selection.

[0168] For example, the process of determining the target course videos can be as shown in Figure 7: the target course videos can be determined based on a personalized course customization system. This system includes the following components: a short video resource library (containing features), a student dynamic feature database, an educational knowledge graph application layer, a course creation AI algorithm module, and a student feedback system. Specifically, the short video resource library primarily maintains and updates the fine-grained breakdown system of course video knowledge points; the educational knowledge graph is maintained and updated independently, belonging to the application layer; the student dynamic feature database primarily maintains students' attribute characteristics and knowledge point learning needs; the course creation AI algorithm module is mainly used to create personalized course videos for target students based on a more controllable pipeline model; and the student feedback system is mainly used to collect student feedback information on the course videos.

[0169] Candidate knowledge point videos are extracted from the database, such as a11, a22, etc., from the video library for knowledge point A; b11, b22, etc., from the video library for knowledge point B; c11, c22, etc., from the video library for knowledge point C; and candidate knowledge point videos from other skill or learning need knowledge bases. The attribute characteristics and learning needs of multiple students are obtained, such as the attribute characteristics and learning needs of student 1 (Java beginner), student 2 (Photoshop advanced design), and student 3 (.NET intermediate), etc. An educational knowledge graph is obtained. Based on the candidate knowledge point videos, students' attribute characteristics and learning needs, and the educational knowledge graph, personalized course videos are determined for students using a personalized course customization algorithm.

[0170] For example, the process of a personalized course customization algorithm includes: 1. Recalling short videos based on learning needs and the knowledge point graph; 2. Ranking the short videos of the same knowledge point against one another according to the student's profile and the knowledge point attributes of the short videos, and selecting the top K (the K best-matching videos); 3. Reorganizing the short videos in sequence according to the knowledge point structure to obtain candidate long video courses; 4. Evaluating the quality of the candidate courses; 5. Personalizing the playback speed of each component short video.
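Steps 1 and 2 of the algorithm above (recall, then top-K ranking per knowledge point) can be sketched in Python as follows. The `Clip` record, the dot-product `match_score`, and the feature name used in the usage note are hypothetical stand-ins; this application describes trained models for recall and ranking, which this sketch does not reproduce.

```python
import heapq
from typing import Callable, Dict, Iterable, List, NamedTuple


class Clip(NamedTuple):
    """A candidate knowledge point video (hypothetical record layout)."""
    clip_id: str
    knowledge_point: str
    features: Dict[str, float]


def recall(clips: Iterable[Clip], needed_points: Iterable[str]) -> List[Clip]:
    """Step 1: keep only clips whose knowledge point the student needs."""
    needed = set(needed_points)
    return [c for c in clips if c.knowledge_point in needed]


def match_score(profile: Dict[str, float]) -> Callable[[Clip], float]:
    """Build a scorer: dot product of student preferences and clip features."""
    def score(clip: Clip) -> float:
        return sum(w * clip.features.get(name, 0.0) for name, w in profile.items())
    return score


def top_k_per_point(
    clips: Iterable[Clip], score: Callable[[Clip], float], k: int
) -> Dict[str, List[Clip]]:
    """Step 2: rank clips within each knowledge point and keep the best k."""
    by_point: Dict[str, List[Clip]] = {}
    for clip in clips:
        by_point.setdefault(clip.knowledge_point, []).append(clip)
    return {p: heapq.nlargest(k, group, key=score) for p, group in by_point.items()}
```

As a usage example, a student who prefers renowned instructors could be given the profile `{"famous_teacher": 1.0}`, so that clips carrying a higher value for that (hypothetical) feature are ranked first within each knowledge point.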

[0171] After determining the course videos for students, the process also includes student learning, self-testing of mastery or student feedback, updating student attributes and learning needs for knowledge points, recording of results logs, data organization, and iterating on personalized course customization algorithms.

[0172] For example, the framework for determining the target course videos can be as shown in Figure 8. Pre-recorded long video courses are uploaded to the database. AI algorithms segment these into minute-level short videos based on knowledge point granularity. Short video materials are recalled based on learning needs, educational knowledge graph data, and the algorithm. The AI creation model ranks the materials within each knowledge point category, selects the top K, and recombines the materials to output multiple candidate courses. These courses undergo quality assessment to obtain the optimal customized course for the student. Upon completion of learning, students provide feedback on the short videos, including overall quality feedback, and take assessments of their knowledge point mastery. Student profiles (attribute features) are updated, feedback and assessment data from the entire platform are accumulated, and the model is iterated periodically. Whenever the model is iterated or a student profile is updated, the customized course for each student needs to be updated synchronously.

[0173] For example, steps 201 to 205 above can be considered as a process of obtaining target course videos in a pipeline manner, and the embodiments of this application are not limited thereto. In another possible implementation, the process of determining the target course video can also be as follows: inputting the video features of candidate knowledge point videos, the knowledge point learning needs and attribute features of target students into the course video determination model, and obtaining the target course video output by the course video determination model, which is obtained by splicing at least one target knowledge point video that matches the target student from the candidate knowledge point videos. The course video determination model is trained using an end-to-end training method based on the video features of sample knowledge point videos, the knowledge point learning needs of sample students, the attribute features of sample students, and the standard course video corresponding to the sample students. The standard course video is obtained by splicing at least one sample knowledge point video. This method of determining the target course video is a multi-task joint modeling method, which has good results under the premise of rich data accumulation, thorough business understanding, and relatively mature algorithm technology.

[0174] Students have limited energy and time, varying levels of prior knowledge, and differing levels of mastery of different knowledge points. Teachers also understand each specific knowledge point differently, and their explanations and levels of difficulty differ. Given these factors, learning from one or two video series (existing long video courses) is not the optimal learning solution for most students. This is because video learning, like classroom lectures, is a standardized approach aimed at the majority of students, leading to low learning efficiency for students with strong foundations and inadequate understanding for those with weaker foundations. Furthermore, no teacher understands all knowledge points equally well, so the quality of the short videos for each knowledge point is inconsistent (and is judged subjectively). The current finest granularity is the sub-lesson video level, so students cannot make optimal choices at the short video granularity. This application's implementation addresses these problems by refining the matching of online courses and learning needs down to the minute-level short video of each knowledge point, while also considering personalized user needs, thus optimizing the learning for each student and improving their learning efficiency and effectiveness.

[0175] This application proposes an online education scenario based on AI algorithms to provide optimal personalized courses for each student. It refines the granularity of matching teaching and learning needs from the level of large lectures to the level of short videos focusing on specific knowledge points. It breaks down the limitations of educational resources, changing the model of one teacher teaching many students to a model of many teachers teaching one student in a small class, improving individual learning efficiency and effectiveness while fully utilizing educational resources. For example, it breaks down courses into minute-level short videos using NLP (Natural Language Processing), CV (Computer Vision), and multimodal algorithms, and then reorganizes the courses using search, recommendation, recall, and ranking technologies to obtain the most suitable courses for each individual learner.

[0176] First, speech recognition and NER (Named Entity Recognition) technologies are used to identify the knowledge points involved in the course. Then, multimodal technologies (such as CLIP (Contrastive Language-Image Pre-training) and DALL-E (a text-to-image generative model)) are used to segment the video into short videos about a minute long. Next, the GAT (Graph Attention Network) model is used to represent the structure of knowledge points in the educational knowledge graph, incorporating course and student features. A model is constructed based on DSSM (Deep Structured Semantic Model) and trained using historical data, and candidate creative materials are recalled. The DeepFM model is used to rank the recalled candidate short videos, selecting the top K for later reorganization and creative material selection. A multimodal model predicts the order of the short videos. The reorganized courses are also evaluated using a multimodal model to assess video quality and select the optimal output. The videos are then edited based on students' knowledge point mastery, reducing course time costs. Finally, the data accumulated from the student learning process and the established assessment mechanism is archived and organized, and the AI creation model for the courses is regularly optimized. In turn, the optimized model improves the customized courses for each student's learning needs.

[0177] This application's embodiments can be used as platform systems, devices, or programs for learning knowledge through recorded video materials, improving the learning efficiency and experience for each user and saving learning time. For course creators using tools and devices, by providing effective guidance and attribution evaluation, creation efficiency and the quality of works can be effectively improved.

[0178] For example, on online education platforms, numerous educational institutions and teachers provide online classes to a vast number of students with internet access through pre-recorded or live streams. The diversity of online courses far exceeds the diversity of student needs. The core function of the platform is to match student learning needs with the courses offered by teachers; the higher the level of granular matching, the better the user experience. Specifically, for instance, a platform might offer a "[Java Introduction]" technical series, with multiple sub-courses (environment configuration, basic characters, operators, classes, etc.). Each sub-course is a relatively long video course (often an hour long) consisting of multiple fine-grained knowledge points, and multiple teachers or institutions teach the "[Java Introduction]" series. The platform obtains permission to recreate the pre-recorded course videos, cutting the "[Java Introduction]" content into shorter videos based on the granularity of knowledge points (e.g., strings, ++, etc.). Student A's profile indicates a need for introductory Java learning. They have already grasped string concepts and enjoy courses taught by renowned instructors. Based on the student's foundation and preferences, the system first uses educational knowledge graphs and algorithmic retrieval techniques to retrieve short videos related to Java concepts. AI algorithm-created courses skip already mastered concepts; for each concept, multiple short videos are available, prioritizing those from renowned instructors. After completing a lesson, a simple course experience feedback and knowledge point assessment are provided to evaluate course quality and update student A's information. Before the next time student A participates, the algorithm is iteratively updated to continuously improve the course, maintaining optimal performance.

[0179] Creators can use a course recording assistance system or device to create a new lesson. The educational knowledge graph can help quickly determine the scope of knowledge points and formulate a plan based on that scope. After the video is recorded and segmented, each short video of a knowledge point is scored by AI, and attribution analysis is performed to identify areas for improvement and guide the creator to optimize.

[0180] The method for determining course videos provided in this application determines the target course videos to be learned by the target students based on their knowledge point learning needs and attribute characteristics. Each part of the target course video is a target knowledge point video that matches the target student from the candidate knowledge point videos. Based on this method, each part of the target course video has a high degree of matching with the student. Using the target course video as the course video to be learned by the student is conducive to improving the student's learning interest and learning effect, thereby increasing the interaction rate between the student and the education platform.

[0181] Referring to Figure 9, this application provides an apparatus for determining course videos, the apparatus comprising:

[0182] The first acquisition module 901 is used to acquire the initial course video recorded by the teacher, and to segment the initial course video according to the knowledge point granularity to obtain candidate knowledge point videos;

[0183] The second acquisition module 902 is used to acquire the knowledge point learning needs and attribute characteristics of the target student. The knowledge point learning needs are used to indicate the knowledge points that the target student expects to learn, and the attribute characteristics are used to identify the target student.

[0184] The determination module 903 is used to determine at least one target knowledge point that matches the target student based on the target student's knowledge point learning needs; and to determine a set of target knowledge point videos that matches the target student from candidate knowledge point videos based on at least one target knowledge point and the attribute characteristics of the target student. The set of target knowledge point videos includes at least one target knowledge point video corresponding to each target knowledge point.

[0185] The splicing module 904 is used to splice the target knowledge point videos in the target knowledge point video set to obtain at least one candidate course video. Each candidate course video is obtained by splicing one target knowledge point video corresponding to each target knowledge point.

[0186] The selection module 905 is used to select a target course video that meets the selection criteria from at least one candidate course video. The target course video is the course video that the target student needs to learn.

[0187] In one possible implementation, the determining module 903 is configured to: determine at least one first knowledge point video corresponding to any target knowledge point from the candidate knowledge point videos; determine at least one second knowledge point video that meets the filtering conditions from the at least one first knowledge point video corresponding to any target knowledge point; determine the matching degree between the at least one second knowledge point video and the target student based on the attribute characteristics of the target student and the video characteristics of the at least one second knowledge point video; designate the at least one second knowledge point video that meets the matching conditions as the at least one target knowledge point video corresponding to any target knowledge point; and determine the target knowledge point video set based on the at least one target knowledge point video corresponding to each target knowledge point.

[0188] In one possible implementation, a selection module 905 is used to obtain at least one of a first evaluation indicator, a second evaluation indicator, and a third evaluation indicator for any candidate course video; and to determine the evaluation result of any candidate course video based on at least one of the first evaluation indicator, the second evaluation indicator, and the third evaluation indicator; and to select at least one candidate course video whose evaluation result satisfies the first evaluation condition as the target course video that satisfies the selection condition; wherein, the first evaluation indicator is obtained based on the text corresponding to each target knowledge point video in any candidate course video, and the first evaluation indicator is used to evaluate the rationality of the content summary of any candidate course video; the second evaluation indicator is obtained based on each video frame in any candidate course video, and the second evaluation indicator is used to evaluate the viewing smoothness of any candidate course video; the third evaluation indicator is obtained based on the text corresponding to each target knowledge point video in any candidate course video and each video frame in any candidate course video, and the third evaluation indicator is used to evaluate the overall quality of any candidate course video.
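The selection logic above can be sketched in Python as follows. This application does not state how the available indicators are combined; the weighted mean, the indicator names ("summary", "smoothness", "overall"), and the threshold used as the first evaluation condition are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def evaluation_result(
    indicators: Dict[str, Optional[float]], weights: Dict[str, float]
) -> float:
    """Combine whichever indicators are available into one score.

    A missing indicator is passed as None and is left out of the
    weighted mean (an assumed combination rule).
    """
    available = {name: v for name, v in indicators.items() if v is not None}
    if not available:
        raise ValueError("at least one evaluation indicator is required")
    total_weight = sum(weights[name] for name in available)
    return sum(weights[name] * v for name, v in available.items()) / total_weight


def select_targets(
    candidates: Dict[str, Dict[str, Optional[float]]],
    weights: Dict[str, float],
    threshold: float,
) -> List[str]:
    """Return the ids of candidate courses whose result meets the threshold."""
    return [
        course_id
        for course_id, indicators in candidates.items()
        if evaluation_result(indicators, weights) >= threshold
    ]
```

Passing `None` for an indicator models the "at least one of" wording: a candidate can still be evaluated when only some of the three indicators have been computed.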

[0189] In one possible implementation, the determination module 903 is used to obtain an educational knowledge graph, which includes nodes corresponding to knowledge points and edges between nodes corresponding to knowledge points; and to determine, based on the knowledge point learning needs of the target student and the educational knowledge graph, at least one target knowledge point matching the target student.
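As a minimal Python sketch of how a knowledge graph could turn learning needs into target knowledge points, the following assumes that edges encode "prerequisite of" relations and that the student's profile lists already-mastered knowledge points. Both assumptions, and the name `target_points`, are illustrative; this application only states that the graph contains nodes and edges.

```python
from typing import Dict, Iterable, List, Set


def target_points(
    wanted: Iterable[str],
    prerequisites: Dict[str, List[str]],
    mastered: Set[str],
) -> List[str]:
    """Expand the wanted knowledge points with unmastered prerequisites.

    Returns points in a learnable order (prerequisites before dependents).
    Mastered points, and anything reachable only through them, are skipped.
    """
    ordered: List[str] = []
    seen = set(mastered)  # treat mastered points as already visited

    def visit(point: str) -> None:
        if point in seen:
            return
        seen.add(point)  # mark first so a cyclic graph cannot loop forever
        for pre in prerequisites.get(point, []):
            visit(pre)
        ordered.append(point)  # emitted after all of its prerequisites

    for point in wanted:
        visit(point)
    return ordered
```

With `prerequisites = {"classes": ["operators", "strings"], "operators": ["variables"], "strings": ["variables"]}` and a student who has mastered "strings", a request for "classes" expands to "variables", "operators", "classes" in that order.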

[0190] In one possible implementation, the determination module 903 is used to obtain an initial knowledge graph; remove nodes and edges in the initial knowledge graph whose relevance to the educational scenario does not meet the relevance requirements to obtain a first knowledge graph; based on at least one of books corresponding to the educational scenario and multimedia resources corresponding to the educational scenario, add to the first knowledge graph the nodes and edges whose relevance to the educational scenario meets the relevance requirements but which are missing from the first knowledge graph, to obtain a second knowledge graph; and modify the second knowledge graph to obtain the educational knowledge graph.
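The remove-then-supplement construction above can be sketched in Python with plain sets. The sketch assumes each node carries a numeric relevance score and that an edge is dropped when either endpoint is dropped; the scoring, the threshold, and the function names are hypothetical, and the final manual modification step is not modeled.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def prune_graph(
    relevance: Dict[str, float], edges: Set[Edge], min_relevance: float
) -> Tuple[Set[str], Set[Edge]]:
    """Initial graph -> first graph: drop low-relevance nodes and their edges."""
    nodes = {n for n, score in relevance.items() if score >= min_relevance}
    kept_edges = {(u, v) for u, v in edges if u in nodes and v in nodes}
    return nodes, kept_edges


def supplement_graph(
    nodes: Set[str],
    edges: Set[Edge],
    extra_nodes: Iterable[str],
    extra_edges: Iterable[Edge],
) -> Tuple[Set[str], Set[Edge]]:
    """First graph -> second graph: add nodes and edges mined from books or media."""
    new_nodes = nodes | set(extra_nodes)
    # Only accept an extra edge when both endpoints exist in the result.
    new_edges = edges | {
        (u, v) for u, v in extra_edges if u in new_nodes and v in new_nodes
    }
    return new_nodes, new_edges
```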

[0191] In one possible implementation, the splicing module 904 is used to determine the arrangement order of at least one target knowledge point; based on the arrangement order, the target knowledge point videos in the target knowledge point video set are spliced to obtain at least one candidate course video, wherein any candidate course video is obtained by splicing a target knowledge point video corresponding to each target knowledge point in the arrangement order.
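The splicing step above amounts to choosing one target knowledge point video per target knowledge point, in the arrangement order. A minimal Python sketch follows; video ids stand in for the actual video data, and the name `candidate_courses` is hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple


def candidate_courses(
    order: Sequence[str], videos_by_point: Dict[str, List[str]]
) -> List[Tuple[str, ...]]:
    """Enumerate candidate courses as ordered tuples of video ids.

    Each candidate contains exactly one video for every knowledge point,
    arranged in the given order of knowledge points.
    """
    missing = [p for p in order if not videos_by_point.get(p)]
    if missing:
        raise ValueError(f"no target video for knowledge points: {missing}")
    return list(product(*(videos_by_point[p] for p in order)))
```

Because the number of candidates is the product of the per-point video counts, keeping only the top K videos per knowledge point bounds the enumeration at K to the power of the number of knowledge points.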

[0192] In one possible implementation, the first acquisition module 901 is used to: acquire the audio data corresponding to the initial course video; convert the audio data into text; extract knowledge points from the text; determine the classification result of each video frame based on the video frame of the initial course video and the text corresponding to the video frame, the classification result of the video frame being used to indicate the knowledge point, among the extracted knowledge points, to which the video frame corresponds; segment the initial course video according to the granularity of knowledge points based on the classification result of each video frame in the initial course video, to obtain segmented videos; and obtain candidate knowledge point videos based on the segmented videos.
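Once every video frame has a classification result, the segmentation itself reduces to grouping consecutive frames that share a knowledge point. The following Python sketch assumes the per-frame labels are already available as a list (the classification model is not shown) and that frames are sampled at a fixed rate `fps`; both the input layout and the name `segment_by_label` are assumptions.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def segment_by_label(
    frame_labels: Sequence[str], fps: float = 1.0
) -> List[Tuple[str, float, float]]:
    """Turn per-frame knowledge point labels into (label, start_s, end_s) segments.

    Consecutive frames with the same label are merged into one segment.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    segments = []
    index = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of frames in this run
        segments.append((label, index / fps, (index + length) / fps))
        index += length
    return segments
```

A knowledge point that appears in two separate runs yields two segments, so a later step would decide whether to merge them or keep them as separate candidate videos.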

[0193] In one possible implementation, the first acquisition module 901 is used to acquire the evaluation result of the segmented video based on the video features of the segmented video; and to remove videos in the segmented video whose evaluation results do not meet the second evaluation condition to obtain candidate knowledge point videos.

[0194] In one possible implementation, the video features of the segmented video include at least one of the following: the features of the teacher who recorded the original course video corresponding to the segmented video, the features of the institution to which the teacher who recorded the original course video corresponding to the segmented video belongs, and the features of the content of the segmented video.

[0195] In one possible implementation, the device further includes:

[0196] The editing module is used to edit the original playback speed of each target knowledge point video in the target course video based on the attribute characteristics of the target students, so as to obtain the target playback speed of each target knowledge point video in the target course video.

[0197] The playback module is used to respond to the learning instructions of the target students for the target course videos and play the videos of each target knowledge point in the target course videos according to the target playback speed of each target knowledge point video.

[0198] In one possible implementation, the device further includes:

[0199] The update module is used to obtain feedback information from target students regarding the target course videos. This feedback information is used to indicate the effectiveness of the target course videos. Based on the feedback information, the course videos to be studied by the target students are updated.

[0200] The apparatus for determining course videos provided in this application determines the target course videos to be learned by the target students based on their knowledge point learning needs and attribute characteristics. Each part of the target course video is a target knowledge point video that matches the target student among the candidate knowledge point videos. Based on this method, each part of the target course video has a high degree of matching with the student. Using the target course video as the course video to be learned by the student is conducive to improving the student's learning interest and learning effect, thereby increasing the interaction rate between the student and the education platform.

[0201] It should be noted that the apparatus provided in the above embodiments is only illustrated by the division of the above functional modules. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided in the above embodiments belong to the same concept, and their specific implementation process can be found in the method embodiments, which will not be repeated here.

[0202] In an exemplary embodiment, a computer device is also provided, comprising a processor and a memory storing at least one computer program. The at least one computer program is loaded and executed by one or more processors to enable the computer device to implement any of the methods described above for determining course videos. The computer device can be a server or a terminal. The structures of servers and terminals will be described below.

[0203] Figure 10 is a schematic diagram of a server structure provided in an embodiment of this application. The server can vary significantly due to differences in configuration or performance. It may include one or more processors (Central Processing Units, CPUs) 1001 and one or more memories 1002. The one or more memories 1002 store at least one computer program, which is loaded and executed by the one or more processors 1001 to enable the server to implement the method for determining course videos provided in the above-described method embodiments. Of course, the server may also have wired or wireless network interfaces, a keyboard, and input / output interfaces for input and output. The server may also include other components for implementing device functions, which will not be elaborated upon here.

[0204] Figure 11 is a schematic diagram of the structure of a terminal provided in an embodiment of this application. The terminal can be: a PC (e.g., a laptop computer, desktop computer, etc.), a mobile phone, a smartphone, a PDA, a wearable device, a PPC, a tablet computer, a smart car infotainment system, a smart TV, a smart speaker, a smartwatch, an in-vehicle terminal, etc. The terminal may also be referred to as user equipment, a portable terminal, a laptop terminal, a desktop terminal, or other names.

[0205] Typically, a terminal includes a processor 1501 and a memory 1502.

[0206] Processor 1501 may include one or more processing cores, such as a quad-core processor, an octa-core processor, etc. Processor 1501 may be implemented using at least one hardware form selected from DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). Processor 1501 may also include a main processor and a coprocessor. The main processor, also known as a CPU (Central Processing Unit), is used to process data in the wake-up state; the coprocessor is a low-power processor used to process data in the standby state. In some embodiments, processor 1501 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the screen. In some embodiments, processor 1501 may also include an AI processor, which is used to handle computational operations related to machine learning.

[0207] The memory 1502 may include one or more computer-readable storage media, which may be non-transitory. The memory 1502 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory devices. In some embodiments, the non-transitory computer-readable storage media in the memory 1502 are used to store at least one instruction, which is executed by the processor 1501 to cause the terminal to implement the method for determining course videos provided in the method embodiments of this application.

[0208] In some embodiments, the terminal may also optionally include: a peripheral device interface 1503 and at least one peripheral device. The processor 1501, memory 1502, and peripheral device interface 1503 can be connected via a bus or signal line. Each peripheral device can be connected to the peripheral device interface 1503 via a bus, signal line, or circuit board. Specifically, the peripheral device includes at least one of: a radio frequency circuit 1504, a display screen 1505, a camera assembly 1506, an audio circuit 1507, and a power supply 1508.

[0209] Peripheral device interface 1503 can be used to connect at least one I / O (Input / Output) related peripheral device to processor 1501 and memory 1502. In some embodiments, processor 1501, memory 1502 and peripheral device interface 1503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of processor 1501, memory 1502 and peripheral device interface 1503 can be implemented on separate chips or circuit boards, which is not limited in this embodiment.

[0210] The radio frequency (RF) circuit 1504 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The RF circuit 1504 communicates with communication networks and other communication devices via electromagnetic signals. The RF circuit 1504 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals back into electrical signals. Optionally, the RF circuit 1504 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, etc. The RF circuit 1504 can communicate with other terminals via at least one wireless communication protocol. This wireless communication protocol includes, but is not limited to: metropolitan area networks (MANs), various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks (WLANs), and / or WiFi (Wireless Fidelity) networks. In some embodiments, the RF circuit 1504 may also include circuitry related to NFC (Near Field Communication), which is not limited in this application.

[0211] Display screen 1505 is used to display a UI (User Interface). This UI may include graphics, text, icons, videos, and any combination thereof. When display screen 1505 is a touch display screen, it also has the ability to collect touch signals on or above its surface. These touch signals can be input as control signals to processor 1501 for processing. In this case, display screen 1505 can also be used to provide virtual buttons and / or a virtual keyboard, also known as soft buttons and / or a soft keyboard. In some embodiments, display screen 1505 can be a single screen, disposed on the front panel of the terminal; in other embodiments, display screen 1505 can be at least two screens, disposed on different surfaces of the terminal or in a folded design; in other embodiments, display screen 1505 can be a flexible display screen, disposed on a curved or folded surface of the terminal. Furthermore, display screen 1505 can be configured as a non-rectangular, irregular shape, i.e., a non-rectangular screen. Display screen 1505 can be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

[0212] The camera assembly 1506 is used to acquire images or videos. Optionally, the camera assembly 1506 includes a front-facing camera and a rear-facing camera. Typically, the front-facing camera is located on the front panel of the terminal, and the rear-facing camera is located on the back of the terminal. In some embodiments, there are at least two rear-facing cameras, each being any one of a main camera, a depth-sensing camera, a wide-angle camera, and a telephoto camera, so as to achieve background blurring by fusing the main camera and the depth-sensing camera, panoramic shooting and VR (Virtual Reality) shooting by fusing the main camera and the wide-angle camera, or other fusion shooting functions. In some embodiments, the camera assembly 1506 may also include a flash. The flash can be a single-color temperature flash or a dual-color temperature flash. A dual-color temperature flash refers to a combination of a warm-light flash and a cool-light flash, which can be used for light compensation at different color temperatures.

[0213] The audio circuit 1507 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, converting the sound waves into electrical signals that are input to the processor 1501 for processing, or input to the radio frequency circuit 1504 for voice communication. For stereo sound acquisition or noise reduction purposes, multiple microphones may be used, each positioned at a different location on the terminal. The microphone may also be an array microphone or an omnidirectional microphone. The speaker is used to convert electrical signals from the processor 1501 or the radio frequency circuit 1504 into sound waves. The speaker may be a conventional diaphragm speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert electrical signals not only into audible sound waves but also into inaudible sound waves for purposes such as distance measurement. In some embodiments, the audio circuit 1507 may also include a headphone jack.

[0214] Power supply 1508 is used to power the various components in the terminal. Power supply 1508 can be AC power, DC power, a disposable battery, or a rechargeable battery. When power supply 1508 includes a rechargeable battery, the rechargeable battery can support wired or wireless charging. The rechargeable battery can also be used to support fast charging technology.

[0215] In some embodiments, the terminal further includes one or more sensors 1509. The one or more sensors 1509 include, but are not limited to: an acceleration sensor 1510, a gyroscope sensor 1511, a pressure sensor 1512, an optical sensor 1513, and a proximity sensor 1514.

[0216] Accelerometer 1510 can detect the magnitude of acceleration along the three coordinate axes of a coordinate system established by the terminal. For example, accelerometer 1510 can be used to detect the components of gravitational acceleration along the three coordinate axes. Processor 1501 can control display screen 1505 to display the user interface in either a landscape or portrait view based on the gravitational acceleration signal acquired by accelerometer 1510. Accelerometer 1510 can also be used for games or for acquiring user motion data.

[0217] The gyroscope sensor 1511 can detect the terminal's orientation and rotation angle. The gyroscope sensor 1511 can work in conjunction with the acceleration sensor 1510 to capture the user's 3D actions on the terminal. Based on the data collected by the gyroscope sensor 1511, the processor 1501 can perform the following functions: motion sensing (e.g., changing the UI based on the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.

[0218] The pressure sensor 1512 can be disposed on the side bezel of the terminal and / or the lower layer of the display screen 1505. When the pressure sensor 1512 is disposed on the side bezel of the terminal, it can detect the user's grip signal on the terminal, and the processor 1501 can perform left / right hand recognition or shortcut operations based on the grip signal collected by the pressure sensor 1512. When the pressure sensor 1512 is disposed on the lower layer of the display screen 1505, the processor 1501 can control the operable controls on the UI based on the user's pressure operation on the display screen 1505. The operable controls include at least one of button controls, scroll bar controls, icon controls, and menu controls.

[0219] Optical sensor 1513 is used to collect ambient light intensity. In one embodiment, processor 1501 can control the display brightness of display screen 1505 based on the ambient light intensity collected by optical sensor 1513. Specifically, when the ambient light intensity is high, the display brightness of display screen 1505 is increased; when the ambient light intensity is low, the display brightness of display screen 1505 is decreased. In another embodiment, processor 1501 can also dynamically adjust the shooting parameters of camera assembly 1506 based on the ambient light intensity collected by optical sensor 1513.
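As a minimal sketch of the brightness control described above, the function below maps ambient light intensity to a display brightness level that rises with the light intensity. The linear mapping, the function name `display_brightness`, and the parameter values (`min_level`, `max_level`, `full_scale_lux`) are hypothetical choices for illustration only.

```python
def display_brightness(ambient_lux: float,
                       min_level: float = 0.1,
                       max_level: float = 1.0,
                       full_scale_lux: float = 1000.0) -> float:
    """Return a brightness level that increases with ambient light intensity.

    Assumption: a linear ramp between min_level and max_level, saturating
    once the ambient light reaches full_scale_lux.
    """
    ratio = min(max(ambient_lux / full_scale_lux, 0.0), 1.0)
    return min_level + (max_level - min_level) * ratio
```

Under these assumed parameters, a dark room yields the minimum level and any reading at or above `full_scale_lux` yields the maximum level.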

[0220] The proximity sensor 1514, also known as a distance sensor, is typically installed on the front panel of the terminal. The proximity sensor 1514 is used to detect the distance between the user and the front of the terminal. In one embodiment, when the proximity sensor 1514 detects that the distance between the user and the front of the terminal is gradually decreasing, the processor 1501 controls the display screen 1505 to switch from a screen-on state to a screen-off state; when the proximity sensor 1514 detects that the distance between the user and the front of the terminal is gradually increasing, the processor 1501 controls the display screen 1505 to switch from a screen-off state to a screen-on state.

[0221] Those skilled in the art will understand that the structure shown in Figure 11 does not constitute a limitation on the terminal, which may include more or fewer components than shown, combine certain components, or use a different component arrangement.

[0222] In an exemplary embodiment, a computer-readable storage medium is also provided, which stores at least one computer program that is loaded and executed by a processor of a computer device to enable the computer to implement any of the methods described above for determining course videos.

[0223] In one possible implementation, the aforementioned computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.

[0224] In an exemplary embodiment, a computer program product is also provided, comprising a computer program or computer instructions that are loaded and executed by a processor to enable a computer to implement any of the methods described above for determining course videos.

[0225] It should be noted that all information (including but not limited to user device information, user personal information, etc.), data (including but not limited to data used for analysis, stored data, displayed data, etc.), and signals involved in this application have been authorized by the user or fully authorized by all parties, and the collection, use, and processing of related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the knowledge point learning needs and attribute characteristics involved in this application were obtained with full authorization.

[0226] The terms "first," "second," etc., used in this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. The implementations described in the above exemplary embodiments do not represent all implementations consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application.

[0227] It should be understood that "multiple" as used in this application refers to two or more. "And / or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A alone, both A and B, or B alone. The character " / " generally indicates that the preceding and following related objects have an "or" relationship.

[0228] The above description is merely an exemplary embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the principles of this application should be included within the protection scope of this application.

Claims

1. A method for determining course videos, characterized in that, The method includes: Obtain the initial course video recorded by the teacher, and segment the initial course video according to the knowledge point granularity to obtain candidate knowledge point videos; The knowledge point learning needs and attribute characteristics of the target student are obtained. The knowledge point learning needs are used to indicate the knowledge points that the target student expects to learn, and the attribute characteristics are used to identify the target student. Obtain an educational knowledge graph, which includes nodes corresponding to knowledge points and edges between nodes corresponding to knowledge points. Based on the target student's knowledge point learning needs and the educational knowledge graph, at least one target knowledge point matching the target student is determined; Based on the at least one target knowledge point and the attribute characteristics of the target student, a set of target knowledge point videos matching the target student is determined from the candidate knowledge point videos. The set of target knowledge point videos includes at least one target knowledge point video corresponding to each target knowledge point. In the educational knowledge graph, the nodes corresponding to each target knowledge point and the edges involved in each target knowledge point are split into subgraphs. The video features of at least one target knowledge point video corresponding to each target knowledge point are bound to the subgraphs. The video features and the interrelationships between each target knowledge point are fused through a graph attention network. The ranking position of each node is predicted. The arrangement order of the at least one target knowledge point is determined according to the ranking position of each node. Based on the arrangement order, the target knowledge point videos in the target knowledge point video set are spliced together to obtain at least one candidate course video. Each candidate course video is obtained by splicing together a target knowledge point video corresponding to each target knowledge point according to the arrangement order. Select a target course video that meets the selection criteria from the at least one candidate course video, wherein the target course video is the course video that the target student is to learn.
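The ordering and splicing steps of claim 1 can be illustrated with a minimal sketch. It takes the ranking positions as given input and does not model the graph attention network that predicts them. The function name `build_candidate_courses`, the convention that a smaller ranking position comes first, and the sample data are assumptions for illustration.

```python
from itertools import product


def build_candidate_courses(rank_position, videos_by_point):
    """Order knowledge points by ranking position, then enumerate candidates.

    rank_position: knowledge point -> predicted ranking position
                   (assumed here: smaller values are taught earlier).
    videos_by_point: knowledge point -> list of its target knowledge point videos.
    Each candidate takes exactly one video per knowledge point, in order.
    """
    order = sorted(rank_position, key=rank_position.get)
    combos = product(*(videos_by_point[point] for point in order))
    return order, [list(combo) for combo in combos]


# Hypothetical ranking positions and video sets.
rank_position = {"fractions": 1, "integers": 0, "decimals": 2}
videos_by_point = {
    "integers": ["int_a.mp4"],
    "fractions": ["frac_a.mp4", "frac_b.mp4"],
    "decimals": ["dec_a.mp4"],
}
order, candidates = build_candidate_courses(rank_position, videos_by_point)
```

With two videos available for one knowledge point and one for each of the others, the sketch yields two candidate course videos, each an ordered list of clips to be concatenated.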

2. The method according to claim 1, characterized in that, The step of determining a set of target knowledge point videos matching the target student from the candidate knowledge point videos based on the at least one target knowledge point and the attribute characteristics of the target student includes: From the candidate knowledge point videos, determine at least one first knowledge point video corresponding to any target knowledge point, and from the at least one first knowledge point video corresponding to any target knowledge point, determine at least one second knowledge point video that meets the filtering conditions. Based on the attribute characteristics of the target student and the video characteristics of the at least one second knowledge point video, the matching degree between the at least one second knowledge point video and the target student is determined; the at least one second knowledge point video whose matching degree meets the matching condition is taken as at least one target knowledge point video corresponding to any target knowledge point. The target knowledge point video set is determined based on at least one target knowledge point video corresponding to each target knowledge point.
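The two-stage selection in claim 2 (filtering, then matching) can be sketched as follows. The claim does not specify how the matching degree is computed; cosine similarity between an attribute vector and a video feature vector, the threshold `min_match`, and the duration-based filter are all assumptions used only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_target_videos(first_videos, student_vec, passes_filter, min_match):
    """Keep videos that pass the filter, then those whose match is high enough."""
    second_videos = [v for v in first_videos if passes_filter(v)]
    return [v["id"] for v in second_videos
            if cosine(student_vec, v["features"]) >= min_match]


# Hypothetical first knowledge point videos for one target knowledge point.
first_videos = [
    {"id": "a", "features": [1.0, 0.0], "seconds": 300},
    {"id": "b", "features": [0.0, 1.0], "seconds": 300},
    {"id": "c", "features": [1.0, 0.1], "seconds": 4000},
]
targets = select_target_videos(
    first_videos, [1.0, 0.0], lambda v: v["seconds"] <= 1800, 0.8
)
```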

3. The method according to claim 1, characterized in that, The step of selecting a target course video that meets the selection criteria from the at least one candidate course video includes: Obtain at least one of the first evaluation indicator, the second evaluation indicator, and the third evaluation indicator for any candidate course video, and determine the evaluation result of the candidate course video based on at least one of the first evaluation indicator, the second evaluation indicator, and the third evaluation indicator. The candidate course video whose evaluation result meets the first evaluation condition is selected as the target course video that meets the selection criteria. The first evaluation indicator is obtained based on the text corresponding to each target knowledge point video in any candidate course video, and is used to evaluate the rationality of the content summary in any candidate course video; the second evaluation indicator is obtained based on each video frame in any candidate course video, and is used to evaluate the viewing smoothness of any candidate course video; the third evaluation indicator is obtained based on the text corresponding to each target knowledge point video in any candidate course video and each video frame in any candidate course video, and is used to evaluate the overall quality of any candidate course video.
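A minimal sketch of the selection in claim 3 follows. The claim leaves open how the indicators are combined and what the first evaluation condition is; averaging whichever indicators were obtained and comparing the result with a threshold are assumptions for illustration, as are the names `evaluate` and `select_target_courses`.

```python
def evaluate(indicators):
    """Average the indicators that were obtained (None marks a missing one)."""
    present = [value for value in indicators if value is not None]
    if not present:
        raise ValueError("at least one evaluation indicator is required")
    return sum(present) / len(present)


def select_target_courses(candidates, threshold):
    """Return candidates whose evaluation result reaches the assumed threshold."""
    return [name for name, indicators in candidates.items()
            if evaluate(indicators) >= threshold]


# Hypothetical (first, second, third) indicators per candidate course video.
candidates = {
    "course_1": (0.9, 0.8, None),
    "course_2": (0.4, None, 0.5),
}
selected = select_target_courses(candidates, threshold=0.7)
```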

4. The method according to claim 1, characterized in that, The acquisition of the educational knowledge graph includes: Obtain an initial knowledge graph; remove nodes and edges in the initial knowledge graph that do not meet the association requirements with the education scenario to obtain a first knowledge graph; Based on at least one of the books corresponding to the educational scenario and the multimedia resources corresponding to the educational scenario, nodes and edges that meet the association requirements with the educational scenario and are missing in the first knowledge graph are added to the first knowledge graph to obtain a second knowledge graph. The second knowledge graph is modified to obtain the educational knowledge graph.
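The pruning and supplementing steps of claim 4 can be sketched with plain sets. The predicate `is_educational` stands in for the unspecified association requirement, and the final modification step that turns the second knowledge graph into the educational knowledge graph is not modeled; both are assumptions of this illustration.

```python
def build_second_graph(nodes, edges, is_educational, extra_nodes, extra_edges):
    """Prune nodes unrelated to education, then add missing nodes and edges.

    nodes / edges: the initial knowledge graph (edges are (source, target) pairs).
    extra_nodes / extra_edges: additions drawn from books or multimedia resources.
    """
    # First knowledge graph: drop unrelated nodes and any edge touching them.
    first_nodes = {n for n in nodes if is_educational(n)}
    first_edges = {(u, v) for u, v in edges
                   if u in first_nodes and v in first_nodes}
    # Second knowledge graph: add what the first graph is missing.
    second_nodes = first_nodes | set(extra_nodes)
    second_edges = first_edges | {(u, v) for u, v in extra_edges
                                  if u in second_nodes and v in second_nodes}
    return second_nodes, second_edges


# Hypothetical initial graph and supplementary material.
graph_nodes, graph_edges = build_second_graph(
    nodes={"fractions", "decimals", "stock prices"},
    edges={("fractions", "decimals"), ("decimals", "stock prices")},
    is_educational=lambda n: n != "stock prices",
    extra_nodes={"percentages"},
    extra_edges={("decimals", "percentages")},
)
```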

5. The method according to any one of claims 1-4, characterized in that, The process of segmenting the initial course videos according to knowledge point granularity to obtain candidate knowledge point videos includes: Obtain the audio data corresponding to the initial course video, and convert the audio data into text; Extract the knowledge points from the text; Based on the video frames of the initial course video and the text corresponding to the video frames, the classification result of the video frames is determined, and the classification result of the video frames is used to indicate the knowledge points corresponding to the video frames in the extracted knowledge points. Based on the classification results of each video frame in the initial course video, the initial course video is segmented according to the knowledge point granularity to obtain segmented videos; based on the segmented videos, the candidate knowledge point videos are obtained.
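The last segmentation step of claim 5 can be sketched as grouping consecutive video frames that share the same classification result. The sketch assumes the per-frame knowledge point labels have already been produced by the classifier; the function name and sample labels are hypothetical.

```python
from itertools import groupby


def segment_by_knowledge_point(frame_labels):
    """Turn per-frame labels into (label, first_frame, last_frame) segments."""
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length - 1))
        start += length
    return segments


# Hypothetical classification results for six consecutive frames.
segments = segment_by_knowledge_point(["A", "A", "B", "B", "B", "A"])
```

A knowledge point that reappears later in the video (label "A" here) produces a separate segment, since only consecutive frames are merged.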

6. The method according to claim 5, characterized in that, The step of obtaining the candidate knowledge point video based on the segmented video includes: Based on the video features of the segmented video, the evaluation result of the segmented video is obtained; Videos whose evaluation results do not meet the second evaluation condition are removed from the segmented videos to obtain the candidate knowledge point videos.

7. The method according to claim 6, characterized in that, The video features of the segmented video include at least one of the following: the features of the teacher who recorded the initial course video corresponding to the segmented video, the features of the institution to which the teacher who recorded the initial course video corresponding to the segmented video belongs, and the features of the content of the segmented video.

8. The method according to any one of claims 1-4, characterized in that, After selecting a target course video that meets the selection criteria from the at least one candidate course video, the method further includes: Based on the attribute characteristics of the target students, the original playback speed of each target knowledge point video in the target course video is edited to obtain the target playback speed of each target knowledge point video in the target course video. In response to the learning instruction of the target student for the target course video, the target knowledge point videos in the target course video are played according to the target playback speed of each target knowledge point video.
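As one possible illustration of the playback speed editing in claim 8, the sketch below scales the original speed by a mastery score in [0, 1]. The claim does not state which attribute characteristics are used or how the speed is derived; the mastery score, the scaling rule, and the speed limits are all hypothetical.

```python
def target_speed(original_speed: float, mastery: float,
                 min_speed: float = 0.75, max_speed: float = 2.0) -> float:
    """Scale the playback speed up for well-mastered knowledge points.

    Assumption: mastery 0.0 slows playback to 0.75x of the original speed,
    mastery 1.0 speeds it up to 1.5x, and the result is clamped to the limits.
    """
    scaled = original_speed * (0.75 + 0.75 * mastery)
    return max(min_speed, min(max_speed, scaled))
```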

9. The method according to any one of claims 1-4, characterized in that, After selecting a target course video that meets the selection criteria from the at least one candidate course video, the method further includes: Obtain feedback information from the target students regarding the target course video; the feedback information is used to indicate the effectiveness of the target course video. Based on the feedback information, the course videos to be studied by the target student are updated.

10. An apparatus for determining course videos, characterized in that, The apparatus includes: The first acquisition module is used to acquire the initial course video recorded by the teacher, and to segment the initial course video according to the knowledge point granularity to obtain candidate knowledge point videos; The second acquisition module is used to acquire the knowledge point learning needs and attribute characteristics of the target student. The knowledge point learning needs are used to indicate the knowledge points that the target student expects to learn, and the attribute characteristics are used to identify the target student. The determination module is used to acquire an educational knowledge graph, which includes nodes corresponding to knowledge points and edges between nodes corresponding to knowledge points; based on the knowledge point learning needs of the target student and the educational knowledge graph, determine at least one target knowledge point that matches the target student; based on the at least one target knowledge point and the attribute characteristics of the target student, determine a set of target knowledge point videos that match the target student from the candidate knowledge point videos, the set of target knowledge point videos including at least one target knowledge point video corresponding to each target knowledge point; The stitching module is used to split the nodes corresponding to each target knowledge point and the edges involved in each target knowledge point into subgraphs in the educational knowledge graph; bind the video features of at least one target knowledge point video corresponding to each target knowledge point to the subgraphs; fuse the video features and the interrelationships between each target knowledge point through a graph attention network; predict the ranking position of each node; determine the arrangement order of the at least one target knowledge point based on the ranking position of each node; and stitch the target knowledge point videos in the target knowledge point video set based on the arrangement order to obtain at least one candidate course video. Each candidate course video is obtained by stitching together one target knowledge point video corresponding to each target knowledge point according to the arrangement order. The selection module is used to select a target course video that meets the selection criteria from the at least one candidate course video, wherein the target course video is the course video to be studied by the target student.

11. The apparatus according to claim 10, characterized in that, The determining module is used to determine at least one first knowledge point video corresponding to any target knowledge point from the candidate knowledge point videos, and to determine at least one second knowledge point video that meets the filtering conditions from the at least one first knowledge point video corresponding to any target knowledge point. Based on the attribute characteristics of the target student and the video characteristics of the at least one second knowledge point video, the matching degree between the at least one second knowledge point video and the target student is determined; At least one second knowledge point video whose matching degree meets the matching condition is taken as at least one target knowledge point video corresponding to any target knowledge point. The target knowledge point video set is determined based on at least one target knowledge point video corresponding to each target knowledge point.

12. The apparatus according to claim 10, characterized in that, The selection module is configured to acquire at least one of a first evaluation indicator, a second evaluation indicator, and a third evaluation indicator for any candidate course video; determine the evaluation result of any candidate course video based on at least one of the first evaluation indicator, the second evaluation indicator, and the third evaluation indicator; and select candidate course videos whose evaluation results satisfy the first evaluation condition as target course videos that satisfy the selection condition. The first evaluation indicator is obtained based on the text corresponding to each target knowledge point video in any candidate course video, and is used to evaluate the rationality of the content presentation in any candidate course video. The second evaluation indicator is obtained based on each video frame in any candidate course video, and is used to evaluate the viewing smoothness of any candidate course video. The third evaluation indicator is obtained based on the text corresponding to each target knowledge point video in any candidate course video and each video frame in any candidate course video, and is used to evaluate the overall quality of any candidate course video.

13. The apparatus according to claim 10, characterized in that, The determining module is used to: obtain an initial knowledge graph; remove nodes and edges in the initial knowledge graph that do not meet the association requirements with the educational scenario to obtain a first knowledge graph; based on at least one of the books corresponding to the educational scenario and the multimedia resources corresponding to the educational scenario, add to the first knowledge graph nodes and edges that meet the association requirements with the educational scenario and are missing from the first knowledge graph, to obtain a second knowledge graph; and modify the second knowledge graph to obtain the educational knowledge graph.

14. The apparatus according to any one of claims 10-13, characterized in that, The first acquisition module is used to: acquire the audio data corresponding to the initial course video and convert the audio data into text; extract the knowledge points from the text; determine the classification result of a video frame based on the video frame of the initial course video and the text corresponding to the video frame, wherein the classification result of the video frame is used to indicate the knowledge point corresponding to the video frame among the extracted knowledge points; segment the initial course video according to the knowledge point granularity based on the classification result of each video frame in the initial course video to obtain segmented videos; and obtain the candidate knowledge point videos based on the segmented videos.

15. The apparatus according to claim 14, characterized in that, The first acquisition module is used to acquire the evaluation result of the segmented video based on the video features of the segmented video; and to remove videos in the segmented video whose evaluation results do not meet the second evaluation condition to obtain the candidate knowledge point videos.

16. The apparatus according to claim 15, characterized in that, The video features of the segmented video include at least one of the following: the features of the teacher who recorded the initial course video corresponding to the segmented video, the features of the institution to which the teacher who recorded the initial course video corresponding to the segmented video belongs, and the features of the content of the segmented video.

17. The apparatus according to any one of claims 10-13, characterized in that, The apparatus further includes: The editing module is used to edit the original playback speed of each target knowledge point video in the target course video based on the attribute characteristics of the target student, so as to obtain the target playback speed of each target knowledge point video in the target course video. The playback module is used to respond to the learning instruction of the target student for the target course video and play the target knowledge point videos in the target course video according to the target playback speed of each target knowledge point video.

18. The apparatus according to any one of claims 10-13, characterized in that, The apparatus further includes: An update module is used to obtain feedback information from the target student regarding the target course video, the feedback information being used to indicate the effectiveness of the target course video; and based on the feedback information, to update the course video to be studied by the target student.

19. A computer device, characterized in that, The computer device includes a processor and a memory, the memory storing at least one computer program, the at least one computer program being loaded and executed by the processor to enable the computer device to implement the method for determining course videos as described in any one of claims 1 to 9.

20. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores at least one computer program, which is loaded and executed by a processor to enable the computer to implement the method for determining course videos as described in any one of claims 1 to 9.

21. A computer program product, characterized in that, The computer program product includes a computer program or computer instructions, which are loaded and executed by a processor to enable a computer to implement the method for determining course videos as described in any one of claims 1 to 9.