Method and device for automatic evaluation of manufacturing video content quality

Speech recognition converts manufacturing videos into text, a pre-trained language model is trained on that text to become a question-answering model, and test questions are then put to the model. This automates the assessment of manufacturing video content quality and improves its accuracy, allowing efficient assessment without human intervention.

CN117857778B · Active · Publication Date: 2026-06-30 · FUDAN UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
FUDAN UNIVERSITY
Filing Date
2023-12-27
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing technologies lack effective automated methods to assess the content quality of manufacturing videos. Traditional methods mainly target visual quality and lack assessment of video content quality. Furthermore, manual assessment is inefficient and lacks objectivity.

Method used

Speech recognition produces text from manufacturing videos. A pre-trained language model is further pre-trained on this text, trained on obfuscated questions to equalize its initial state, and then fine-tuned on training questions generated from the text to obtain a question-answering model. Test questions are put to this model, and its answers are used to evaluate the quality of the video content accurately and fairly.

Benefits of technology

It achieves efficient and accurate video content quality assessment without human intervention, and can simulate the audience's learning process to evaluate the overall quality and specific knowledge points of the video content.



Abstract

This invention provides an automatic evaluation method for the content quality of manufacturing videos. The method comprises the following steps: Step S1, performing speech recognition on the manufacturing video to obtain speech-recognized text; Step S2, further pre-training a pre-trained language model using the speech-recognized text to obtain a secondary pre-trained language model; Step S3, generating training questions based on the speech-recognized text and using these training questions to train the secondary pre-trained language model to obtain a trained question-answering model; Step S4, inputting predetermined test questions into the question-answering model to obtain predicted probabilities, which serve as the evaluation result for the content quality of the manufacturing video. The inputs to the method are one or more manufacturing videos together with related reference documents or manually preset test questions. By calculating the accuracy of the question-answering model's answers to the test questions, the content quality of a video, or of several videos with similar content, can be compared.

Description

Technical Field

[0001] This invention belongs to the technical field of quality assessment, specifically relating to an automatic assessment method and apparatus for video content quality in the manufacturing industry.

Background Technology

[0002] In manufacturing, various information, processes, technologies, or products related to production are typically showcased in the form of digital videos. These videos can systematically introduce each technological step in the manufacturing process, from raw material handling to final product assembly, enabling workers and technicians to gain a deep understanding of the production flow; they can also be used to teach operating procedures, safe operating procedures, and equipment usage methods, helping employees quickly master the necessary skills; they can also be used to showcase the characteristics, functions, and advantages of products to attract customers or partners, enhance brand image, and boost product sales; or they can serve as educational videos, introducing the history, development, and future of related industries to the public.

[0003] Manufacturing videos play a vital role in improving production efficiency, ensuring product quality, and enhancing employee training effectiveness, making the evaluation of their content quality increasingly crucial. High-quality video content helps relevant personnel quickly acquire knowledge and become familiar with production processes, while watching low-quality videos wastes users' time and may even provide incorrect information. Therefore, efficiently assessing the content quality of manufacturing videos has become a very important issue.

[0004] Many websites currently employ manual review, but this method is labor-intensive, lacks objective evaluation metrics, and is inefficient. Existing automated video quality assessment methods mostly target visual quality aspects such as video distortion and image clarity. Traditional video quality assessment typically involves perceiving, measuring, and evaluating changes and distortions in video image information through subjective and objective methods. Its evaluation metrics primarily focus on whether distortion or noise occurs after compression and uploading, or on image-related quality standards such as shakiness or blurring caused by improper operation during video recording. Subjective assessment requires a group of subjects to watch a series of test videos in a controlled environment and rate them manually, with the average subjective score taken as the final evaluation result. Objective assessment, on the other hand, uses specific evaluation models to calculate video quality automatically. Objective methods are classified by whether a lossless reference video is needed during evaluation; because no original reference video is available, manufacturing video quality assessment falls under no-reference video quality assessment. Early no-reference video quality assessments were typically based on natural scene statistics, using changes in natural image statistical patterns to determine the degree of image distortion. In recent years, more deep learning-based methods have emerged, such as V-MEON and VIDEVAL. DNN-based models can efficiently capture high-level features and complex multiple distortions.

[0005] Currently, there is little research on the assessment of video content quality, and automated assessment methods for video content quality still need further study.

[0006] Traditional video quality assessment primarily focuses on image features such as blur, noise, white balance, and color, and is not suitable for video content quality assessment. Due to the diversity of video content, compared to visual quality, video content quality assessment still relies mainly on manual evaluation or specific assessment dimensions designed for specific domains, lacking effective automated assessment methods.

Summary of the Invention

[0007] In education, the dimensions of curriculum content assessment can include timeliness, relevance of content to student needs, usefulness to student learning, and accuracy. Additionally, some studies utilize formative assessment to evaluate and improve the quality of teaching materials. Formative assessment views student performance as feedback, identifying weaknesses in the learning process and helping teachers adjust their teaching accordingly, or to identify and improve deficiencies in teaching materials. Specifically, this approach argues that poor student performance may reflect deficiencies in the teaching materials; therefore, analyzing student test responses can, in turn, assess the quality of the teaching materials and facilitate improvement.

[0008] This idea can also be applied to the quality assessment of video content in the manufacturing industry. Manufacturing videos typically contain a large amount of field-related knowledge. By setting reasonable test questions and analyzing the audience's responses, the quality of the original video's content can be evaluated. For example, a high-quality video introducing the "Overview of the Lithium Battery Manufacturing Industry" should include information on the industry, key milestones in its domestic and international development history, current status, and potential problems. By setting questions related to this field and analyzing the responses, one can roughly understand which knowledge points in the video are clearly presented and which need improvement.

[0009] Therefore, the purpose of this invention is to propose an automated evaluation method for video content quality in the manufacturing industry that requires no manual intervention, assesses video content quality by analyzing how a model trained on the video answers test questions, and ensures the accuracy and fairness of the evaluation results.

[0010] Specifically, in order to achieve the above objectives, the present invention adopts the following technical solution:

[0011] An automated method for evaluating the content quality of videos in the manufacturing industry is provided, comprising the following steps:

[0012] Step S1: Perform speech recognition on the manufacturing video to obtain the speech-recognized text;

[0013] Step S2: Use the speech-recognized text to further pre-train a pre-trained language model, obtaining a secondary pre-trained language model;

[0014] Step S3: Generate a training question based on the speech recognition text, and use the training question to train the secondary pre-trained language model to obtain a trained question-answering model;

[0015] Step S4: Input the predetermined test question into the question-answering model to obtain the predicted answer, which serves as the evaluation result of the content quality of the manufacturing video.

[0016] The automatic evaluation method for video content quality in the manufacturing industry provided by this invention may also have the following technical features; wherein, step S3 includes the following sub-steps:

[0017] Step S3-1: Train the secondary pre-trained language model using predetermined obfuscation questions to obtain the question-answering model before fine-tuning;

[0018] Step S3-2: Generate the training question based on the speech recognition text;

[0019] Step S3-3: Train the question-answering model before fine-tuning using the training question to obtain the question-answering model after fine-tuning.

[0020] The automatic evaluation method for video content quality in the manufacturing industry provided by this invention may also have the following technical features: wherein the obfuscation question is a single-choice question.

[0021] In step S3-1, the method for generating the obfuscation questions is as follows:

[0022] For each of the test questions, each option in the test question is treated as the correct option once, thereby generating multiple obfuscation questions. The obfuscation questions are used to set the language models pre-trained from different manufacturing videos to the same initial state.

[0023] The pre-trained language model is trained using the obfuscation questions so that its predicted probabilities for the options of the corresponding test question are consistent with one another, thereby obtaining the question-answering model before fine-tuning.

[0024] The automatic evaluation method for video content quality in the manufacturing industry provided by this invention may also have the following technical features: wherein the training question is a single-choice question.

[0025] In step S3-2, the method for generating the training single-choice questions includes the following sub-steps:

[0026] Step S3-2-1: Extract keywords from each sentence in the speech-recognized text to obtain the sentences with the keywords removed;

[0027] Step S3-2-2: Generate synonyms of the extracted keywords, and randomly sample from the keywords extracted from other sentences to obtain random keywords;

[0028] Step S3-2-3: Use the sentence with the keyword removed as the question stem, the corresponding extracted keyword as the correct option, and the corresponding synonyms and random keywords as distractor options to generate the training question.

[0029] The automatic evaluation method for video content quality in the manufacturing industry provided by this invention may also have the following technical features: The keyword extraction model used to extract the keywords is SIFRank.

[0030] The tool used to generate the aforementioned synonyms is Synonyms;

[0031] In step S2, the speech recognition text is used to further pre-train the language model based on the large-scale pre-trained Chinese language model bert-base-chinese, resulting in a secondary pre-trained language model.

[0032] The automatic evaluation method for video content quality in the manufacturing industry provided by this invention may also have the following technical features: in the method for generating the training question, after generating the synonyms, it is determined whether the number of synonyms is greater than or equal to three; if not, the random sampling is performed.

[0033] The automatic evaluation method for video content quality in the manufacturing industry provided by this invention may also have the following technical features: wherein the test questions are single-choice questions, and their generation includes the following sub-steps:

[0034] High-quality reference documents can be used to generate the test single-choice questions, or manually preset questions can be used directly as the test single-choice questions;

[0035] After generation, the test questions are categorized according to the knowledge involved, which facilitates the subsequent evaluation of the explanation of relevant knowledge in the video.

[0036] The automatic evaluation method for video content quality in the manufacturing industry provided by this invention may also have the following technical features; wherein, in step S4, the question-answering model answers the test question multiple times, and the predicted probability is obtained through the following formula:

[0037] For each multiple-choice question q with m options, the model prediction result obtained in the i-th iteration is $z_i = \{z_{i,1}, \ldots, z_{i,j}, \ldots, z_{i,m}\}$, $i \in [n]$.

[0038] The average of the predicted probabilities of each option obtained from n trials is taken as the final result of the multiple-choice question, and the option with the highest predicted probability is taken as the predicted answer A.

[0039] $A = \arg\max_{j \in [m]} \frac{1}{n} \sum_{i=1}^{n} z_{i,j}$

[0040] This invention further provides an automatic evaluation device for video content quality in the manufacturing industry, used to evaluate the content quality of manufacturing videos, including:

[0041] The speech recognition text generation unit is used to perform speech recognition on the manufacturing video to obtain speech recognition text.

[0042] The model secondary pre-training unit is used to further pre-train the pre-trained language model using the speech recognition text to obtain a secondary pre-trained language model.

[0043] The question-answering model generation unit is used to generate training questions based on the speech recognition text, and to train the secondary pre-trained language model using the training questions to obtain a trained question-answering model; and

[0044] The testing and evaluation unit is used to input predetermined test questions into the question-answering model to obtain predicted probabilities, which serve as the evaluation results for the quality of video content in the manufacturing industry.

[0045] In recent years, pre-trained language models have made continuous progress. The outstanding performance of models such as BERT and GPT on various natural language processing tasks has elevated research in the field of NLP to a new stage. Large-scale pre-trained language models can not only serve as knowledge bases, storing world knowledge from pre-training corpora in the form of parameters, but also be fine-tuned for application to various knowledge-intensive tasks, such as fact-checking, dialogue, and open-domain question answering. Manufacturing videos also contain a wealth of knowledge, and pre-trained language models, with their powerful semantic understanding capabilities, can effectively learn and represent the knowledge within these videos.

[0046] Compared with the prior art, the beneficial effects achieved by the present invention include:

[0047] This invention uses the speech recognition text from manufacturing videos to further pre-train a pre-trained language model. A question-answering model is then trained on this secondary pre-trained model using training questions. Test questions are input into the question-answering model to obtain predicted probabilities, thereby evaluating the content quality of the manufacturing videos. This invention requires no human intervention and ensures the accuracy of the evaluation results.

[0048] In summary, this invention further pre-trains a language model using speech recognition text generated from manufacturing videos, then fine-tunes this model on a multiple-choice question-answering (QA) task, evaluating the video's content quality based on the accuracy of its QA responses. This process, to some extent, simulates how viewers receive video content. In this invention, a deep model learns knowledge from the video, and the model's performance in answering questions reflects the quality of the video content itself. If the model performs well on questions related to a specific knowledge point, the content in the video corresponding to that knowledge can be considered high-quality. Therefore, in the evaluation process, in addition to the video itself, it is necessary to use high-quality reference documents to generate test questions, or directly use manually set questions, to test the model.

Attached Figure Description

[0049] Figure 1 is a framework diagram of the automatic evaluation method for video content quality in the manufacturing industry, as described in this invention.

[0050] Figure 2 is a flowchart of the automatic evaluation method for video content quality in the manufacturing industry, as described in an embodiment of the present invention.

[0051] Figure 3 is a schematic diagram of an obfuscation question in the automatic evaluation method for video content quality in the manufacturing industry, as described in this embodiment of the invention.

[0052] Figure 4 is a schematic diagram of a single-choice training question in the automatic evaluation method for video content quality in the manufacturing industry, as described in this embodiment of the invention.

Detailed Implementation

[0053] The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments.

[0054] Example

[0055] This embodiment provides an automated evaluation method for the quality of video content in the manufacturing industry. The input to this method is one or more manufacturing videos and high-quality reference documents or manually preset test questions for these videos. For example, if the video content is "3D Printing: A New Era of Manufacturing Revolution", the reference documents can be textual introductions related to 3D printing. The output of this method is the accuracy rate of answering these test questions, including the overall accuracy rate and the accuracy rate of specific knowledge points. These accuracy rates can be used to compare the relative quality of several videos with similar content, or to identify knowledge points in the videos that are not explained adequately.

[0056] Figure 1 is a framework diagram of the automatic evaluation method for video content quality in the manufacturing industry, as described in this embodiment. Figure 2 is a flowchart of the automatic evaluation method for video content quality in the manufacturing industry, as described in this embodiment.

[0057] As shown in Figure 1 and Figure 2, the method specifically includes the following steps:

[0058] Step S1: Perform speech recognition on the manufacturing video to obtain the speech-recognized text.

[0059] Step S2: Use the speech-recognized text to further pre-train the predetermined pre-trained language model to obtain a secondary pre-trained language model.

[0060] In this embodiment, the pre-trained language model is a large-scale pre-trained Chinese language model, bert-base-chinese. Step S2 enables the pre-trained language model to learn relevant knowledge from manufacturing videos, thereby improving the model's language comprehension ability.

[0061] Step S3: Generate training questions based on speech recognition text, and use the training questions to train the secondary pre-trained language model to obtain a trained question-answering model.

[0062] Step S3 specifically includes the following sub-steps:

[0063] Step S3-1: Generate obfuscation questions based on the test questions.

[0064] In this embodiment, the test questions, the obfuscation questions, and the training questions are all single-choice questions with four options. For ease of description, they will be referred to as test single-choice questions, obfuscation single-choice questions, and training single-choice questions, respectively.

[0065] The method for generating obfuscation single-choice questions is as follows: for each test single-choice question, each option in the test single-choice question is treated as the correct option once, thereby generating multiple obfuscation questions.

[0066] Figure 3 is an example diagram illustrating the obfuscation questions in this embodiment.

[0067] As shown in Figure 3, for the same test single-choice question, its stem and all options are kept unchanged, and each option is treated as the correct answer once (i.e., the value of "label" in the figure), thus generating four obfuscation single-choice questions.
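The construction in [0065] to [0067] can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the `ChoiceQuestion` type, the `make_obfuscated_questions` helper, and the zero-based `label` index are hypothetical names chosen here, and the sample question is invented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ChoiceQuestion:
    """A single-choice question (hypothetical container, not from the source)."""
    stem: str
    options: Tuple[str, ...]
    label: int  # zero-based index of the option treated as correct


def make_obfuscated_questions(test_question: ChoiceQuestion) -> List[ChoiceQuestion]:
    """Keep the stem and options unchanged and let each option be the label once."""
    return [
        ChoiceQuestion(test_question.stem, test_question.options, j)
        for j in range(len(test_question.options))
    ]


# Invented example question with four options.
example = ChoiceQuestion(
    stem="An automated production line consists of () and a control system",
    options=("workpiece conveying system", "cooling system", "fixture", "spindle"),
    label=0,
)
obfuscated = make_obfuscated_questions(example)
```

A four-option test question therefore yields four obfuscation questions that differ only in their label.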

[0068] Step S3-2: Use the obfuscation questions to train the secondary pre-trained language model to obtain the question-answering model before fine-tuning;

[0069] The language models pre-trained on speech recognition text from different videos may have different initial states because their model parameters are randomly initialized. Before fine-tuning, a pre-trained language model may assign some options of a test question a higher predicted probability than the others.

[0070] To ensure that the pre-trained language models are in a similar initial state during fine-tuning, eliminate the disturbances caused by random initialization of language model parameters, and guarantee the fairness of subsequent probability calculation, this embodiment uses the aforementioned obfuscation questions to train the secondary pre-trained language model. That is, the pre-trained language model is told that each option in the test single-choice questions is correct, so that its predicted probability for each option is close to 0.25, thereby setting the models trained from different videos to a similar initial state.
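Why this training pushes every option toward the same probability can be seen from a short derivation. It assumes, which the source does not state, that the model outputs a softmax distribution $p = (p_1, \ldots, p_m)$ over the m options and is trained with the standard cross-entropy loss; the symbols $\mathcal{L}$ and $\lambda$ are introduced here for illustration only.

```latex
% Assumption: softmax output p over m options, cross-entropy loss.
% The m obfuscation copies of one test question share stem and options
% and carry the labels 1, ..., m, so their summed loss is:
\begin{aligned}
\mathcal{L}(p) &= -\sum_{j=1}^{m} \log p_j,
  \qquad \text{subject to } \sum_{j=1}^{m} p_j = 1, \\
\frac{\partial}{\partial p_j}\Big[\mathcal{L}(p)
  + \lambda\Big(\sum_{k=1}^{m} p_k - 1\Big)\Big]
  &= -\frac{1}{p_j} + \lambda = 0
  \;\Rightarrow\; p_j = \frac{1}{\lambda}, \\
\sum_{j=1}^{m} p_j = 1 &\;\Rightarrow\; \lambda = m,
  \qquad p_j = \frac{1}{m}.
\end{aligned}
```

Because $\mathcal{L}$ is convex in $p$, this stationary point is the minimum. With m = 4 it gives $p_j = 0.25$, the value quoted in [0070].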

[0071] Step S3-3: Generate training questions based on the speech-recognized text.

[0072] The training single-choice questions are generated as follows: keywords are extracted from each sentence in the speech-recognized text, leaving a sentence with the keyword removed; synonyms of the extracted keywords are generated, and random keywords are obtained by randomly sampling from the keywords extracted from other sentences; the sentence with the keyword removed is used as the question stem, the corresponding extracted keyword as the correct option, and the corresponding synonyms and random keywords as distractors, thereby generating the training question.

[0073] In the training question generation method, after generating synonyms, it is determined whether the number of synonyms is greater than or equal to three. If not, the random sampling is performed.

[0074] Figure 4 is an example diagram of a training question in this embodiment.

[0075] As shown in Figure 4, the original sentence from the speech-recognized text is "Automated production line consists of a workpiece conveying system and a control system"; the question stem is the sentence with the keyword removed, which is "Automated production line consists of () and a control system"; the extracted keyword "workpiece conveying system" is used as the correct option, which is option 1 in this example; synonyms of the extracted keyword are generated, random keywords are obtained by randomly sampling from keywords extracted from other sentences, and these synonyms and random keywords are used as distractors, which are options 2-4 in this example, thus generating a training single-choice question.
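The question construction in [0072] to [0075] can be sketched as follows. This is an illustrative reading rather than the patented implementation: the function name `build_training_question` and its parameters are hypothetical, the keyword and synonym lists are passed in directly instead of being produced by a keyword extractor or synonym tool, and filling only the missing distractor slots with random keywords is an assumption about how the fallback in [0073] is applied.

```python
import random
from typing import Dict, Optional, Sequence


def build_training_question(
    sentence: str,
    keyword: str,
    synonyms: Sequence[str],
    other_keywords: Sequence[str],
    num_options: int = 4,
    rng: Optional[random.Random] = None,
) -> Dict[str, object]:
    """Build one cloze-style single-choice question from a sentence and its keyword."""
    rng = rng or random.Random()
    if keyword not in sentence:
        raise ValueError("keyword must occur in the sentence")

    # The stem is the sentence with the extracted keyword blanked out.
    stem = sentence.replace(keyword, "()", 1)

    # Prefer synonyms as distractors (deduplicated, never equal to the keyword).
    needed = num_options - 1
    distractors = [s for s in dict.fromkeys(synonyms) if s != keyword][:needed]

    # Fewer than `needed` synonyms: sample keywords taken from other sentences.
    if len(distractors) < needed:
        pool = [
            k for k in dict.fromkeys(other_keywords)
            if k != keyword and k not in distractors
        ]
        missing = needed - len(distractors)
        if len(pool) < missing:
            raise ValueError("not enough candidate distractors")
        distractors += rng.sample(pool, missing)

    # As in the Figure 4 example, the correct answer is placed first (index 0).
    return {"stem": stem, "options": [keyword] + distractors, "label": 0}


question = build_training_question(
    sentence="Automated production line consists of a workpiece conveying system and a control system",
    keyword="workpiece conveying system",
    synonyms=["workpiece transfer system"],          # invented synonym
    other_keywords=["machine tool", "coolant", "fixture", "spindle"],  # invented pool
    rng=random.Random(0),
)
```

The sketch keeps the correct option in first position to mirror the Figure 4 example; whether options are shuffled before training is not stated in the source.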

[0076] Furthermore, in this embodiment, the keyword extraction model used to extract keywords is SIFRank, and the tool used to generate synonyms is Synonyms.

[0077] Step S3-4: Train the question-answering model before fine-tuning using the training questions to obtain the question-answering model after fine-tuning.

[0078] Through step S3, the model learns the knowledge from the manufacturing videos.

[0079] Step S4: Input multiple predetermined test questions into the question-answering model to obtain predicted answers, and calculate the accuracy of the question-answering model in answering multiple test questions based on the multiple predicted answers, as the evaluation result of the content quality of the manufacturing video.

[0080] Test single-choice questions are generated from high-quality reference documents (the generation method is the same as that for training single-choice questions, so it will not be repeated), or manually preset questions are used directly as test single-choice questions; after generation, the test questions are categorized according to the subdivided knowledge points involved in the questions, which facilitates the subsequent evaluation of the explanation of relevant knowledge in the videos.

[0081] The question-answering model answers each test question n times. The predicted probabilities of each option over the n runs are averaged to give the final result for the question, and the option with the highest average predicted probability is selected as the predicted answer A.

[0082] For each multiple-choice question q with m options, the model prediction result obtained in the i-th iteration is $z_i = \{z_{i,1}, \ldots, z_{i,j}, \ldots, z_{i,m}\}$, $i \in [n]$.

[0083] $A = \arg\max_{j \in [m]} \frac{1}{n} \sum_{i=1}^{n} z_{i,j}$
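The averaging and selection described in [0081] and [0082] can be sketched as below. The function name `average_and_predict` is hypothetical, the probabilities are invented, and the predicted answer is returned as a zero-based option index; the source does not say how the n runs differ from one another, so the sketch simply takes their outputs as given.

```python
from typing import List, Sequence, Tuple


def average_and_predict(runs: Sequence[Sequence[float]]) -> Tuple[List[float], int]:
    """Average per-option probabilities over n runs and pick the best option.

    runs[i][j] is the predicted probability of option j in run i.
    """
    if not runs:
        raise ValueError("at least one run is required")
    m = len(runs[0])
    if any(len(z) != m for z in runs):
        raise ValueError("every run must score the same number of options")

    n = len(runs)
    mean = [sum(z[j] for z in runs) / n for j in range(m)]
    predicted = max(range(m), key=lambda j: mean[j])
    return mean, predicted


# Invented probabilities for one four-option question answered three times.
runs = [
    [0.1, 0.6, 0.2, 0.1],
    [0.2, 0.4, 0.3, 0.1],
    [0.3, 0.5, 0.1, 0.1],
]
mean_probs, answer = average_and_predict(runs)
```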

[0084] After the question-answering model has answered all the test questions, we can determine whether its predicted answers for each question are correct and calculate the overall accuracy of the model across all test questions, serving as an overall evaluation of the video content quality. Furthermore, since the test questions were categorized in the above steps, we can also calculate the accuracy of each sub-knowledge point based on the categorization results, serving as an evaluation of the content quality related to each sub-knowledge point. We can also use the accuracy of sub-knowledge points to identify which knowledge points in the video are not explained adequately.
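The overall and per-knowledge-point accuracy described above amounts to a grouped count. A minimal sketch, in which the `accuracy_report` helper, the tuple layout, and the knowledge-point names are all hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_report(
    results: Iterable[Tuple[str, int, int]],
) -> Tuple[float, Dict[str, float]]:
    """Compute overall accuracy and accuracy per knowledge point.

    Each result is (knowledge_point, predicted_option, correct_option).
    """
    total = 0
    correct = 0
    per_point = defaultdict(lambda: [0, 0])  # knowledge point -> [correct, total]

    for knowledge_point, predicted, gold in results:
        hit = int(predicted == gold)
        total += 1
        correct += hit
        per_point[knowledge_point][0] += hit
        per_point[knowledge_point][1] += 1

    if total == 0:
        raise ValueError("no test results supplied")

    by_point = {kp: c / t for kp, (c, t) in per_point.items()}
    return correct / total, by_point


# Invented results for four test questions in two knowledge-point categories.
overall, by_point = accuracy_report([
    ("materials", 0, 0),
    ("materials", 1, 2),
    ("process", 3, 3),
    ("process", 1, 1),
])
```

A low value for one category, such as "materials" in this invented data, would flag that knowledge point as not explained adequately in the video.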

[0085] This embodiment also provides an automatic evaluation device for the content quality of manufacturing videos, corresponding to the above method, for evaluating the content quality of manufacturing videos, including:

[0086] The speech recognition text generation unit is used to perform speech recognition on manufacturing videos to obtain speech recognition text.

[0087] The model secondary pre-training unit is used to further pre-train the predetermined pre-trained language model using speech recognition text to obtain a secondary pre-trained language model.

[0088] The question-answering model training unit is used to generate training questions based on speech recognition text, and to train the secondary pre-trained language model using these training questions to obtain a trained question-answering model; and

[0089] The testing and evaluation unit is used to input predetermined test questions into the question-answering model, obtain predicted answers, and calculate the accuracy of the question-answering model in answering multiple test questions based on multiple predicted answers, which serves as the evaluation result of the quality of video content in the manufacturing industry.

[0090] The role and effect of the embodiments

[0091] The automatic evaluation method for the quality of video content in the manufacturing industry provided in this embodiment further pre-trains a language model on the speech recognition text of the manufacturing video. A question-answering model is then trained using training questions. Test questions are input into the question-answering model to obtain predicted answers, and the accuracy rate is further calculated to evaluate the content quality of the manufacturing video. In other words, the model simulates the process of a person watching a video and learning, and then simulates the process of that person taking a test after learning. The results of this test are used to indirectly evaluate the effectiveness of the video in conveying knowledge, which makes the evaluation highly interpretable. This method can automatically and effectively evaluate the content quality of videos without human intervention, while also ensuring the accuracy of the evaluation results.

[0092] In this embodiment, by using obfuscation questions generated from the test questions to train the secondary pre-trained language model, the language models pre-trained on different manufacturing videos can be brought to a common initial state, eliminating the disturbance caused by the random initialization of the pre-trained language model parameters and ensuring the fairness of the subsequent prediction probability calculation process.

[0093] In this embodiment, the test questions are also categorized according to the knowledge involved. In this way, after the question-answering model completes the answers to all the test questions, it can not only calculate the overall accuracy of the answers, but also further calculate the accuracy of each sub-knowledge point based on the categorization results. Therefore, it can not only provide an overall evaluation result of the video content, but also effectively evaluate the content quality related to each sub-knowledge point, and easily determine which sub-knowledge points are not explained well.

[0094] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.

[0095] For example, in the above embodiments, the test questions, training questions, etc. are all four-option single-choice questions. Alternatively, the test questions, training questions, etc. can also be single-choice questions with more options, or multiple-choice questions, etc.

[0096] In the above embodiments, the Chinese language model was used as an example for specific explanation. In fact, the method can also be applied to other language models in the same way.

Claims

1. An automatic evaluation method for video content quality in the manufacturing industry, used to evaluate the content quality of manufacturing videos, characterized in that it comprises the following steps:
Step S1: performing speech recognition on the manufacturing video to obtain speech recognition text;
Step S2: further pre-training a pre-trained language model using the speech recognition text to obtain a secondary pre-trained language model, wherein, through step S2, the pre-trained language model can learn relevant knowledge from the manufacturing videos, thereby improving the model's language comprehension ability;
Step S3: generating training questions based on the speech recognition text, and using the training questions to train the secondary pre-trained language model to obtain a trained question-answering model, wherein step S3 comprises the following sub-steps:
Step S3-1: generating obfuscated questions based on multiple predetermined test questions, wherein the obfuscated questions are multiple-choice questions and are generated as follows: for each of the test questions, each option in the test question is treated as the correct option once, thereby generating multiple obfuscated questions;
Step S3-2: training the secondary pre-trained language model using the obfuscated questions, so that the predicted probabilities of the secondary pre-trained language model for each option of the corresponding test question are consistent, thereby obtaining the question-answering model before fine-tuning;
Step S3-3: generating the training questions based on the speech recognition text, wherein the training questions are multiple-choice questions and their generation comprises the following sub-steps: extracting keywords from each sentence in the speech recognition text to obtain the sentences after word extraction; generating synonyms of the extracted keywords, and obtaining random keywords by randomly sampling from the keywords extracted from other sentences; and using the sentence after word extraction as the question stem, the corresponding extracted keyword as the correct option, and the corresponding synonyms and random keywords as interference options to generate the training question;
Step S3-4: training the question-answering model before fine-tuning using the training questions to obtain the question-answering model after fine-tuning;
Step S4: inputting multiple predetermined test questions into the question-answering model to obtain predicted answers, and calculating, based on the multiple predicted answers, the accuracy rate of the question-answering model in answering the multiple test questions, as the evaluation result of the content quality of the manufacturing video.
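The question-generation sub-steps S3-1 and S3-3 of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `ChoiceQuestion` type, the function names, the `____` blank marker, and the shuffling of options are all assumptions made for illustration, and the keyword and distractors are passed in by hand rather than produced by a keyword extractor or synonym tool.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ChoiceQuestion:
    """Hypothetical container for a choice question."""
    stem: str
    options: tuple
    answer: int  # index of the option treated as correct


def make_obfuscated_questions(test_question):
    """Step S3-1: treat each option of a test question as correct exactly once."""
    return [
        ChoiceQuestion(test_question.stem, test_question.options, idx)
        for idx in range(len(test_question.options))
    ]


def make_training_question(sentence, keyword, distractors, rng, blank="____"):
    """Step S3-3: the sentence with the keyword removed is the stem, the keyword
    is the correct option, and the distractors are the interference options."""
    if keyword not in sentence:
        raise ValueError("keyword must occur in the sentence")
    stem = sentence.replace(keyword, blank, 1)
    options = [keyword, *distractors]
    rng.shuffle(options)  # assumption: option order is randomized
    return ChoiceQuestion(stem, tuple(options), options.index(keyword))
```

For a test question with three options, `make_obfuscated_questions` returns three questions that share the stem and options and differ only in which option is labeled correct. Training on all of them pushes the model toward equal predicted probabilities for those options, which is the consistency that step S3-2 asks for.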

2. The automatic evaluation method for video content quality in the manufacturing industry according to claim 1, characterized in that: the keyword extraction model used to extract the keywords is SIFRank; the tool used to generate the synonyms is Synonyms; and in step S2, the pre-trained language model is the large-scale pre-trained Chinese language model bert-base-chinese.

3. The automatic evaluation method for video content quality in the manufacturing industry according to claim 1, characterized in that: the training question contains N options, and in the method of generating the training question, after the synonyms are generated, it is determined whether the number of synonyms is greater than or equal to N-1; if it is not, the random sampling is performed.
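The option-filling rule of claim 3 can be sketched as follows. This is an illustrative reading only: the function name `choose_distractors`, taking the first N-1 synonyms when enough are available, and the de-duplication of candidates are assumptions that the claim does not specify.

```python
import random


def choose_distractors(keyword, synonyms, other_keywords, n_options, rng):
    """Return n_options - 1 interference options for one training question.

    Synonyms are used first; random keywords from other sentences are
    sampled only when there are fewer than n_options - 1 synonyms.
    """
    needed = n_options - 1
    # Drop duplicates and the keyword itself while keeping the original order.
    candidates = [s for s in dict.fromkeys(synonyms) if s != keyword]
    distractors = candidates[:needed]
    if len(distractors) < needed:
        pool = [
            k for k in dict.fromkeys(other_keywords)
            if k != keyword and k not in distractors
        ]
        missing = needed - len(distractors)
        if len(pool) < missing:
            raise ValueError("not enough keywords to fill the options")
        distractors += rng.sample(pool, missing)
    return distractors
```

With N = 4 and a single synonym available, the function keeps that synonym and samples two further keywords from other sentences; with three or more synonyms, no sampling takes place.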

4. The automatic evaluation method for video content quality in the manufacturing industry according to claim 1, characterized in that: the test questions are single-choice questions, and their generation comprises the following sub-steps: generating test multiple-choice questions from high-quality reference documents, or directly using manually preset questions as the test multiple-choice questions; and, after generation, categorizing the test questions according to the knowledge involved, which facilitates the subsequent evaluation of how the video explains the relevant knowledge.

5. The automatic evaluation method for video content quality in the manufacturing industry according to claim 1, characterized in that: in step S4, the question-answering model answers the test question multiple times, and the predicted answer is obtained as follows: for each multiple-choice question q with m options, the model prediction result obtained in the i-th trial is $p_i(q) = (p_{i,1}, p_{i,2}, \dots, p_{i,m})$; the average of the predicted probabilities of each option obtained from the n trials is taken as the final result for the multiple-choice question, $\bar{p}(q) = \frac{1}{n} \sum_{i=1}^{n} p_i(q)$, and the option with the highest predicted probability is taken as the predicted answer, i.e. $\hat{a}(q) = \arg\max_{1 \le j \le m} \bar{p}_j(q)$.

6. An automatic evaluation device for video content quality in the manufacturing industry, used to evaluate the content quality of manufacturing videos, characterized in that it comprises:
a speech recognition text generation unit, used to perform speech recognition on the manufacturing video to obtain speech recognition text;
a model secondary pre-training unit, used to further pre-train a pre-trained language model based on the speech recognition text to obtain a secondary pre-trained language model, wherein, through the model secondary pre-training unit, the pre-trained language model can learn relevant knowledge from the manufacturing videos, thereby improving the model's language comprehension ability;
a question-answering model training unit, used to generate training questions based on the speech recognition text, and to use the training questions to train the secondary pre-trained language model to obtain a trained question-answering model, wherein the question-answering model training unit comprises the following sub-units:
an obfuscated question generation unit, which generates obfuscated questions based on multiple predetermined test questions, wherein the obfuscated questions are multiple-choice questions and are generated as follows: for each of the test questions, each option in the test question is treated as the correct option once, thereby generating multiple obfuscated questions;
a pre-fine-tuning question-answering model generation unit, which uses the obfuscated questions to train the secondary pre-trained language model, so that the predicted probabilities of the secondary pre-trained language model for each option of the corresponding test question are consistent, thereby obtaining the question-answering model before fine-tuning;
a training question generation unit, which generates the training questions based on the speech recognition text, wherein the training questions are single-choice questions and the method of generating them comprises the following sub-steps: extracting keywords from each sentence in the speech recognition text to obtain the sentences after word extraction; generating synonyms of the extracted keywords, and obtaining random keywords by randomly sampling from the keywords extracted from other sentences; and using the sentence after word extraction as the question stem, the corresponding extracted keyword as the correct option, and the corresponding synonyms and random keywords as interference options to generate the training question;
a fine-tuned question-answering model generation unit, which uses the training questions to train the question-answering model before fine-tuning to obtain the question-answering model after fine-tuning;
and a testing and evaluation unit, used to input multiple predetermined test questions into the question-answering model, obtain predicted answers, and calculate, based on the multiple predicted answers, the accuracy of the question-answering model in answering the multiple test questions, as the evaluation result of the content quality of the manufacturing video.
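The repeated-answer averaging of claim 5 and the accuracy calculation performed in step S4 and by the testing and evaluation unit can be sketched in Python as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, each trial is assumed to be a plain list of per-option probabilities already produced by the question-answering model, and how the repeated trials differ from one another is not specified in this section.

```python
def average_prediction(trials):
    """Average per-option probabilities over n trials of one question."""
    if not trials:
        raise ValueError("at least one trial is required")
    m = len(trials[0])
    if any(len(t) != m for t in trials):
        raise ValueError("every trial must score the same m options")
    n = len(trials)
    return [sum(t[j] for t in trials) / n for j in range(m)]


def predicted_answer(trials):
    """Index of the option with the highest averaged probability."""
    avg = average_prediction(trials)
    return max(range(len(avg)), key=avg.__getitem__)


def accuracy(all_trials, correct_answers):
    """Fraction of test questions whose predicted answer is correct."""
    if not correct_answers or len(all_trials) != len(correct_answers):
        raise ValueError("need one non-empty trial list per correct answer")
    hits = sum(
        predicted_answer(trials) == answer
        for trials, answer in zip(all_trials, correct_answers)
    )
    return hits / len(correct_answers)
```

The returned accuracy is the quantity the claims use as the evaluation result: a higher value suggests that the knowledge needed to answer the test questions was recoverable from the video's speech recognition text.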