An intelligent interview method based on multi-modal AI

By leveraging multimodal AI technology, we have addressed the issues of low efficiency, insufficient accuracy, and strong subjectivity in the existing recruitment and interview process. This has enabled automation and precision in all aspects of the recruitment process, resulting in an efficient, accurate, and objective intelligent recruitment solution.

CN122243434A · Pending · Publication Date: 2026-06-19 · NANJING MAITEWANG SCI & TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NANJING MAITEWANG SCI & TECH CO LTD
Filing Date
2026-05-21
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

The existing recruitment and interview process is inefficient, lacks accuracy, is highly subjective, and has poor overall coordination. It also lacks a systematic job description and person profile mapping, has low resume screening efficiency, poorly targeted interview questions, single interview evaluation dimensions, insufficient verification of resume authenticity, and the inability to iterate on corporate preferences.

Method used

We employ a multimodal AI-based intelligent interviewing method, utilizing deep learning, natural language processing, computer vision, and knowledge graph technologies to achieve job description parsing, profile generation, intelligent resume matching, customized interview question generation, multimodal interview assessment, and recruitment preference iteration, thereby improving the automation level and accuracy of each stage of the recruitment process.

Benefits of technology

It improves recruitment efficiency and accuracy, reduces subjective bias caused by human intervention, lowers recruitment costs, forms a closed-loop optimization system, and provides efficient, accurate, and objective intelligent recruitment solutions.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses an intelligent interview method based on multimodal AI, comprising the following steps: parsing job descriptions in recruitment information and generating a target person profile; parsing resumes and generating candidate profiles; matching candidate profiles with target person profiles to obtain a candidate screening list; generating interview questions based on the target person profiles; sending interview invitations to candidates in the candidate screening list, and having candidates accept the invitations and conduct AI video interviews according to the interview questions; scoring the candidates' answers; analyzing the candidates' nonverbal information during the interview and providing a nonverbal information evaluation score; determining the authenticity of the resume and obtaining a resume authenticity verification score; calculating the candidate's overall score and outputting the candidate's overall score and scores for each dimension. This invention achieves an intelligent upgrade of the entire recruitment and interview process, providing enterprises with an efficient, accurate, and objective intelligent recruitment solution.

Description

Technical Field

[0001] This invention relates to the fields of artificial intelligence and recruitment interview technology, specifically to an intelligent interview method based on multimodal AI.

Background Technology

[0002] Existing recruitment and interview processes often suffer from inefficiency, lack of accuracy, and strong subjectivity in practice, exhibiting the following shortcomings:

[0003] Job description (JD) parsing and profile generation are disconnected: There is no systematic mechanism for extracting job description keywords and mapping them to profiles. Manually building target profiles is time-consuming and labor-intensive, and the standards are not uniform. Existing keyword extraction algorithms recognize recruitment-specific terms with insufficient accuracy, and their weight allocation lacks dynamic adaptability, so keyword weights cannot be adjusted according to the structure and domain characteristics of the job description text.

Low resume screening efficiency: Manual screening of a large number of resumes is easily affected by subjective factors, making it difficult to quickly and accurately match candidates who meet the job requirements. Existing matching algorithms mostly use simple keyword matching or a single vector similarity calculation, ignoring important factors such as keyword type, depth of semantic association, and keyword scarcity. Screening criteria cannot be effectively reused.

Interview questions lack relevance: The question bank is not closely tied to job requirements, and there is no ability to generate customized questions and difficulty levels based on candidate profiles. Existing question generation algorithms do not calculate the semantic relevance between keywords and questions accurately enough, and question difficulty lacks a scientific mapping mechanism, making it difficult to accurately assess a candidate's core abilities.

Single-dimension interview evaluation: Traditional interviews rely mainly on the interviewer's subjective judgment. There are no quantitative standards for evaluating the accuracy and logical coherence of a candidate's answers or nonverbal information such as tone and micro-expressions. Existing evaluation algorithms have insufficient ability to integrate multimodal data, and the evaluation models for logical coherence, micro-expression types, and similar factors have low accuracy, resulting in poor objectivity of evaluation results.

Insufficient verification of resume authenticity: There is no in-depth questioning mechanism for resume content, making it difficult to judge the authenticity of resume information and the depth of a candidate's relevant experience. Existing verification algorithms mostly use simple text comparison, which has limited ability to evaluate semantic consistency and professional depth and cannot generate highly targeted verification questions.

Enterprise preferences cannot be iterated: The experience behind human decisions in the recruitment process is difficult to accumulate, and the system cannot learn the enterprise's recruitment priorities. Existing preference learning algorithms converge slowly, use feedback data inefficiently, and cannot quickly convert human decision preferences into model parameter adjustments.

Insufficient end-to-end collaboration: Each stage, from job description parsing, resume screening, and interview organization to evaluation and hiring, operates independently, resulting in poor data flow and a lack of unified process control and data management mechanisms. The algorithm modules of existing systems have insufficient collaborative scheduling capabilities, which reduces overall operational efficiency, and there is no distributed architecture to ensure the reliability of data storage.

Summary of the Invention

[0004] This invention addresses the problems of inefficiency, insufficient accuracy, strong subjectivity, and poor overall coordination in existing recruitment and interview processes. It provides an intelligent interview method based on multimodal AI, integrating deep learning, natural language processing, computer vision, knowledge graphs, and other AI technologies to achieve fully automated processes including job description parsing, candidate profiling, intelligent resume matching, customized interview question generation, multimodal interview assessment, resume authenticity verification, and recruitment preference iteration. The core objective is to improve the automation and accuracy of each stage of the recruitment process, reduce subjective bias caused by human intervention, lower recruitment costs, and form a closed-loop optimization system of "needs analysis - candidate matching - interview assessment - preference learning," providing enterprises with an efficient, accurate, and objective intelligent recruitment solution.

[0005] To achieve the above objectives, the present invention adopts the following technical solution: An intelligent interview method based on multimodal AI includes the following steps: parsing the job description in the recruitment information and generating a target person profile; parsing resumes and generating candidate profiles; matching the candidate profiles with the target person profile to obtain a candidate screening list; generating interview questions based on the target person profile; sending interview invitations to the candidates in the candidate screening list, who, after accepting the invitation, take an AI video interview based on the interview questions; scoring the candidates' answers; analyzing the candidates' nonverbal information during the interview and producing a nonverbal information evaluation score; determining the authenticity of the resume and obtaining a resume authenticity verification score; and calculating the candidate's overall score and outputting the overall score together with the scores for each dimension.

[0006] To further optimize the above technical solution, the following specific measures are also adopted. Parsing the job description in the recruitment information and generating a target person profile specifically involves the following. A BERT pre-trained language model is used as the semantic encoder and is further pre-trained on a recruitment-domain corpus. The trained semantic encoder encodes the job description into a semantic vector sequence. The semantic vector sequence is input into a named entity recognition algorithm that combines a bidirectional LSTM with a CRF to extract the target person keywords in the job description, including skill requirements, work experience, educational background, and ability qualities. The weight of each target person keyword is calculated from its frequency of occurrence in the job description, its position weight, and its domain importance coefficient; the formula for calculating the target person keyword weight is as follows:

[0007] In the formula, the quantities are the weight of the k-th target person keyword, the frequency of occurrence of the k-th keyword, the position weight of the k-th keyword, the domain importance coefficient of the k-th keyword, and n, the total number of extracted keywords. A Prompt template is constructed from the extracted target person keywords and their corresponding weights. The Prompt template is then input into a GPT-type generation model optimized with Prompt Tuning to generate a structured script summary of the target person profile. The Sentence-BERT model is used to convert the target person keyword list and the target person profile script summary into target person keyword embedding vectors and a target person profile embedding vector, respectively. After L2 normalization, the vectors are stored in the Milvus vector database. In the knowledge graph, a six-level association structure of "industry category - job category - target person profile - target person keywords - target person profile dimensions - core responsibilities" is constructed. The TransE algorithm is used to optimize the entity and relationship embeddings of the knowledge graph and establish the mapping relationships between the elements, and the weight of each association edge is calculated dynamically from the keyword weight and the domain association strength.
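The weight calculation and L2 normalization described above can be sketched as follows. The formula itself is not reproduced in the text, so the sketch assumes one plausible form: each keyword's raw score is the product of its frequency, position weight, and domain importance coefficient, normalized over all n keywords. The function names, dictionary fields, and sample values are hypothetical and serve only as illustration.

```python
import math


def keyword_weights(keywords):
    """Return a normalized weight per keyword.

    ASSUMPTION: raw score = freq * position_weight * domain_coeff,
    divided by the sum over all extracted keywords. The patent text
    names these three factors but does not show how they combine.
    """
    raw = [k["freq"] * k["position_weight"] * k["domain_coeff"] for k in keywords]
    total = sum(raw)
    if total == 0:
        raise ValueError("all keyword scores are zero")
    return {k["term"]: r / total for k, r in zip(keywords, raw)}


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


# Hypothetical keywords extracted from a "Data Analyst" job description.
jd_keywords = [
    {"term": "Python", "freq": 2, "position_weight": 1.0, "domain_coeff": 1.5},
    {"term": "SQL", "freq": 1, "position_weight": 1.0, "domain_coeff": 1.5},
    {"term": "Bachelor's degree", "freq": 1, "position_weight": 0.5, "domain_coeff": 1.0},
]
weights = keyword_weights(jd_keywords)
unit_vec = l2_normalize([3.0, 4.0])
```

Normalizing the weights to sum to 1 keeps later weighted sums on a comparable scale across job descriptions of different lengths; whether the original formula normalizes this way is not stated.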

[0008] Furthermore, parsing resumes and generating candidate profiles specifically involves the following. For image-based resumes, an OCR algorithm that integrates the EAST text detection model and the CRNN text recognition model is used to obtain the text in the resume image, and candidate keywords are then extracted from the text using a BiLSTM. The EAST text detection model uses an optimized anchor generation strategy, and the CRNN text recognition model introduces an attention mechanism. For text-based resumes, candidate keywords are extracted directly using a BiLSTM. Candidate keywords include basic personal information, work experience, project experience, skills, and education. From the extracted candidate keywords, candidate keyword embedding vectors are generated with the same dimensions as the target person keyword embedding vectors. A candidate profile script summary is generated using a GPT-type generation model optimized with Prompt Tuning, and the Sentence-BERT model is used to convert this summary into a candidate profile embedding vector. The candidate keyword embedding vectors and the candidate profile embedding vector are L2 normalized, stored in the Milvus vector database, and bound to the candidate's unique identifier to establish an association index between vectors and structured information. A six-level association structure of "candidate - personal attributes - candidate profile - candidate keywords - candidate profile dimensions - industry field" is established in the knowledge graph. A dynamic relationship weight update mechanism adjusts the weights of the association edges in real time according to how well the candidate profile matches the target person profile.

[0009] Furthermore, the matching of the candidate profile with the target profile specifically involves: A matching coefficient calculation model based on cosine similarity and semantic overlap is used to assign a matching coefficient to candidate keywords. The calculation formula is as follows:

[0010] In the formula, the quantities are: the embedding vector of the candidate keyword; the embedding vector of the target person keyword; the set of semantic descriptions corresponding to the candidate keyword; the set of semantic descriptions corresponding to the target person keyword; the coverage of the target person keyword's semantic description set by the candidate keyword's semantic description set, which measures the degree of semantic overlap; the operator that counts the number of elements in a set; the fusion weight for the cosine similarity of the embedding vectors; and the fusion weight for the semantic coverage. Based on the target person keyword weights and the candidate keyword matching coefficients, a hierarchical weighted summation algorithm is used to calculate the keyword-level matching score. The formula is as follows:

[0011] In the formula, n is the number of target person keywords, and the remaining quantities are the weight of the k-th target person keyword, the matching coefficient of the k-th candidate keyword, and the keyword type weight. A cosine similarity model that incorporates importance weights for the vector components is then used to calculate the vector similarity between the target person profile embedding vector and the candidate profile embedding vector:

[0012] In the formula, the quantities are the d-th dimension component of the target person profile embedding vector, the d-th dimension component of the candidate profile embedding vector, and the importance weight of the d-th dimension component. The keyword-level matching score and the vector similarity are then combined by an adaptive weighted fusion algorithm to calculate the total matching score:

[0013] In the formula, the quantities are the weight of the keyword-level matching score and the weight of the vector similarity; both weights are dynamically adjusted based on job type. A matching threshold is calculated from the mean and the standard deviation of the candidates' total matching scores, and candidates whose total matching score meets the threshold are retained. For the retained candidates, a ranking algorithm that combines the matching score with keyword scarcity is used to calculate a ranking priority; the ranking priority formula includes a weighting factor for scarce keywords. The candidates are sorted in descending order of ranking priority to generate a candidate screening list that includes basic candidate information, the matching score, keyword matching details, and scarcity markers.
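The matching steps in paragraphs [0009] to [0013] can be sketched in a few functions. The formulas are not reproduced in the text, so every combination rule below is an assumption: the matching coefficient is taken as a weighted sum of cosine similarity and set coverage, the threshold as the mean plus k standard deviations, and the ranking priority as the total score boosted by the number of scarce keywords matched. All names, default weights, and sample values are hypothetical.

```python
import math
from statistics import mean, pstdev


def cosine(u, v, dim_weights=None):
    """Cosine similarity with optional per-dimension importance weights."""
    w = dim_weights if dim_weights is not None else [1.0] * len(u)
    dot = sum(wi * a * b for wi, a, b in zip(w, u, v))
    norm_u = math.sqrt(sum(wi * a * a for wi, a in zip(w, u)))
    norm_v = math.sqrt(sum(wi * b * b for wi, b in zip(w, v)))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def matching_coefficient(cand_vec, target_vec, cand_desc, target_desc,
                         w_cos=0.7, w_cov=0.3):
    """ASSUMED form: w_cos * cosine + w_cov * coverage of the target description set."""
    coverage = len(cand_desc & target_desc) / len(target_desc) if target_desc else 0.0
    return w_cos * cosine(cand_vec, target_vec) + w_cov * coverage


def keyword_level_score(items):
    """items: (target keyword weight, matching coefficient, keyword type weight)."""
    return sum(w * m * t for w, m, t in items)


def total_match(keyword_score, vector_sim, w_kw=0.6, w_vec=0.4):
    """Weighted fusion of the keyword-level score and the profile vector similarity."""
    return w_kw * keyword_score + w_vec * vector_sim


def screen(candidates, k=0.0, scarcity_factor=0.2):
    """Keep candidates at or above mean + k * std, then rank by priority.

    ASSUMPTION: priority = total * (1 + scarcity_factor * scarce_hits).
    """
    totals = [c["total"] for c in candidates]
    threshold = mean(totals) + k * pstdev(totals)
    kept = [dict(c, priority=c["total"] * (1 + scarcity_factor * c["scarce_hits"]))
            for c in candidates if c["total"] >= threshold]
    kept.sort(key=lambda c: c["priority"], reverse=True)
    return kept, threshold


pool = [
    {"id": "A", "total": 0.9, "scarce_hits": 0},
    {"id": "B", "total": 0.7, "scarce_hits": 2},
    {"id": "C", "total": 0.4, "scarce_hits": 0},
]
shortlist, threshold = screen(pool)
```

In this sample, candidate B overtakes A in the final ordering because two scarce keywords raise its priority, which is the kind of effect the scarcity weighting factor is described as producing.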

[0014] Furthermore, the specific steps for generating interview questions based on the target person's profile are as follows: Keywords for the target person are extracted. Based on the relationship between keywords and profile dimensions in the knowledge graph, candidate questions are retrieved from the question bank. A BERT-based cross-modal semantic relevance calculation model is used to calculate the semantic relevance between each target person keyword and each question text. The formula is as follows:

[0015] In the formula, the quantities are the embedding vector of the k-th target person keyword, the embedding vector of the q-th question text, the target person keyword weight, the evaluation dimension weight of the q-th question, and the resulting semantic relevance between the k-th target person keyword and the q-th question. For each question, a hierarchical aggregation algorithm sums the relevance scores by keyword type and then performs a weighted fusion to obtain the overall relevance score of the question:

[0016] In the formula, the quantities are the set of skill-related keywords and its weight, the set of experience-related keywords and its weight, and the set of ability-and-quality-related keywords and its weight. A threshold is set for the overall relevance score, and questions whose overall relevance score exceeds the threshold are selected. The selected questions are sorted in descending order of score, and the top K questions are taken according to the user-defined number of questions K. At the same time, question difficulty is mapped from the keyword difficulty level and the question complexity. The mapping relationship is optimized using a decision tree model whose features include the keyword difficulty level, the number of knowledge points in the question, the number of reasoning steps, and the length of the question text.
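The aggregation and selection step can be illustrated with a short sketch. It assumes that the per-keyword relevance values have already been computed and grouped by keyword type, and that the overall score is the type-weighted sum of the per-type totals, as the paragraph describes. The type names, weights, threshold, and question texts are hypothetical.

```python
def overall_relevance(relevance_by_type, type_weights):
    """Sum relevance per keyword type, then fuse the sums with the type weights."""
    return sum(type_weights[t] * sum(values) for t, values in relevance_by_type.items())


def select_questions(questions, type_weights, threshold, top_k):
    """Keep questions scoring above the threshold and return the top K by score."""
    scored = [(overall_relevance(q["relevance"], type_weights), q["text"])
              for q in questions]
    passed = [item for item in scored if item[0] > threshold]
    passed.sort(key=lambda item: item[0], reverse=True)
    return passed[:top_k]


# Hypothetical type weights and question bank entries.
type_weights = {"skill": 0.5, "experience": 0.3, "ability": 0.2}
bank = [
    {"text": "Walk through a SQL query you optimized.",
     "relevance": {"skill": [0.8, 0.6], "experience": [0.5], "ability": [0.2]}},
    {"text": "Describe a conflict within your team.",
     "relevance": {"skill": [0.1], "experience": [0.2], "ability": [0.9]}},
    {"text": "How would you model churn in Python?",
     "relevance": {"skill": [0.9, 0.7], "experience": [0.1], "ability": [0.1]}},
]
chosen = select_questions(bank, type_weights, threshold=0.5, top_k=2)
```

The decision tree that maps difficulty is left out of the sketch because the text gives only its input features, not its training data or output levels.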

[0017] Furthermore, scoring the candidates' answers specifically involves the following. An end-to-end Transformer-based speech recognition model is adopted and fine-tuned on an interview-scenario corpus, and a noise suppression algorithm based on spectral subtraction and wavelet transform is introduced. The candidate's speech is converted into text in real time. A text cleaning algorithm removes redundant information, word segmentation, part-of-speech tagging, and entity recognition are performed, and the result is converted into embedding vectors of the answer keywords. The core demand keywords of the question are extracted using a BERT-based keyword extraction model, expanded with synonyms and related words from the knowledge graph, and converted into embedding vectors of the question's core demand keywords. For objective questions, the knowledge graph is used to link the key elements of the preset standard answers. For subjective questions, a core evaluation dimension system is constructed based on job requirements, and the weight of each evaluation dimension is determined by the analytic hierarchy process. A bidirectional semantic matching model calculates the forward similarity from the answer keywords to the core demand keywords and the reverse similarity from the core demand keywords to the answer keywords, and the two are weighted and fused to obtain the semantic fit:

[0018] In the formula, the quantities are the embedding vector of the answer keywords, the embedding vector of the question's core demand keywords, the forward similarity from the answer keywords to the core demand keywords, and the reverse similarity from the core demand keywords to the answer keywords. A hierarchical evaluation model for logical coherence is constructed from text syntactic structure analysis, semantic relevance calculation, and paragraph cohesion feature extraction, and outputs a logic score. This model is trained using the random forest algorithm, and its features include syntactic completeness, mean inter-sentence similarity, cohesive word coverage, and topic drift. The correctness of the answer is calculated as follows. For objective questions, an algorithm combining exact matching and fuzzy matching compares the key elements of the answer with the standard answer for consistency; the formula combines an exact matching result, which takes the value 0 or 1, and a fuzzy matching similarity score, which ranges from 0 to 1, using an exact matching weight and a fuzzy matching weight. For subjective questions, a multi-label classification model evaluates the relevance of the answer against the core evaluation dimension system, a domain expert rule base is used to assess professionalism, and the correctness of the answer is calculated from both. Based on the semantic fit, the logic score, and the correctness of the answer, a dynamic weighted summation formula, whose weighting coefficients are adjusted according to whether the question is objective or subjective, is used to calculate the content assessment score:

[0019] In the formula, the three coefficients are the semantic fit weight, the logic score weight, and the answer correctness weight.
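A minimal sketch of the content assessment score follows. It assumes that the semantic fit is a weighted average of the forward and reverse similarities, that objective correctness is a weighted sum of an exact match flag and a fuzzy similarity, and that the final score is a weighted sum whose weights depend on the question type. The fuzzy match uses `difflib.SequenceMatcher` from the Python standard library as a stand-in, since the text does not name a fuzzy matching algorithm; all weight values are hypothetical.

```python
from difflib import SequenceMatcher


def semantic_fit(forward_sim, reverse_sim, w_forward=0.5):
    """ASSUMED form: weighted average of forward and reverse similarity."""
    return w_forward * forward_sim + (1.0 - w_forward) * reverse_sim


def objective_correctness(answer, standard, w_exact=0.6, w_fuzzy=0.4):
    """Weighted sum of an exact match flag (0 or 1) and a fuzzy similarity in [0, 1]."""
    a, s = answer.strip().lower(), standard.strip().lower()
    exact = 1.0 if a == s else 0.0
    fuzzy = SequenceMatcher(None, a, s).ratio()  # stand-in fuzzy matcher
    return w_exact * exact + w_fuzzy * fuzzy


# Hypothetical weight sets: (semantic fit, logic score, correctness) per question type.
TYPE_WEIGHTS = {
    "objective": (0.2, 0.1, 0.7),
    "subjective": (0.4, 0.3, 0.3),
}


def content_score(fit, logic, correctness, question_type):
    """Weighted sum of the three components, with weights chosen by question type."""
    w_fit, w_logic, w_correct = TYPE_WEIGHTS[question_type]
    return w_fit * fit + w_logic * logic + w_correct * correctness


fit = semantic_fit(0.9, 0.7)
correct = objective_correctness("LEFT JOIN", "left join")
score = content_score(fit, 0.6, correct, "objective")
```

Switching the weight tuple by question type is one simple reading of "dynamic" weighting: objective questions lean on correctness, while subjective questions lean on semantic fit and logical coherence.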

[0020] Furthermore, analyzing the candidates' nonverbal information during the interview specifically involves the following. The candidate's facial video stream and audio stream are acquired in real time. Facial key point coordinates, expression category, and gaze direction are extracted from the facial video stream; speech rate, tone, energy, and number of pauses are extracted from the audio stream to obtain a speech feature vector. The speech feature vector is input into a speech emotion recognition model based on a CNN-LSTM hybrid model, which outputs a probability distribution over emotional tendency and determines whether the tendency is positive, neutral, or negative. A statistical analysis model is used to judge whether the speech rate is reasonable: a speech rate threshold range is set based on job type, and a speech rate reasonableness score is calculated from the candidate's speech rate and the average speech rate for the corresponding job position. Intonation stability is calculated from the stability of the fundamental frequency, using the standard deviation and the mean of the fundamental frequency. A tone evaluation index system is constructed, and a weighted summation method is used to obtain the tone evaluation score; in that formula, different emotional tendency scores are assigned for positive, neutral, and negative emotions. A micro-expression recognition model based on an attention mechanism is constructed and pre-trained on the MMI micro-expression dataset. Facial region attention weights are introduced, with different weights set for the eyes, mouth, eyebrows, and other regions. The trained model is used to identify micro-expression types, including confidence, tension, sincerity, and perfunctoriness. A duration score is calculated from the expression duration, and a frequency score is calculated from the frequency of expression changes. The micro-expression assessment score is calculated using the following formula:

[0021] In the formula, one term is the score assigned to each micro-expression type. An adaptive weight fusion algorithm then dynamically adjusts the two weighting coefficients according to the interview scenario and calculates the nonverbal information total score:

[0022] In the formula, the two coefficients are the weighting coefficient of the tone evaluation score and the weighting coefficient of the micro-expression assessment score.
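The nonverbal scoring chain can be sketched as below. None of the formulas survive in the text, so the forms are assumptions: speech rate reasonableness is one minus the relative deviation from the job average, intonation stability is one minus the coefficient of variation of the fundamental frequency, and the tone, micro-expression, and total scores are weighted sums. All function names and default weights are hypothetical.

```python
from statistics import mean, pstdev


def rate_reasonableness(rate, job_avg_rate):
    """ASSUMED form: 1 - |rate - avg| / avg, clipped to [0, 1]."""
    return max(0.0, 1.0 - abs(rate - job_avg_rate) / job_avg_rate)


def intonation_stability(f0_values):
    """ASSUMED form: 1 - std / mean of the fundamental frequency, clipped to [0, 1]."""
    mu = mean(f0_values)
    if mu <= 0:
        return 0.0
    return max(0.0, 1.0 - pstdev(f0_values) / mu)


def tone_score(emotion, rate, stability, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of emotional tendency, speech rate, and stability scores."""
    return weights[0] * emotion + weights[1] * rate + weights[2] * stability


def micro_expression_score(type_score, duration, frequency, weights=(0.5, 0.25, 0.25)):
    """ASSUMED form: weighted sum of type, duration, and frequency scores."""
    return weights[0] * type_score + weights[1] * duration + weights[2] * frequency


def nonverbal_total(tone, micro, w_tone=0.5, w_micro=0.5):
    """Weighted fusion of the tone score and the micro-expression score."""
    return w_tone * tone + w_micro * micro


# Hypothetical measurements: words per minute and fundamental frequency samples in Hz.
rate = rate_reasonableness(180, 200)
stability = intonation_stability([190.0, 200.0, 210.0])
tone = tone_score(emotion=1.0, rate=rate, stability=stability)
total = nonverbal_total(tone, micro_expression_score(0.8, 0.6, 0.6))
```

The paragraph says the two fusion weights vary with the interview scenario; in a sketch like this they would simply be passed in per scenario rather than left at the defaults.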

[0023] Furthermore, determining the authenticity of a resume and obtaining a resume authenticity verification score specifically involves the following. Based on the work experience, project experience, and skill descriptions in the candidate's resume, combined with the professional knowledge system in the knowledge graph, targeted verification questions are generated by the GPT-type generation model optimized with Prompt Tuning. The types of targeted verification questions include detailed follow-up questions, logical verification questions, and professional depth questions, and the Prompt template of the generation model is dynamically adjusted according to the question type. The targeted verification questions are then presented to the candidate. A bidirectional semantic matching algorithm extracts key information from the answers to the verification questions and converts it into an embedding vector; the corresponding description in the resume is likewise converted into an embedding vector. Keyword overlap and semantic structure similarity are introduced as auxiliary features to calculate the consistency score:

[0024] In the formula, the quantities are the set of key entities in the candidate's answers to the targeted verification questions, the set of key entities in the resume, and the semantic structure similarity. A depth evaluation index system is constructed from the professional knowledge system in the knowledge graph and includes richness of detail, professional depth, and logical rigor. A multilayer perceptron scores each index based on the candidate's answers to the targeted verification questions, and the index scores are combined into a depth score. The multilayer perceptron consists of an input layer, a hidden layer, and an output layer; the activation function of the hidden layer is ReLU, and that of the output layer is Sigmoid. The resume authenticity verification score is the weighted average of the consistency score and the depth score, as follows:

[0025] In the formula, the quantities are the resume authenticity verification score, the weighting coefficient of the consistency score, and the weighting coefficient of the depth score.
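The authenticity score can be sketched under stated assumptions. The consistency formula is not reproduced in the text; the sketch assumes a weighted sum of embedding cosine similarity, entity overlap (taken here as the Jaccard index of the two entity sets), and semantic structure similarity. The depth score is taken as a plain average of the three index scores, standing in for the multilayer perceptron output. All names and weights are hypothetical.

```python
def entity_overlap(answer_entities, resume_entities):
    """Jaccard index of the two key entity sets (an ASSUMED overlap measure)."""
    union = answer_entities | resume_entities
    return len(answer_entities & resume_entities) / len(union) if union else 0.0


def consistency_score(embedding_sim, answer_entities, resume_entities,
                      structure_sim, weights=(0.5, 0.3, 0.2)):
    """ASSUMED form: weighted sum of embedding, entity, and structure similarity."""
    overlap = entity_overlap(answer_entities, resume_entities)
    return (weights[0] * embedding_sim
            + weights[1] * overlap
            + weights[2] * structure_sim)


def depth_score(detail, professional_depth, rigor):
    """Stand-in for the MLP output: plain average of the three index scores."""
    return (detail + professional_depth + rigor) / 3.0


def authenticity_score(consistency, depth, w_consistency=0.6, w_depth=0.4):
    """Weighted average of the consistency score and the depth score."""
    return w_consistency * consistency + w_depth * depth


# Hypothetical entities from a verification answer and from the resume.
answer = {"Spark", "Hive"}
resume = {"Spark", "Hive"}
consistency = consistency_score(0.8, answer, resume, 0.6)
authenticity = authenticity_score(consistency, depth_score(0.7, 0.8, 0.6))
```

A low entity overlap combined with a high embedding similarity would suggest an answer that stays on topic but omits the concrete tools or projects named in the resume, which is the kind of mismatch the verification questions are meant to surface.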

[0026] Furthermore, calculating the candidate's overall score and outputting the overall score and the scores for each dimension specifically involves the following. The analytic hierarchy process (AHP) is used to determine the weights of four metrics: the content assessment score, the nonverbal information total score, the resume authenticity verification score, and the total matching score. The resulting weights are, respectively, the content assessment score weight, the nonverbal information total score weight, the resume authenticity verification score weight, and the total matching score weight. The candidate's overall score is calculated as follows:

[0027] In the formula, the quantities are the candidate's overall score and a normalization operator applied to the scores. The output includes the candidate's overall score, the scores for each dimension, and the scoring details, including keyword matching results, logical analysis conclusions, correctness of answers, nonverbal information analysis results, and resume verification conclusions. Visual charts of the scores for each dimension are generated at the same time.
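The overall score can be sketched as a weighted sum of normalized dimension scores. The text says only that AHP determines the four weights and that the scores are normalized; the sketch assumes the row geometric mean approximation of AHP weights and min-max normalization against known score ranges. The pairwise comparison matrix, score ranges, and sample scores are hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights by the row geometric mean method."""
    n = len(pairwise)
    geo = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo)
    return [g / total for g in geo]


def normalize(value, low, high):
    """Min-max normalization to [0, 1], clipped at the range ends."""
    return min(1.0, max(0.0, (value - low) / (high - low)))


def overall_score(scores, ranges, weights):
    """Weighted sum of the normalized dimension scores."""
    return sum(w * normalize(s, lo, hi)
               for w, s, (lo, hi) in zip(weights, scores, ranges))


# Hypothetical comparison matrix: content is judged twice as important as each
# of the other three dimensions (nonverbal, authenticity, matching).
pairwise = [
    [1.0, 2.0, 2.0, 2.0],
    [0.5, 1.0, 1.0, 1.0],
    [0.5, 1.0, 1.0, 1.0],
    [0.5, 1.0, 1.0, 1.0],
]
weights = ahp_weights(pairwise)

# Order: content, nonverbal, resume authenticity, total matching.
scores = [80.0, 70.0, 0.9, 0.75]
ranges = [(0.0, 100.0), (0.0, 100.0), (0.0, 1.0), (0.0, 1.0)]
overall = overall_score(scores, ranges, weights)
```

For a perfectly consistent comparison matrix like this one, the geometric mean method recovers the implied weights exactly; a full AHP implementation would also check the consistency ratio before accepting the weights.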

[0028] Furthermore, the method also includes: receiving human feedback, the types of which include: acknowledging AI recommendations, rejecting AI recommendations, supplementing recommendations, and adjusting the weights of scoring dimensions; and recording the candidate profile, evaluation score vector, and text of the human decision-making reasoning corresponding to the feedback. A BERT model based on an attention mechanism is used to extract recruitment preference features from the text of human decision-making reasons, and output a preference feature vector; the weight adjustment coefficients of the preference feature vector are calculated through a contrastive learning algorithm. The formula is as follows:

[0029] In the formula, the quantities are the number of times AI recommendations were accepted, the number of times AI recommendations were rejected, the total number of feedback responses, the preference feature vector, the target person profile embedding vector, and the cosine similarity between the two vectors. Based on the weight adjustment coefficient, an incremental update algorithm updates the weights of the association edges between keywords and profile dimensions in the knowledge graph. The following model parameters are adjusted using reinforcement learning: the importance weight of the d-th dimension component of the target person profile embedding vector; the keyword-level matching score weight; the vector similarity weight; the skill-related keyword weight; the experience-related keyword weight; the ability-and-quality-related keyword weight; the semantic fit weight; the logic score weight; the answer correctness weight; the content assessment score weight; the nonverbal information total score weight; the resume authenticity verification score weight; and the total matching score weight. The optimization objective is to maximize the satisfaction level of human feedback, which serves as the reward function for reinforcement learning; the formula for calculating the satisfaction level of human feedback is as follows:

[0030] In the formula, the quantities are the satisfaction level of human feedback, the number of times AI recommendations were accepted, the number of times the scoring results were manually adjusted, and the total number of feedback responses.
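The feedback loop can be illustrated with a rough sketch. Neither the weight adjustment formula nor the satisfaction formula is reproduced in the text, so both forms below are assumptions built only from the quantities the text lists: the adjustment coefficient is taken as the net acceptance rate scaled by the cosine similarity between the preference feature vector and the target profile vector, and satisfaction as the accepted share minus the manually adjusted share. The incremental edge update rule and the learning rate are likewise hypothetical.

```python
import math


def cosine(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weight_adjustment(n_accept, n_reject, n_total, preference_vec, target_vec):
    """ASSUMED form: (accepted - rejected) / total * cosine(preference, target)."""
    if n_total == 0:
        return 0.0
    return (n_accept - n_reject) / n_total * cosine(preference_vec, target_vec)


def update_edge_weight(edge_weight, coefficient, learning_rate=0.1):
    """ASSUMED incremental update: scale the edge weight by 1 + lr * coefficient."""
    return edge_weight * (1.0 + learning_rate * coefficient)


def feedback_satisfaction(n_accept, n_adjust, n_total):
    """ASSUMED reward: accepted share minus manually adjusted share, floored at 0."""
    if n_total == 0:
        return 0.0
    return max(0.0, (n_accept - n_adjust) / n_total)


coefficient = weight_adjustment(8, 2, 10, [1.0, 0.0], [1.0, 0.0])
new_weight = update_edge_weight(0.5, coefficient)
reward = feedback_satisfaction(8, 1, 10)
```

Under these assumptions, a run of rejected recommendations produces a negative coefficient and shrinks the affected edge weights, while the reward falls whenever reviewers have to correct scores by hand.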

[0031] The beneficial effects of this invention are:

1. Algorithm optimization enhances core capabilities: The improved BERT model and dynamic weight calculation algorithm improve the accuracy of JD keyword extraction and the rationality of weight allocation. Secondary pre-training in the recruitment field makes the model better suited to business scenarios. The multi-source fusion resume parsing algorithm makes the extraction of key information from resumes of different formats more robust, and the combination of optimized OCR and deep text parsing models improves the accuracy and recall of information extraction. The adaptive weighted fusion matching algorithm combines hierarchical keyword matching with improved vector similarity calculation, while taking into account factors such as keyword type and scarcity, which greatly improves the accuracy and efficiency of resume screening.

2. Customization and personalization enhancement: Generative AI that fuses a deep learning keyword-question relevance calculation model with a domain knowledge graph enables precise customization and dynamic supplementation of interview questions. The decision tree model optimizes the difficulty mapping relationship to meet the assessment needs of different positions and candidates. The quality verification model ensures the effectiveness of the generated questions.

3. Multi-dimensional quantitative evaluation: Optimized ASR technology improves the accuracy of speech-to-text conversion, while the attention-based micro-expression recognition model and the CNN-LSTM speech emotion recognition model improve the accuracy of nonverbal information evaluation. The bidirectional semantic matching algorithm and the multilayer perceptron model provide a comprehensive evaluation of resume authenticity and professional depth. The multi-dimensional evaluation system effectively reduces the influence of subjective factors and improves the objectivity of evaluation results.

4. Closed-loop iterative continuous optimization: The reinforcement learning recruitment preference model efficiently extracts human decision-making preferences, uses human feedback satisfaction as the reward function, dynamically updates model parameters and knowledge graph weights, and converges faster, enabling the system to keep adapting to enterprise recruitment needs and to keep improving the accuracy of matching and evaluation.

5. Highly efficient end-to-end collaboration: The task orchestration and collaboration module (based on the Camunda workflow engine) provides efficient linkage and real-time data flow between the algorithm modules. The distributed storage architecture ensures the efficiency and reliability of data processing. Processes can be custom configured to fit the recruitment workflows of different enterprises, which significantly reduces recruitment costs and improves recruitment quality and efficiency. The method is suitable for the different recruitment scenarios of various enterprises.

Attached Figure Description

[0032] Figure 1 is a schematic diagram of the overall process of the intelligent interview method based on multimodal AI.

[0033] Figure 2 is a structural diagram of the target person profile generation module.

[0034] Figure 3 is a structural diagram of the resume matching and screening module.

[0035] Figure 4 is a structural diagram of the interview assessment module.

[0036] Figure 5 is an architecture diagram of the intelligent interview system based on multimodal AI.

Detailed Implementation

[0037] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.

[0038] Example 1 This invention proposes an intelligent interview method based on multimodal AI. As shown in Figure 1, the method includes the following steps: S1. Parse the job description in the recruitment information and generate a target person profile. Specifically: An enterprise user logs in to the system and enters the job description for the "Data Analyst" position, which reads: "Bachelor's degree or above; more than 3 years of data analysis experience in the Internet industry; proficient in tools such as Python, SQL, and Tableau; data modeling and logical analysis capabilities; responsible for business data monitoring, analysis, and decision support." A BERT pre-trained language model serves as the semantic encoder and is further pre-trained on a recruitment-domain corpus. The trained semantic encoder encodes the job description into a semantic vector sequence, which is input into a named entity recognition algorithm combining a bidirectional LSTM and a CRF to extract the target person keywords from the job description, covering skill requirements, work experience, educational background, and ability qualities. The weight of each target person keyword is calculated from its frequency of occurrence in the job description, its positional weight, and its domain importance coefficient. The formula for calculating the target person keyword weight is as follows:

[0039] In the formula, the quantities are: the keyword weight of the target person; the frequency of occurrence of the k-th keyword; the position weight of the k-th keyword; the domain importance coefficient of the k-th keyword; and n, the total number of extracted keywords. In this embodiment, the target person keywords and their weights are as follows: education (Bachelor's degree or above, 0.1); work experience (3+ years, 0.2; Internet industry, 0.1; data analysis, 0.1); skills (Python, 0.15; SQL, 0.15; Tableau, 0.1); abilities and qualities (logical analysis ability, 0.1). The weights are calculated using the dynamic weight formula, which ensures that all weights sum to 1. A Prompt template is constructed from the extracted target person keywords and their weights, and is input into a GPT-type generation model based on Prompt Tuning to generate a structured speech summary of the target person profile. In this embodiment, the Prompt template is as follows: "For the data analyst position, the core requirements include: educational background [Bachelor's degree or above, weight 0.1], work experience [3+ years, weight 0.2; Internet industry, weight 0.1; data analysis, weight 0.1], core skills [Python, weight 0.15; SQL, weight 0.15; Tableau, weight 0.1], and competence [logical analysis ability, weight 0.1], and the ability to perform business data monitoring, analysis, and decision support related work." The Sentence-BERT model converts the target person keyword list and the target person profile speech summary into target person keyword embedding vectors and a target person profile embedding vector, respectively. After L2 normalization, they are stored in the Milvus vector database. In the knowledge graph, a six-level association structure of "industry category - job category - target person profile - target person keywords - target person profile dimensions - core responsibilities" is constructed. The TransE algorithm is used to optimize the entity and relationship embeddings of the knowledge graph and establish the mapping between elements, and the weights of the association edges are calculated dynamically from the keyword weights and the domain association strength.
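The dynamic weight calculation in S1 can be illustrated with a minimal sketch. The text states only that each weight depends on frequency, position weight, and domain importance, and that the weights sum to 1; the normalized-product form, the `Keyword` class, and the sample numbers below are assumptions for illustration, not the formula of this application.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    text: str
    frequency: float          # occurrences in the job description
    position_weight: float    # hypothetical: larger for prominent sections
    domain_importance: float  # domain importance coefficient


def keyword_weights(keywords):
    """Assumed form: product of the three factors, normalized to sum to 1."""
    raw = [k.frequency * k.position_weight * k.domain_importance for k in keywords]
    total = sum(raw)
    if total <= 0:
        raise ValueError("at least one keyword needs a positive raw score")
    return {k.text: r / total for k, r in zip(keywords, raw)}


# Illustrative inputs only; these are not values from the embodiment.
keywords = [
    Keyword("Python", 2, 1.0, 1.5),
    Keyword("SQL", 2, 1.0, 1.5),
    Keyword("Tableau", 1, 1.0, 1.0),
]
weights = keyword_weights(keywords)
```

Normalizing by the total is what guarantees the sum-to-1 property regardless of how the raw factors are scaled.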

[0040] S2. Parse resumes and generate candidate profiles. Specifically: For image-based resumes, an OCR algorithm that integrates the EAST text detection model and the CRNN text recognition model obtains the text from the image, and BiLSTM then extracts the candidate's key information from that text. The EAST text detection model uses an optimized anchor generation strategy, and the CRNN text recognition model introduces an attention mechanism. For text-based resumes, candidate keywords are extracted directly using BiLSTM. Candidate keywords cover basic personal information, work experience, project experience, skills, and education. In this embodiment, the candidate uploads a PDF resume, and the system extracts the following information through the multi-source fusion resume parsing algorithm: bachelor's degree; 2 years of experience in Internet data analysis; proficient in SQL and Tableau; participated in user behavior data analysis projects; basic Python programming skills. Based on the extracted candidate keywords, candidate keyword embedding vectors are generated with the same dimensions as the target person keyword embedding vectors. The candidate profile speech summary is generated by the GPT-type generation model based on Prompt Tuning. In this embodiment, the generated candidate profile speech summary is as follows: "The candidate has a bachelor's degree, has 2 years of data analysis experience in the Internet industry, is proficient in using SQL and Tableau tools, has basic Python programming skills, has participated in user behavior data analysis projects, and can complete basic data processing and visualization work." The Sentence-BERT model converts the candidate profile speech summary into a candidate profile embedding vector. The candidate keyword embedding vectors and the candidate profile embedding vector are L2-normalized, stored in the Milvus vector database, and bound to the candidate's unique identifier to establish an association index between the vectors and the structured information. A six-level association structure of "candidate - personal attributes - candidate profile - candidate keywords - candidate profile dimensions - industry field" is established in the knowledge graph. A dynamic relationship weight update mechanism adjusts the weights of the association edges in real time according to how well the candidate profile matches the target person profile.
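The L2 normalization applied before storage matters for the later matching step: once two vectors have unit length, their inner product equals their cosine similarity. The sketch below shows this with plain Python lists; the two-dimensional vectors are toy values, and no vector database API is involved.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


# Toy embeddings; real profile embeddings would have hundreds of dimensions.
candidate = l2_normalize([3.0, 4.0])
target = l2_normalize([4.0, 3.0])

# For unit vectors the inner product is the cosine similarity.
similarity = dot(candidate, target)
```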

[0041] S3. Match candidate profiles with target profiles to obtain a candidate screening list; specifically: A matching coefficient calculation model based on cosine similarity and semantic overlap is used to assign a matching coefficient to candidate keywords. The calculation formula is as follows:

[0042] In the formula, the quantities are: the embedding vector of the candidate keyword; the embedding vector of the target person keyword; the set of semantic descriptions corresponding to the candidate keyword; the set of semantic descriptions corresponding to the target person keyword; the coverage of the target person keyword's semantic description set by the candidate keyword's semantic description set, which measures the degree of semantic overlap; the cardinality operator, which counts the number of elements in a set; the fusion weight for the cosine similarity of the embedding vectors; and the fusion weight for semantic coverage. Based on the target person keyword weights and the candidate keyword matching coefficients, a hierarchical weighted summation algorithm is used to calculate the keyword-level matching score. The formula is as follows:

[0043] In the formula, n is the number of target person keywords, and the remaining quantities are: the weight of the k-th target person keyword; the matching coefficient of the k-th candidate keyword; and the keyword type weight. A cosine similarity calculation model that incorporates importance weights for the vector components is then used to calculate the vector similarity between the target person profile embedding vector and the candidate profile embedding vector:

[0044] In the formula, the quantities are: the d-th dimension component of the target person profile embedding vector; the d-th dimension component of the candidate profile embedding vector; and the importance weight of the d-th dimension component. Combining the keyword-level matching score with the vector similarity, an adaptive weighted fusion algorithm is used to calculate the total matching score:

[0045] In the formula, the quantities are the weight of the keyword-level matching score and the weight of the vector similarity; both weights are dynamically adjusted based on job type. A matching threshold is calculated from the mean and standard deviation of the candidates' total matching scores, and candidates who meet the threshold are selected. For the selected candidates, a ranking algorithm that combines matching scores and keyword scarcity is used to calculate the ranking priority; the ranking priority formula includes a weighting factor for scarce keywords. Candidates are sorted in descending order of ranking priority to generate a candidate screening list that includes basic candidate information, matching score, keyword matching details, and scarcity markers.
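The S3 scoring chain can be sketched end to end as follows. The structure (cosine similarity fused with set coverage, a weighted keyword sum, a component-weighted cosine, a two-term fusion, and a threshold built from the mean and standard deviation) follows the description; every numeric default, the `k` multiplier in the threshold, and all function names are assumptions, since the exact formulas are not reproduced in the text.

```python
import math
from statistics import mean, pstdev


def cosine(a, b, importance=None):
    """Cosine similarity; `importance` optionally weights each dimension."""
    w = importance or [1.0] * len(a)
    num = sum(wi * x * y for wi, x, y in zip(w, a, b))
    den = math.sqrt(sum(wi * x * x for wi, x in zip(w, a))) * math.sqrt(
        sum(wi * y * y for wi, y in zip(w, b))
    )
    return num / den if den else 0.0


def matching_coefficient(cand_vec, target_vec, cand_terms, target_terms,
                         alpha=0.7, beta=0.3):
    """Fuse embedding similarity with coverage of the target's description set."""
    coverage = len(cand_terms & target_terms) / len(target_terms) if target_terms else 0.0
    return alpha * cosine(cand_vec, target_vec) + beta * coverage


def keyword_score(items):
    """items: iterable of (keyword_weight, matching_coefficient, type_weight)."""
    return sum(w * m * t for w, m, t in items)


def total_score(kw_score, vec_sim, kw_weight=0.6, vec_weight=0.4):
    """Weighted fusion of the keyword-level score and the vector similarity."""
    return kw_weight * kw_score + vec_weight * vec_sim


def screen(scores, k=0.0):
    """Keep candidates scoring at least mean + k * std, best first (assumed form)."""
    values = list(scores.values())
    threshold = mean(values) + k * pstdev(values)
    kept = {name: s for name, s in scores.items() if s >= threshold}
    return threshold, sorted(kept, key=kept.get, reverse=True)


coeff = matching_coefficient([1.0, 0.0], [1.0, 0.0], {"sql", "etl"}, {"sql", "python"})
threshold, shortlist = screen({"A": 0.75, "B": 0.5, "C": 0.25})
```

A threshold derived from the score distribution adapts to each candidate pool, whereas a fixed cutoff could admit everyone in a strong pool or no one in a weak one.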

[0046] S4. Generate interview questions based on the target person's profile; specifically: Keywords for the target person are extracted. Based on the relationship between keywords and profile dimensions in the knowledge graph, candidate questions are retrieved from the question bank. A BERT-based cross-modal semantic relevance calculation model is used to calculate the semantic relevance between each target person keyword and each question text. The formula is as follows:

[0047] In the formula, the quantities are: the embedding vector of the k-th target person keyword; the embedding vector of the text of the q-th question; the weight of the target person keyword; the evaluation dimension weight of the q-th question; and the semantic relevance between the k-th target person keyword and the q-th question. For each question, a hierarchical aggregation algorithm sums the relevance scores by keyword type, and a weighted fusion of these sums gives the overall relevance score of the question:

[0048] In the formula, the quantities are: the set of skill-related keywords and its weight; the set of experience-related keywords and its weight; and the set of keywords related to abilities and qualities and its weight. A threshold is set for the overall relevance score. Questions whose overall relevance score exceeds the threshold are sorted in descending order of score, and the top K are selected, where K is the user-defined number of questions. At the same time, question difficulty is mapped from the keyword difficulty level and the question complexity. This mapping is optimized using a decision tree model whose features include the keyword difficulty level, the number of knowledge points in the question, the number of reasoning steps, and the length of the question text.

[0049] If the number of matched questions is less than K, a generative AI model based on the domain knowledge graph (a GPT-like model fine-tuned with the recruitment domain knowledge graph) dynamically generates supplementary questions from the target person keywords and job requirements. Each generated question is checked by a quality verification model (covering a semantic fluency score, an assessment relevance score, and a difficulty reasonableness score; a total score ≥ 0.85 is considered qualified). Questions that pass are added to the question bank, and the question-keyword associations and embedding vectors are updated.
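The selection and supplementation logic of S4 can be sketched as below. The threshold filter, the top-K cut, and the 0.85 qualification line come from the description; averaging the three quality sub-scores into a total is an assumption, as the text does not say how the total is formed, and the function names are hypothetical.

```python
def select_questions(relevance, threshold, k):
    """Pick up to k questions whose overall relevance exceeds the threshold.

    relevance: {question_id: overall relevance score}.
    Returns the chosen ids (best first) and how many questions are still missing.
    """
    eligible = [(q, s) for q, s in relevance.items() if s > threshold]
    eligible.sort(key=lambda item: item[1], reverse=True)
    chosen = [q for q, _ in eligible[:k]]
    return chosen, k - len(chosen)


def passes_quality_check(fluency, relevance, difficulty, minimum=0.85):
    """Assumed aggregation: the total score is the mean of the three sub-scores."""
    return (fluency + relevance + difficulty) / 3 >= minimum


scores = {"q1": 0.9, "q2": 0.4, "q3": 0.7}
chosen, shortfall = select_questions(scores, threshold=0.5, k=3)
# A positive shortfall is the number of supplementary questions to generate.
```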

[0050] S5. Send interview invitations to the candidates in the candidate screening list. After accepting the invitation, a candidate takes an AI video interview based on the interview questions. S6. Score the candidate's answers. Specifically: A Transformer-based end-to-end speech recognition model is adopted and fine-tuned on an interview-scenario corpus, and a noise suppression algorithm based on spectral subtraction and wavelet transform is introduced. The candidate's speech is converted into text in real time. A text cleaning algorithm removes redundant information, and word segmentation, part-of-speech tagging, and entity recognition are performed; the result is converted into an embedding vector of the answer keywords. The core demand keywords of the question are extracted using a BERT-based keyword extraction model, expanded with synonyms and related words from the knowledge graph, and converted into an embedding vector of the question's core demand keywords. For objective questions, the knowledge graph is used to associate the key elements of the preset standard answers. For subjective questions, a core evaluation dimension system is constructed based on job requirements, and the weight of each evaluation dimension is determined by the analytic hierarchy process. A bidirectional semantic matching model calculates the forward similarity from the answer keywords to the core demand keywords and the reverse similarity from the core demand keywords to the answer keywords, and the two are weighted and fused to obtain the semantic fit:

[0051] In the formula, the quantities are: the embedding vector of the answer keywords; the embedding vector of the question's core demand keywords; the forward similarity from the answer keywords to the core demand keywords; and the reverse similarity from the core demand keywords to the answer keywords. A hierarchical evaluation model for logical coherence is constructed based on text syntactic structure analysis, semantic relevance calculation, and paragraph cohesion feature extraction, and outputs a logic score. The model is trained using the random forest algorithm, and its features include syntactic completeness, mean inter-sentence similarity, cohesive word coverage, and topic drift. The correctness of the answer is calculated as follows. For objective questions, an algorithm combining exact matching and fuzzy matching compares the key elements of the answer with the standard answer for consistency. The terms of the correctness formula are: the exact matching result, which takes the value 0 or 1; the fuzzy matching similarity score, which ranges from 0 to 1; the exact matching weight; and the fuzzy matching weight. For subjective questions, a multi-label classification model evaluates the relevance of the answer against the core evaluation dimension system, a domain expert rule base assesses professionalism, and the correctness of the answer is calculated from both. Based on the semantic fit, the logic score, and the correctness of the answer, a dynamic weighted summation formula, whose weighting coefficients are adjusted according to whether the question is objective or subjective, is used to calculate the content assessment score:

[0052] In the formula, the quantities are the semantic fit weight, the logic score weight, and the answer correctness weight.
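The content assessment in S6 can be illustrated with the sketch below. The weighted combination of an exact match (0 or 1) with a fuzzy similarity in [0, 1], and the three-term weighted sum whose coefficients depend on question type, follow the description. The use of `difflib.SequenceMatcher` as the fuzzy matcher, the best-match averaging inside `bidirectional_fit`, and all weight values are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def objective_correctness(answer, standard, w_exact=0.6, w_fuzzy=0.4):
    """Weighted mix of exact match (0 or 1) and fuzzy similarity (0 to 1)."""
    a, s = answer.strip().lower(), standard.strip().lower()
    exact = 1.0 if a == s else 0.0
    fuzzy = SequenceMatcher(None, a, s).ratio()
    return w_exact * exact + w_fuzzy * fuzzy


def bidirectional_fit(sim, w_forward=0.5, w_reverse=0.5):
    """sim[i][j]: similarity between answer keyword i and core demand keyword j.

    Forward: each answer keyword's best match among the core keywords.
    Reverse: each core keyword's best match among the answer keywords.
    """
    forward = sum(max(row) for row in sim) / len(sim)
    columns = list(zip(*sim))
    reverse = sum(max(col) for col in columns) / len(columns)
    return w_forward * forward + w_reverse * reverse


# Hypothetical per-type weights: (semantic fit, logic score, correctness).
CONTENT_WEIGHTS = {
    "objective": (0.2, 0.2, 0.6),
    "subjective": (0.4, 0.3, 0.3),
}


def content_score(fit, logic, correctness, question_type):
    """Weighted sum whose coefficients depend on the question type."""
    w_fit, w_logic, w_correct = CONTENT_WEIGHTS[question_type]
    return w_fit * fit + w_logic * logic + w_correct * correctness


fit = bidirectional_fit([[1.0, 0.0], [0.0, 0.5]])
correct = objective_correctness("Left Join", "left join")
```

The reverse direction penalizes answers that are on topic but miss some of the core demand keywords, which the forward direction alone would not detect.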

[0053] S7. Analyze the non-verbal information from the candidate's interview and produce a non-verbal information assessment score. Specifically: The candidate's facial video stream and audio stream are acquired in real time. Facial key point coordinates, expression category, and gaze direction are extracted from the facial video stream; speech rate, intonation, energy, and number of pauses are extracted from the audio stream to obtain a speech feature vector. The speech feature vector is input into a speech emotion recognition model based on a CNN-LSTM hybrid model, which outputs a probability distribution over emotional tendencies and determines whether the emotional tendency is positive, neutral, or negative. A statistical analysis model assesses the reasonableness of the speech rate: a speech rate threshold range is set based on job type, and a speech rate reasonableness score is calculated from the candidate's speech rate and the average speech rate for the corresponding position. Intonation stability is calculated from the standard deviation and the mean of the fundamental frequency. A tone evaluation index system is constructed, and a weighted summation yields the tone evaluation score, with different emotional tendency scores assigned to positive, neutral, and negative emotions. A micro-expression recognition model based on an attention mechanism is constructed and pre-trained on the MMI micro-expression dataset. Facial region attention weights are introduced, with different weights set for the eyes, mouth, eyebrows, and other regions. The trained model identifies micro-expression types, including confidence, tension, sincerity, and perfunctoriness. A duration score is calculated from the expression duration, and a frequency score is calculated from the frequency of expression changes. The micro-expression assessment score is then calculated using the following formula:

[0054] In the formula, the quantity defined is the score assigned to each micro-expression type. An adaptive weight fusion algorithm dynamically adjusts the weight coefficients of the tone evaluation score and the micro-expression assessment score according to the interview scenario, and the total non-verbal information score is calculated:

[0055] In the formula, the quantities are the weighting coefficient of the tone evaluation score and the weighting coefficient of the micro-expression assessment score.
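The speech-side scores in S7 can be sketched as follows. The inputs match the description (the candidate's speech rate against the position's average, the standard deviation and mean of the fundamental frequency, and an emotion-dependent score), but the specific functional forms, the emotion score table, and the weights are assumptions chosen so that every score stays within [0, 1].

```python
from statistics import mean, pstdev


def speed_reasonableness(rate, job_average):
    """Assumed form: 1.0 at the position's average rate, falling linearly
    with the relative deviation from it."""
    return max(0.0, 1.0 - abs(rate - job_average) / job_average)


def intonation_stability(f0_values):
    """Assumed form: 1 minus the coefficient of variation of the fundamental frequency."""
    mu = mean(f0_values)
    return max(0.0, 1.0 - pstdev(f0_values) / mu)


# Hypothetical scores per emotional tendency.
EMOTION_SCORES = {"positive": 1.0, "neutral": 0.7, "negative": 0.3}


def tone_score(rate, job_average, f0_values, emotion, weights=(0.3, 0.3, 0.4)):
    """Weighted sum of speed reasonableness, intonation stability, and emotion score."""
    parts = (
        speed_reasonableness(rate, job_average),
        intonation_stability(f0_values),
        EMOTION_SCORES[emotion],
    )
    return sum(w * p for w, p in zip(weights, parts))


def nonverbal_total(tone, micro_expression, w_tone=0.5, w_micro=0.5):
    """Fuse the tone score with the micro-expression assessment score."""
    return w_tone * tone + w_micro * micro_expression


steady = tone_score(200, 200, [120.0, 120.0, 120.0], "positive")
total = nonverbal_total(steady, 0.5)
```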

[0056] S8. Determine the authenticity of the resume and obtain a resume authenticity verification score. Specifically: Based on the work experience, project experience, and skill descriptions in the candidate's resume, combined with the professional knowledge system in the knowledge graph, targeted verification questions are generated by the GPT-type generation model optimized by Prompt Tuning. The types of targeted verification questions include detailed follow-up questions, logical verification questions, and professional depth questions, and the Prompt template of the GPT-type generation model is dynamically adjusted according to the question type. The targeted verification questions are then presented to the candidate. A bidirectional semantic matching algorithm extracts key information from the answers to the verification questions and converts it into an embedding vector; the corresponding description in the resume is also converted into an embedding vector. Keyword overlap and semantic structure similarity are introduced as auxiliary features to calculate the consistency score:

[0057] In the formula, the quantities are: the set of key entities in the candidate's answers to the targeted verification questions; the set of key entities in the resume; and the semantic structure similarity. A depth evaluation index system is constructed based on the professional knowledge system in the knowledge graph, covering richness of detail, professional depth, and logical rigor. A multilayer perceptron scores each index based on the candidate's answers to the targeted verification questions, and the index scores are combined into a depth score. The multilayer perceptron consists of an input layer, a hidden layer, and an output layer; the activation function of the hidden layer is ReLU, and the activation function of the output layer is Sigmoid. The resume authenticity verification score is the weighted average of the consistency score and the depth score, as follows:

[0058] In the formula, the quantities are: the resume authenticity verification score; the weighting coefficient of the consistency score; and the weighting coefficient of the depth score.
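A minimal sketch of the S8 scoring follows. The three ingredients of the consistency score (embedding similarity, key-entity overlap, and semantic structure similarity) and the weighted average with the depth score come from the description. The Jaccard form of the overlap, the linear combination, and all weights are assumptions; the depth score is taken as a given input rather than produced by a multilayer perceptron.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def entity_overlap(answer_entities, resume_entities):
    """Assumed form: Jaccard overlap between the two key-entity sets."""
    union = answer_entities | resume_entities
    return len(answer_entities & resume_entities) / len(union) if union else 0.0


def consistency_score(answer_vec, resume_vec, answer_entities, resume_entities,
                      structure_similarity, weights=(0.5, 0.3, 0.2)):
    """Combine embedding similarity with the two auxiliary features."""
    w_vec, w_overlap, w_struct = weights
    return (
        w_vec * cosine(answer_vec, resume_vec)
        + w_overlap * entity_overlap(answer_entities, resume_entities)
        + w_struct * structure_similarity
    )


def authenticity_score(consistency, depth, w_consistency=0.6, w_depth=0.4):
    """Weighted average of the consistency score and the depth score."""
    return w_consistency * consistency + w_depth * depth


consistency = consistency_score(
    [1.0, 0.0], [1.0, 0.0], {"sql", "tableau"}, {"sql", "python"}, 1.0
)
authenticity = authenticity_score(consistency, depth=0.5)
```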

[0059] S9. Calculate the candidate's overall score and output the overall score and the score for each dimension. Specifically: The analytic hierarchy process (AHP) is used to determine the weights of four metrics: the content assessment score, the total non-verbal information score, the resume authenticity verification score, and the total matching score. The resulting weights are the content assessment score weight, the total non-verbal information score weight, the resume authenticity verification score weight, and the total matching score weight. The candidate's overall score is calculated as follows:

[0060] In the formula, the quantities are the candidate's overall score and the normalization operator. The system outputs the candidate's overall score, the score for each dimension, and the scoring details, including keyword matching results, logical analysis conclusions, answer correctness, non-verbal information analysis results, and resume verification conclusions. It also generates visual charts of the scores for each dimension.
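To make the AHP weighting in S9 concrete, the sketch below derives priority weights from a pairwise comparison matrix using the row geometric mean method, a common approximation to the principal-eigenvector solution, and then forms the overall score as a weighted sum. The comparison matrix and the dimension scores are invented for illustration, and the scores are assumed to be already normalized to [0, 1].

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric mean method.

    pairwise[i][j] states how much more important metric i is than metric j.
    Requires Python 3.8+ for math.prod.
    """
    n = len(pairwise)
    geo = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo)
    return [g / total for g in geo]


def overall_score(scores, weights):
    """Weighted sum of dimension scores already normalized to [0, 1]."""
    return sum(w * s for w, s in zip(weights, scores))


# Order: content, non-verbal, resume authenticity, total matching.
# Hypothetical judgments: content is twice as important as each other metric.
matrix = [
    [1.0, 2.0, 2.0, 2.0],
    [0.5, 1.0, 1.0, 1.0],
    [0.5, 1.0, 1.0, 1.0],
    [0.5, 1.0, 1.0, 1.0],
]
weights = ahp_weights(matrix)
overall = overall_score([0.8, 0.6, 0.9, 0.7], weights)
```

For a perfectly consistent matrix such as this one, the geometric mean method recovers the underlying weights exactly (here 0.4, 0.2, 0.2, 0.2).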

[0061] S10. Receive human feedback, which includes accepting AI recommendations, rejecting AI recommendations, providing supplementary recommendations, and adjusting the weights of the scoring dimensions. For each piece of feedback, record the corresponding candidate profile, the evaluation score vector, and the text of the human decision rationale. A BERT model based on an attention mechanism extracts recruitment preference features from the decision rationale text and outputs a preference feature vector. The weight adjustment coefficient of the preference feature vector is calculated through a contrastive learning algorithm. The formula is as follows:

[0062] In the formula, the quantities are: the number of times AI recommendations were accepted; the number of times AI recommendations were rejected; the total number of feedback responses; the preference feature vector; the target person profile embedding vector; and the cosine similarity function. Based on the weight adjustment coefficient, an incremental update algorithm updates the weights of the association edges between keywords and profile dimensions in the knowledge graph. The following model parameters are adjusted using reinforcement learning: the importance weight of the d-th dimension component of the target person profile embedding vector; the weight of the keyword-level matching score; the weight of the vector similarity; the weights of skill-related, experience-related, and ability-and-quality-related keywords; the semantic fit weight; the logic score weight; the answer correctness weight; the content assessment score weight; the total non-verbal information score weight; the resume authenticity verification score weight; and the total matching score weight. The optimization objective is to maximize human feedback satisfaction, which serves as the reward function for reinforcement learning. The formula for calculating human feedback satisfaction is as follows:

[0063] In the formula, the quantities are: the human feedback satisfaction; the number of times AI recommendations were accepted; the number of times scoring results were manually adjusted; and the total number of feedback responses.
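The feedback loop in S10 can be illustrated with the sketch below. Only the inputs are taken from the description (accept, reject, and manual-adjustment counts, the total number of feedback responses, and the cosine similarity between the preference feature vector and the target person profile embedding). The functional forms, the learning rate, and the clamping to [0, 1] are all assumptions, and no reinforcement learning algorithm is implemented here; `satisfaction` merely shows what a reward signal of this kind might look like.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def adjustment_coefficient(accepted, rejected, total, preference_vec, target_vec):
    """Assumed form: net acceptance rate scaled by preference/profile similarity."""
    if total == 0:
        return 0.0
    return (accepted - rejected) / total * cosine(preference_vec, target_vec)


def update_edge_weight(weight, coefficient, learning_rate=0.1):
    """Incremental update of a knowledge graph edge weight, clamped to [0, 1]."""
    return min(1.0, max(0.0, weight + learning_rate * coefficient))


def satisfaction(accepted, adjusted, total):
    """Assumed reward: accepted share minus manually adjusted share, floored at 0."""
    if total == 0:
        return 0.0
    return max(0.0, (accepted - adjusted) / total)


coefficient = adjustment_coefficient(8, 2, 10, [1.0, 0.0], [1.0, 0.0])
new_weight = update_edge_weight(0.5, coefficient)
reward = satisfaction(accepted=8, adjusted=1, total=10)
```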

[0064] Example 2 This invention proposes an intelligent interview system based on multimodal AI, corresponding to the method in Embodiment 1. The system comprises the following modules. The job description parsing and profile generation module (see Figure 2) receives job descriptions, supporting batch import and single entry in TXT and Word formats; extracts keywords using the improved BERT model and the dynamic weight algorithm; calculates weights using the dynamic weight formula; generates the profile and speech summary using a GPT-like model optimized by Prompt Tuning; performs embedding conversion and associated data storage; and interacts with the Milvus vector database and the Neo4j knowledge graph to implement data storage and the six-level association.

[0065] The resume parsing module parses candidate resumes, extracts key information through the multi-source fusion resume parsing algorithm, generates candidate profiles, and completes data storage and association. It supports uploading and parsing resumes in PDF, Word, and image formats, with a maximum file size of 50MB. The profile matching and screening module (see Figure 3) calculates the profile matching degree using the improved cosine similarity and the adaptive weighted fusion algorithm, performs dynamic threshold filtering and sorting, and outputs the screening results. The interview question generation module matches related questions using a deep learning-based relevance calculation model, performs question filtering, grading, and supplementary generation, and outputs a customized question bank. The video interview interaction module sends interview invitations, builds video interview scenarios, collects multimodal data during the interview, and supports interaction between candidates and the AI. The interview assessment module (see Figure 4) includes three parts. Answer content assessment: ASR converts speech to text and cleans it, extracts core elements, calculates the semantic fit, logic score, and answer correctness, and outputs the content score. Non-verbal information assessment: facial and speech features are collected, tone and micro-expression scores are calculated through the attention-based micro-expression recognition model and the CNN-LSTM speech emotion recognition model, and the non-verbal information score is output. Resume verification: targeted verification questions are generated, the consistency and depth of the answers are analyzed, and a verification score is output.

[0066] The comprehensive scoring module receives content evaluation, non-verbal information evaluation, resume verification, and matching score; calculates the comprehensive score according to the user-configured weights, and generates a scoring report containing scoring details and visual charts; supports exporting the scoring report to Excel and PDF formats, and supports printing.

[0067] The preference learning and optimization module collects human feedback, extracts preference features through the reinforcement learning recruitment preference model, and updates the model and database parameters. The data storage and management module stores the vector database, knowledge graph, question bank, resume data, interview data, evaluation scores, and human feedback data. It adopts a distributed storage architecture (master-slave replication mode) to ensure data reliability and provides data query, modification, backup, and export interfaces. The task orchestration and collaboration module, based on the Camunda workflow engine, schedules all modules uniformly and executes tasks according to preset processes. It monitors the running status of each module, handles abnormal situations (such as resume parsing failure or video stream interruption), and triggers an alarm if a task still fails after 3 automatic retries. It also maintains real-time data flow between modules, using a message queue (Kafka) to ensure reliable data transmission and keep processes running smoothly.
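The exception handling rule of the task orchestration module (retry automatically up to 3 times, then raise an alarm) can be sketched in plain Python as follows. This illustrates the rule only and does not use the Camunda or Kafka APIs; `run_with_retries`, `on_alarm`, and the sample `flaky` task are hypothetical names.

```python
def run_with_retries(task, max_retries=3, on_alarm=print):
    """Run `task`; on failure retry up to `max_retries` times, then alarm.

    Returns the task's result, or None if every attempt failed.
    """
    retries = 0
    while True:
        try:
            return task()
        except Exception as exc:
            if retries >= max_retries:
                on_alarm(f"task failed after {max_retries} retries: {exc}")
                return None
            retries += 1


# Example: a task that fails twice (simulating an interrupted stream), then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("video stream interrupted")
    return "ok"


result = run_with_retries(flaky)
```

With `max_retries=3`, a task that never succeeds is attempted four times in total (the initial attempt plus three retries) before the alarm callback fires once.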

[0068] The implementation of each module and its function in the system is fully consistent with the steps of the method in Embodiment 1 and is therefore not repeated here.

[0069] For the system architecture, see Figure 5.

[0070] The system consists of a data layer, an algorithm layer, a collaboration layer, a storage layer, and an application layer, forming a closed loop across the entire process from job description input to recruitment decision-making.

a. Data layer: Receives data such as JD text, resume files, and human feedback, and stores it in the corresponding database after preprocessing (format conversion, deduplication, and cleaning).

b. Algorithm layer: Includes an NLP algorithm unit (improved BERT, Prompt Tuning GPT, optimized Sentence-BERT), a visual algorithm unit (optimized OCR, attention-based micro-expression recognition), a speech algorithm unit (optimized ASR, CNN-LSTM speech emotion recognition), and a matching and evaluation algorithm unit (improved cosine similarity, adaptive weighted fusion, reinforcement learning model), deployed on GPU servers for efficient inference.

c. Collaboration layer: The task orchestration module schedules the functional modules based on the Camunda workflow engine, the status monitoring module monitors the running status in real time, and the process controller supports custom process configuration.

d. Storage layer: Redis caches real-time status (such as interview status and online candidates), Milvus stores embedding vectors, Neo4j stores knowledge graphs, MySQL stores structured data, and a distributed file system stores audio and video data.

e. Application layer: Provides interactive interfaces for a web platform and a mobile app. Enterprise users can input job descriptions, manage resumes, monitor interviews, and view results; candidates can upload resumes, take video interviews, and view interview results.

[0071] Example 3 An electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the processor to implement the intelligent interview method based on multimodal AI of Embodiment 1.

[0072] In the embodiments disclosed in this application, a computer storage medium may be a tangible medium that may contain or store programs for use by or in conjunction with an instruction execution system, apparatus, or device. The computer storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of computer storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0073] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed in this application can be implemented in electronic hardware or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0074] The above are merely preferred embodiments of the present invention. The scope of protection of the present invention is not limited to the above embodiments. All technical solutions falling within the scope of the present invention's concept are within the scope of protection of the present invention. It should be noted that for those skilled in the art, any improvements and modifications made without departing from the principles of the present invention should be considered within the scope of protection of the present invention.

Claims

1. An intelligent interview method based on multimodal AI, characterized in that it includes the following steps: parsing the job description in the recruitment information and generating a target person profile; parsing resumes and generating candidate profiles; matching the candidate profiles with the target person profile to obtain a candidate screening list; generating interview questions based on the target person profile; sending interview invitations to the candidates in the candidate screening list, each candidate who accepts the invitation conducting an AI video interview according to the interview questions; scoring the candidate's answers; analyzing the non-verbal information from the candidate's interview and assigning a non-verbal information assessment score; determining the authenticity of the resume and obtaining a resume authenticity verification score; and calculating the candidate's overall score and outputting the overall score together with the scores for each dimension.

2. The intelligent interview method based on multimodal AI as described in claim 1, characterized in that parsing the job description in the recruitment information and generating the target person profile specifically involves: using a BERT pre-trained language model as the semantic encoder, the BERT model being further trained on a recruitment-domain corpus; encoding the job description into a semantic vector sequence with the trained semantic encoder; inputting the semantic vector sequence into a named entity recognition algorithm that combines a bidirectional LSTM and a CRF to extract the target person keywords in the job description, including skill requirements, work experience, educational background, and ability qualities; calculating the target person keyword weight from the frequency of occurrence, positional weight, and domain importance coefficient of each target person keyword in the job description, the weight formula being defined over the target person keyword weight, the frequency of occurrence of the k-th keyword, the position weight of the k-th keyword, the domain importance coefficient of the k-th keyword, and n, the total number of extracted keywords; constructing a Prompt template from the extracted target person keywords and their corresponding weights, and inputting the Prompt template into a GPT-type generation model based on Prompt Tuning to generate a structured target person profile script summary; converting the target person keyword list and the target person profile script summary into target person keyword embedding vectors and a target person profile embedding vector, respectively, with the Sentence-BERT model, and storing them in the Milvus vector database after L2 normalization; and constructing in the knowledge graph a six-level association structure of "industry category - job category - target person profile - target person keywords - target person profile dimensions - core responsibilities", using the TransE algorithm to optimize the entity and relation embeddings of the knowledge graph, establishing the mapping relationships between the elements, and dynamically calculating the weights of the association edges based on the keyword weights and the domain association strength.
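The keyword weight formula itself is not reproduced in this text, so the following is only a minimal sketch of one plausible form: the product of frequency, position weight, and domain importance coefficient, normalized over the n extracted keywords. The product-and-normalize form, the `Keyword` class, and the sample values are assumptions for illustration, not the claimed formula.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    text: str
    frequency: int    # occurrences of the keyword in the job description
    position: float   # positional weight (assumed higher for prominent sections)
    domain: float     # domain importance coefficient


def keyword_weights(keywords):
    """Assumed form: w_k = f_k * p_k * c_k / sum_j (f_j * p_j * c_j)."""
    raw = [k.frequency * k.position * k.domain for k in keywords]
    total = sum(raw)
    if total == 0:
        return {k.text: 0.0 for k in keywords}
    return {k.text: r / total for k, r in zip(keywords, raw)}


# Hypothetical keywords extracted from a job description.
jd_keywords = [
    Keyword("python", 3, 1.0, 1.2),
    Keyword("sql", 1, 0.8, 1.0),
    Keyword("communication", 2, 0.5, 0.8),
]
weights = keyword_weights(jd_keywords)
```

Normalizing to a sum of 1 keeps the weights comparable across job descriptions of different lengths, which matters when they are later reused as edge weights or matching weights.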

3. The intelligent interview method based on multimodal AI as described in claim 1, characterized in that parsing resumes and generating candidate profiles specifically involves: for image-based resumes, obtaining the text with an OCR algorithm that integrates the EAST text detection model and the CRNN text recognition model, and then using a BiLSTM to extract candidate keywords from the text, wherein the EAST text detection model uses an optimized anchor generation strategy and the CRNN text recognition model introduces an attention mechanism; for text-based resumes, using a BiLSTM to extract candidate keywords directly; the candidate keywords including basic personal information, work experience, project experience, skills, and education; generating, from the extracted candidate keywords, candidate keyword embedding vectors with the same dimensions as the target person keyword embedding vectors; generating a candidate profile script summary with a GPT-type generation model based on Prompt Tuning; converting the candidate profile script summary into a candidate profile embedding vector with the Sentence-BERT model; applying L2 normalization to the candidate keyword embedding vectors and the candidate profile embedding vector, storing them in the Milvus vector database, and binding them to the candidate's unique identifier to establish an association index between vectors and structured information; and establishing in the knowledge graph a six-level association structure of "candidate - personal attributes - candidate profile - candidate keywords - candidate profile dimensions - industry field", with a dynamic relationship weight update mechanism that adjusts the weights of the association edges in real time according to how well the candidate profile matches the target person profile.
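One reason to L2-normalize embeddings before storage is that the inner product of two unit vectors equals their cosine similarity, so later matching can use a plain dot product. The sketch below illustrates this with the standard library only; the two-dimensional vectors are made-up stand-ins for real embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical candidate and target embeddings.
candidate_vec = l2_normalize([3.0, 4.0])
target_vec = l2_normalize([4.0, 3.0])

# For unit vectors the dot product is the cosine similarity.
similarity = dot(candidate_vec, target_vec)
```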

4. The intelligent interview method based on multimodal AI as described in claim 1, characterized in that matching the candidate profiles with the target person profile specifically involves: assigning a matching coefficient to each candidate keyword using a matching coefficient calculation model based on cosine similarity and semantic overlap, the formula being defined over the candidate keyword embedding vector, the target person keyword embedding vector, the set of semantic descriptions corresponding to the candidate keyword, the set of semantic descriptions corresponding to the target person keyword, the coverage of the target person keyword's semantic description set by the candidate keyword's semantic description set (which measures the degree of semantic overlap), the number of elements in a set, the fusion weight for the cosine similarity of the embedding vectors, and the fusion weight for the semantic coverage; calculating a keyword-level matching score from the target person keyword weights and the candidate keyword matching coefficients using a hierarchical weighted summation algorithm, the formula being defined over n, the number of target person keywords, the weight of the k-th target person keyword, the matching coefficient of the k-th candidate keyword, and the keyword type weight; calculating the vector similarity between the target person profile embedding vector and the candidate profile embedding vector using a cosine similarity model that incorporates importance weights for the vector components, the formula being defined over the d-th dimension component of the target person profile embedding vector, the d-th dimension component of the candidate profile embedding vector, and the importance weight of the d-th dimension component; calculating the total matching score by combining the keyword-level matching score and the vector similarity with an adaptive weighted fusion algorithm, the weight of the keyword-level matching score and the weight of the vector similarity both being dynamically adjusted based on job type; calculating a matching threshold from the mean and standard deviation of the candidates' total matching scores and selecting the candidates whose total matching score satisfies the threshold; calculating a ranking priority for the selected candidates using a ranking algorithm that combines the total matching score and keyword scarcity, with a weighting factor for scarce keywords; and sorting the candidates in descending order of ranking priority to generate the candidate screening list, which includes basic candidate information, matching score, keyword matching details, and scarcity markers.
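The matching formulas are not reproduced in this text. The sketch below shows one way the described pieces could fit together: a matching coefficient that fuses cosine similarity with set coverage, a component-weighted cosine, a threshold derived from the mean and standard deviation, and a scarcity-adjusted ranking. All functional forms, default weights (`alpha`, `beta`, `lam`, `gamma`), and sample data are assumptions for illustration.

```python
import math
from statistics import mean, pstdev


def weighted_cosine(a, b, weights=None):
    """Cosine similarity with optional per-dimension importance weights."""
    w = weights or [1.0] * len(a)
    num = sum(wi * x * y for wi, x, y in zip(w, a, b))
    na = math.sqrt(sum(wi * x * x for wi, x in zip(w, a)))
    nb = math.sqrt(sum(wi * y * y for wi, y in zip(w, b)))
    return num / (na * nb) if na and nb else 0.0


def coverage(candidate_terms, target_terms):
    """Share of the target description set covered by the candidate set."""
    target = set(target_terms)
    if not target:
        return 0.0
    return len(set(candidate_terms) & target) / len(target)


def match_coefficient(cand_vec, tgt_vec, cand_terms, tgt_terms, alpha=0.7, beta=0.3):
    """Assumed fusion: alpha * cosine + beta * coverage."""
    return alpha * weighted_cosine(cand_vec, tgt_vec) + beta * coverage(cand_terms, tgt_terms)


def screen(scores, lam=0.0):
    """Keep candidates scoring at least mean + lam * std (assumed threshold form)."""
    values = list(scores.values())
    threshold = mean(values) + lam * pstdev(values)
    return threshold, {c: s for c, s in scores.items() if s >= threshold}


def rank(kept, scarce_hits, gamma=0.1):
    """Assumed priority: score * (1 + gamma * number of scarce keywords matched)."""
    priority = {c: s * (1 + gamma * scarce_hits.get(c, 0)) for c, s in kept.items()}
    return sorted(priority, key=priority.get, reverse=True)


coef = match_coefficient([1.0, 0.0], [1.0, 0.0],
                         {"pandas", "numpy"}, {"pandas", "numpy", "spark", "sql"})
threshold, kept = screen({"A": 0.82, "B": 0.80, "C": 0.40})
order = rank(kept, scarce_hits={"B": 2})
```

In this example candidate B overtakes A after the scarcity adjustment, which is the behavior the ranking step is meant to allow: a slightly lower match with rare skills can rank first.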

5. The intelligent interview method based on multimodal AI as described in claim 1, characterized in that generating interview questions based on the target person profile specifically involves: extracting the target person keywords and retrieving candidate questions from the question bank based on the relationships between keywords and profile dimensions in the knowledge graph; calculating the semantic relevance between each target person keyword and each question text using a BERT-based cross-modal semantic relevance calculation model, the formula being defined over the embedding vector of the k-th target person keyword, the embedding vector of the text of the q-th question, the weight of the k-th target person keyword, the evaluation dimension weight of the q-th question, and the semantic relevance between the k-th target person keyword and the q-th question; for each question, using a hierarchical aggregation algorithm to sum the relevance scores by keyword type and then performing weighted fusion to obtain the overall relevance score of the question, the formula being defined over the set of skill-related keywords and its weight, the set of experience-related keywords and its weight, and the set of keywords related to abilities and qualities and its weight; setting a threshold for the overall relevance score, selecting the questions whose overall relevance score exceeds the threshold, sorting them in descending order of score, and selecting the top K questions according to the user-defined number of questions K; and mapping question difficulty from the keyword difficulty level and the question complexity, the mapping being optimized with a decision tree model whose features include keyword difficulty level, number of knowledge points in the question, number of reasoning steps, and length of the question text.
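The aggregation and selection steps can be sketched as follows. The per-pair relevance scores are taken as given inputs (in the claim they come from the BERT-based relevance model); the type names, type weights, threshold, and sample scores are hypothetical.

```python
def overall_relevance(relevance, keyword_types, type_weights):
    """Sum relevance per keyword type, then fuse the sums with type weights."""
    per_type = {}
    for keyword, score in relevance.items():
        kind = keyword_types[keyword]
        per_type[kind] = per_type.get(kind, 0.0) + score
    return sum(type_weights[kind] * total for kind, total in per_type.items())


def select_questions(question_relevance, keyword_types, type_weights, threshold, k):
    """Keep questions above the threshold, sort descending, return the top k."""
    scored = {
        q: overall_relevance(rel, keyword_types, type_weights)
        for q, rel in question_relevance.items()
    }
    passing = [q for q, s in scored.items() if s > threshold]
    passing.sort(key=lambda q: scored[q], reverse=True)
    return passing[:k], scored


# Hypothetical keyword types, weights, and per-question relevance scores.
keyword_types = {"python": "skill", "team lead": "experience", "communication": "quality"}
type_weights = {"skill": 0.5, "experience": 0.3, "quality": 0.2}
question_relevance = {
    "q1": {"python": 0.9, "team lead": 0.1, "communication": 0.1},
    "q2": {"python": 0.2, "team lead": 0.8, "communication": 0.3},
    "q3": {"python": 0.1, "team lead": 0.1, "communication": 0.2},
}
selected, scored = select_questions(question_relevance, keyword_types, type_weights,
                                    threshold=0.3, k=2)
```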

6. The intelligent interview method based on multimodal AI as described in claim 1, characterized in that scoring the candidate's answers specifically involves: converting the candidate's speech into text in real time using a Transformer-based end-to-end speech recognition model fine-tuned on an interview-scenario corpus, with a noise suppression algorithm based on spectral subtraction and wavelet transform; removing redundant information with a text cleaning algorithm, performing word segmentation, part-of-speech tagging, and entity recognition, and converting the result into answer keyword embedding vectors; extracting the core demand keywords of the question using a BERT-based keyword extraction model, expanding them with synonyms and related words from the knowledge graph, and converting them into core demand keyword embedding vectors; for objective questions, using the knowledge graph to associate the key elements of the preset standard answers; for subjective questions, constructing a core evaluation dimension system based on job requirements and determining the weight of each evaluation dimension by the analytic hierarchy process; using a bidirectional semantic matching model to calculate the forward similarity from the answer keywords to the core demand keywords and the reverse similarity from the core demand keywords to the answer keywords, and fusing the two by weighting to obtain the semantic fit, the formula being defined over the answer keyword embedding vector, the core demand keyword embedding vector, the forward similarity, and the reverse similarity; constructing a hierarchical evaluation model for logical coherence based on text syntactic structure analysis, semantic relevance calculation, and paragraph cohesion feature extraction, which outputs a logic score, the model being trained with the random forest algorithm on features that include syntactic completeness, mean inter-sentence similarity, cohesive word coverage, and topic drift; calculating the correctness of the answer, wherein for objective questions an algorithm combining exact matching and fuzzy matching compares the key elements of the answer with the standard answer for consistency, the formula being defined over the exact match result (0 or 1), the fuzzy matching similarity score (ranging from 0 to 1), the exact matching weight, and the fuzzy matching weight, and wherein for subjective questions a multi-label classification model evaluates the relevance of the answer against the core evaluation dimension system and a domain expert rule base assesses its professionalism; and calculating the content assessment score from the semantic fit, the logic score, and the correctness of the answer using a dynamic weighted summation formula whose weighting coefficients are adjusted according to whether the question is objective or subjective, the formula being defined over the semantic fit weight, the logic score weight, and the answer correctness weight.
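A minimal sketch of the scoring arithmetic described above, assuming simple linear fusions throughout. The forward and reverse similarities and the logic score are taken as inputs; the fuzzy match uses the standard library's `difflib.SequenceMatcher` as a stand-in for whatever fuzzy matcher the method actually uses, and every weight value is a made-up example.

```python
from difflib import SequenceMatcher


def semantic_fit(forward_sim, reverse_sim, w_forward=0.5):
    """Assumed fusion of forward and reverse similarity."""
    return w_forward * forward_sim + (1 - w_forward) * reverse_sim


def objective_correctness(answer, standard, w_exact=0.6, w_fuzzy=0.4):
    """Weighted sum of an exact match (0 or 1) and a fuzzy similarity in [0, 1]."""
    a, s = answer.strip().lower(), standard.strip().lower()
    exact = 1.0 if a == s else 0.0
    fuzzy = SequenceMatcher(None, a, s).ratio()
    return w_exact * exact + w_fuzzy * fuzzy


# Hypothetical weights (fit, logic, correctness) switched by question type.
WEIGHTS = {"objective": (0.2, 0.2, 0.6), "subjective": (0.4, 0.3, 0.3)}


def content_score(fit, logic, correctness, question_type):
    w_fit, w_logic, w_correct = WEIGHTS[question_type]
    return w_fit * fit + w_logic * logic + w_correct * correctness


right = objective_correctness("TCP", "tcp")
wrong = objective_correctness("udp", "tcp")
score = content_score(fit=0.8, logic=0.7, correctness=right, question_type="objective")
```

Switching the weight tuple by question type is one simple way to realize the "dynamic" weighting: objective questions lean on correctness, subjective ones on semantic fit and logic.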

7. The intelligent interview method based on multimodal AI as described in claim 1, characterized in that analyzing the non-verbal information from the candidate's interview specifically involves: acquiring the candidate's facial video stream and audio stream in real time; extracting facial key point coordinates, expression category, and gaze direction from the facial video stream; extracting speech rate, tone, energy, and number of pauses from the audio stream to obtain a speech feature vector; inputting the speech feature vector into a speech emotion recognition model based on a CNN-LSTM hybrid model, which outputs a probability distribution over emotional tendencies and determines whether the emotional tendency is positive, neutral, or negative; determining the reasonableness of the speech rate with a statistical analysis model, wherein a speech rate threshold range is set based on job type and a speech rate reasonableness score is calculated from the candidate's speech rate and the average speech rate for the corresponding job position; calculating intonation stability from fundamental frequency stability, using the standard deviation and the mean of the fundamental frequency; constructing a tone evaluation index system and obtaining the tone assessment score by weighted summation, with different emotional tendency scores assigned for positive, neutral, and negative emotions; constructing an attention-based micro-expression recognition model, pre-trained on the MMI micro-expression dataset, that introduces facial region attention weights with different weights set for the eyes, mouth, eyebrows, and other regions; using the trained model to identify micro-expression types, including confidence, tension, sincerity, and perfunctoriness; calculating a duration score from the expression duration and a frequency score from the frequency of expression changes; calculating the micro-expression assessment score from these scores together with the scores assigned to the different micro-expression types; and calculating the total non-verbal information score using an adaptive weight fusion algorithm that dynamically adjusts, according to the interview scenario, the weighting coefficient of the tone assessment score and the weighting coefficient of the micro-expression assessment score.
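The tone-related formulas are not reproduced in this text, so the sketch below uses assumed forms: speech rate reasonableness falls off linearly with distance from the job's average rate, and intonation stability is one minus the coefficient of variation of the fundamental frequency. The emotion scores, weights, tolerance, and sample values are all hypothetical.

```python
from statistics import mean, pstdev


def rate_reasonableness(rate, job_mean_rate, tolerance):
    """1.0 at the job's average rate, falling linearly to 0 at +/- tolerance."""
    return max(0.0, 1.0 - abs(rate - job_mean_rate) / tolerance)


def intonation_stability(f0_values):
    """1 minus the coefficient of variation of F0, clipped to [0, 1]."""
    mu = mean(f0_values)
    if mu == 0:
        return 0.0
    return min(1.0, max(0.0, 1.0 - pstdev(f0_values) / mu))


# Hypothetical scores assigned to each emotional tendency.
EMOTION_SCORE = {"positive": 1.0, "neutral": 0.6, "negative": 0.2}


def tone_score(emotion, rate_score, stability, weights=(0.4, 0.3, 0.3)):
    w_emotion, w_rate, w_stability = weights
    return w_emotion * EMOTION_SCORE[emotion] + w_rate * rate_score + w_stability * stability


def nonverbal_total(tone, micro_expression, w_tone=0.5):
    """Fuse tone and micro-expression scores; w_tone would vary by interview scenario."""
    return w_tone * tone + (1 - w_tone) * micro_expression


rate = rate_reasonableness(rate=180, job_mean_rate=160, tolerance=80)
stability = intonation_stability([200.0, 200.0, 200.0])
tone = tone_score("positive", rate, stability)
total = nonverbal_total(tone, micro_expression=0.7)
```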

8. The intelligent interview method based on multimodal AI as described in claim 1, characterized in that determining the authenticity of the resume and obtaining the resume authenticity verification score specifically involves: generating targeted verification questions with the GPT-type generation model optimized by Prompt Tuning, based on the work experience, project experience, and skill descriptions in the candidate's resume and combined with the professional knowledge system in the knowledge graph, wherein the types of targeted verification questions include detailed follow-up questions, logical verification questions, and professional depth questions, and the Prompt template of the GPT-type generation model is dynamically adjusted according to the question type; presenting the targeted verification questions to the candidate; using a bidirectional semantic matching algorithm to extract key information from the answers to the verification questions and convert it into an embedding vector, converting the corresponding description in the resume into an embedding vector, and introducing keyword overlap and semantic structure similarity as auxiliary features to calculate the consistency score, the formula being defined over the set of key entities in the candidate's answers to the targeted verification questions, the set of key entities in the resume, and the semantic structure similarity; constructing a depth evaluation index system based on the professional knowledge system in the knowledge graph, including richness of detail, professional depth, and logical rigor, scoring each index from the candidate's answers with a multilayer perceptron, and combining the index scores into a depth score, wherein the multilayer perceptron consists of an input layer, a hidden layer, and an output layer, the hidden layer using the ReLU activation function and the output layer using the Sigmoid activation function; and calculating the resume authenticity verification score as the weighted average of the consistency score and the depth score, the formula being defined over the resume authenticity verification score, the weighting coefficient of the consistency score, and the weighting coefficient of the depth score.
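As a rough illustration of the consistency and authenticity arithmetic, the sketch below combines embedding cosine similarity, entity-set overlap, and structural similarity linearly. Using the Jaccard index for keyword overlap, the linear fusion, and the specific weights are assumptions; the depth score is taken as an input where the claim uses a multilayer perceptron.

```python
import math


def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return num / (na * nb) if na and nb else 0.0


def jaccard(a, b):
    """Overlap of two entity sets: |A & B| / |A | B| (assumed overlap measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def consistency(answer_vec, resume_vec, answer_entities, resume_entities,
                structure_sim, weights=(0.5, 0.3, 0.2)):
    """Assumed fusion of embedding similarity, entity overlap, and structure similarity."""
    w_vec, w_overlap, w_struct = weights
    return (w_vec * cosine(answer_vec, resume_vec)
            + w_overlap * jaccard(answer_entities, resume_entities)
            + w_struct * structure_sim)


def authenticity(consistency_score, depth_score, w_consistency=0.6):
    """Weighted average of the consistency score and the depth score."""
    return w_consistency * consistency_score + (1 - w_consistency) * depth_score


c = consistency([1.0, 0.0], [1.0, 0.0],
                {"kafka", "flink"}, {"kafka", "flink", "spark"}, structure_sim=0.5)
a = authenticity(c, depth_score=0.6)
```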

9. The intelligent interview method based on multimodal AI as described in claim 1, characterized in that calculating the candidate's overall score and outputting the overall score and the scores for each dimension specifically involves: determining, by the analytic hierarchy process (AHP), the weights of four indicators, namely the content assessment score, the total non-verbal information score, the resume authenticity verification score, and the total matching score; calculating the candidate's overall score by a formula defined over these four weights and the normalized indicator scores; and outputting the candidate's overall score, the scores for each dimension, and the scoring details, including keyword matching results, logical analysis conclusions, correctness of answers, non-verbal information analysis results, and resume verification conclusions, while generating visual charts of the scores for each dimension.
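One common way to obtain AHP weights from a pairwise comparison matrix is the row geometric-mean approximation, sketched below together with a weighted sum of normalized indicator scores. The comparison matrices, the 0 to 100 score scale, and the min-max normalization are assumptions; the claim does not state which AHP variant or normalization is used.

```python
import math


def ahp_weights(pairwise):
    """Row geometric-mean approximation of AHP priority weights."""
    gms = [math.prod(row) ** (1.0 / len(row)) for row in pairwise]
    total = sum(gms)
    return [g / total for g in gms]


def normalize(score, low=0.0, high=100.0):
    """Min-max normalization onto [0, 1] for an assumed 0-100 scale."""
    return (score - low) / (high - low)


def overall_score(scores, weights):
    """Weighted sum of normalized indicator scores."""
    return sum(w * normalize(s) for s, w in zip(scores, weights))


# Hypothetical comparison matrix in which all four indicators are equally important.
equal = [[1.0] * 4 for _ in range(4)]
weights = ahp_weights(equal)

# Order: content, non-verbal, resume authenticity, total matching.
overall = overall_score([80.0, 60.0, 90.0, 70.0], weights)

# A two-indicator matrix where the first is judged twice as important.
two_to_one = ahp_weights([[1.0, 2.0], [0.5, 1.0]])
```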

10. The intelligent interview method based on multimodal AI as described in claim 1, characterized in that the method further includes: receiving human feedback, the types of which include accepting AI recommendations, rejecting AI recommendations, supplementing recommendations, and adjusting the weights of scoring dimensions; recording the candidate profile, the evaluation score vector, and the text of the human decision rationale corresponding to the feedback; extracting recruitment preference features from the text of the human decision rationale using an attention-based BERT model, which outputs a preference feature vector; calculating the weight adjustment coefficient of the preference feature vector through a contrastive learning algorithm, the formula being defined over the number of times AI recommendations were accepted, the number of times AI recommendations were rejected, the total number of feedback responses, the preference feature vector, the target person profile embedding vector, and the cosine similarity; updating, based on the weight adjustment coefficient, the weights of the association edges between keywords and profile dimensions in the knowledge graph using an incremental update algorithm; and adjusting the following model parameters using reinforcement learning: the importance weight of the d-th dimension component of the target person profile embedding vector, the weight of the keyword-level matching score, the weight of the vector similarity, the weight of skill-related keywords, the weight of experience-related keywords, the weight of keywords related to abilities and qualities, the semantic fit weight, the logic score weight, the answer correctness weight, the content assessment score weight, the total non-verbal information score weight, the resume authenticity verification score weight, and the total matching score weight; wherein maximizing the satisfaction level of human feedback is the optimization objective and the satisfaction level of human feedback serves as the reward function for reinforcement learning, the satisfaction formula being defined over the satisfaction level of human feedback, the number of times AI recommendations were accepted, the number of times the scoring results were manually adjusted, and the total number of feedback responses.
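Neither the weight adjustment formula nor the satisfaction formula is reproduced in this text. The sketch below shows one plausible reading built only from the listed quantities: the adjustment coefficient is the net acceptance rate scaled by the cosine similarity between the preference vector and the target profile vector, and satisfaction is the share of accepted recommendations net of manual score adjustments. Both forms, the update rate, and the sample counts are assumptions.

```python
import math


def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return num / (na * nb) if na and nb else 0.0


def weight_adjustment(n_accept, n_reject, n_total, preference_vec, target_vec):
    """Assumed form: (accepted - rejected) / total * cos(preference, target)."""
    if n_total == 0:
        return 0.0
    return (n_accept - n_reject) / n_total * cosine(preference_vec, target_vec)


def update_edge_weight(old_weight, adjustment, rate=0.1):
    """Incremental update of a knowledge-graph edge weight, clipped to [0, 1]."""
    return min(1.0, max(0.0, old_weight + rate * adjustment))


def satisfaction(n_accept, n_adjust, n_total):
    """Assumed reward: (accepted - manually adjusted) / total feedback responses."""
    if n_total == 0:
        return 0.0
    return (n_accept - n_adjust) / n_total


# Hypothetical feedback counts and vectors.
adjust = weight_adjustment(7, 1, 10, [1.0, 0.0], [1.0, 0.0])
new_weight = update_edge_weight(0.5, adjust)
reward = satisfaction(n_accept=7, n_adjust=2, n_total=10)
```

A reward of this shape rises when recruiters accept recommendations as issued and falls when they have to correct scores by hand, which matches the stated goal of maximizing human feedback satisfaction.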