Model training method and apparatus, data processing method and apparatus, oral cavity examination method and apparatus, and electronic device

By combining a self-supervised learning model with multiple detection methods to process oral examination data, the problems of low efficiency and insufficient accuracy in existing oral medical diagnosis and treatment have been solved, achieving efficient and accurate detection and early diagnosis of oral diseases, thereby improving treatment efficiency and patient satisfaction.

WO2026137519A1 · PCT designated stage · Publication Date: 2026-07-02 · PEKING UNIV SCHOOL OF STOMATOLOGY

Patent Information

Authority / Receiving Office: WO
Patent Type: Applications
Current Assignee / Owner: PEKING UNIV SCHOOL OF STOMATOLOGY
Filing Date: 2025-01-02
Publication Date: 2026-07-02

AI Technical Summary

Technical Problem

Current oral medicine diagnosis and treatment suffers from low efficiency, insufficient accuracy, and high requirements for doctors' comprehensive skills. In particular, when there is a lack of comprehensive examination and diagnostic awareness, it is easy to miss or misdiagnose. Furthermore, traditional testing techniques have limitations such as radiation hazards, limited imaging accuracy, and cumbersome testing processes.

Method used

By employing a self-supervised learning model combined with multiple detection methods, such as optical diagnosis, electromagnetic wave diagnosis, acoustic diagnosis, and terahertz wave diagnosis, and processing oral examination data, including sound data, optical data, terahertz data, and electromagnetic wave data, through machine learning models, we can achieve efficient and accurate diagnosis of oral health status and diseases.

Benefits of technology

It has improved the efficiency and accuracy of oral diagnosis and treatment, reduced reliance on doctors' subspecialty areas and experience, enabled early disease detection and precise diagnosis, optimized the treatment process, and improved patient satisfaction and compliance.

✦ Generated by Eureka AI based on patent content.


Abstract

The present disclosure relates to the technical fields of big data and oral medicine, and provides a model training method and apparatus, a data processing method and apparatus, an oral cavity examination method and apparatus, and an electronic device. The model training method of the present disclosure comprises: acquiring examination samples in oral medical diagnosis and treatment, the examination samples comprising examination data and labels; enhancing the examination samples to acquire training samples; and processing the training samples by means of a self-supervised learning model until training is completed, parameters of the self-supervised learning model being adjusted on the basis of outputted diagnosis results and the labels.

Description

Model training, data processing, and oral cavity detection methods, devices, and electronic equipment

[0001] Cross-reference of related applications

[0002] This application is based on and claims priority to CN application No. 202411907753.0, filed on December 23, 2024, the disclosure of which is incorporated herein by reference in its entirety.

Technical Field

[0003] This disclosure relates to the fields of big data and oral medical technology, and in particular to a method, apparatus and electronic device for model training, data processing and oral examination.

Background Technology

[0004] The mandible is a free-floating bone connected to the skull only by the temporomandibular joint. Its position is regulated and constrained by multiple factors, including the temporomandibular joint, occlusal contact, jaw muscles, and the central nervous system. It is the only movable bone in the maxillofacial region. Its movement is accomplished through the coordinated action of various components of the stomatognathic system. Through the cusp-fossa contact between the upper and lower teeth, and under the repeated regulation of proprioception in the periodontal tissues, temporomandibular joint, and masticatory muscles, it is memorized by nerves and muscles, gradually forming a unique movement pattern. Whether this movement pattern is normal is one of the important aspects of evaluating the function of the stomatognathic system.

[0005] Mandibular movements are often simplified into opening and closing movements, lateral movements, and protrusion and retraction movements. Tapping movement is an unconscious, habitual small opening and closing movement, related to the repetitive reinforcement of neuromuscular memory in mandibular movements of the stomatognathic system. The frequency, stability, and speed of tapping movement reflect the coordination between the various components of this system and are among the indicators of its functional stability.

[0006] Abnormal changes in any factor affecting mandibular movement, such as the temporomandibular joint, the teeth, or other structures, can affect mandibular movement and thus the patient's oral health. Experienced clinicians use visual inspection, palpation, and auscultation to assess mandibular movement and to screen for temporomandibular joint disorders (such as clicking sounds), dental abnormalities (such as premature contact and interference), and related diseases and their severity.

[0007] In addition, doctors will use diagnostic tools to contact the treatment site to assist in the treatment. For example, they may use metal instruments to percuss to determine whether there is pain at the root tip and the degree of pain, or use dental floss to check the tightness of the interproximal contact through the interproximal area.

Summary of the Invention

[0008] One objective of this disclosure is to improve the efficiency and accuracy of oral medical diagnosis and treatment.

[0009] According to one aspect of some embodiments of this disclosure, a model training method is proposed, comprising: acquiring test samples in oral medical diagnosis and treatment, the test samples including test data and labels; enhancing the test samples to acquire training samples; and processing the training samples through a self-supervised learning model until training is completed, wherein the parameters of the self-supervised learning model are adjusted according to the output diagnostic results and labels.

[0010] In some embodiments, the diagnostic results include oral health status, and the test samples include test samples in a healthy oral state and test samples in at least one oral disease state.

[0011] In some embodiments, the diagnostic results include the condition of the prosthesis, and the test samples include test samples of the prosthesis in different conditions, wherein the condition of the prosthesis includes at least one of the occlusal contact condition of the prosthesis, the degree of osseointegration of the implant, or the condition of the internal preparation.

[0012] In some embodiments, oral health status includes health, or a diagnosis of an oral disease, including temporomandibular joint disorder, dental disease, malocclusion, periodontal disease, prosthesis problems, and maxillofacial tumors.

[0013] In some embodiments, the diagnostic results may also include at least one of the location, number, and severity of the disease.

[0014] In some embodiments, the detection data includes one or more of optical data, acoustic data, terahertz data, or electromagnetic wave data.

[0015] In some embodiments, the optical data includes a three-dimensional image obtained by emitting a predetermined type of light onto the surface of the soft and hard oral tissues, receiving the resulting optical signals with a sensor, and reconstructing the image from differences in reflection intensity.

[0016] In some embodiments, terahertz wave data includes transmission and reflection data obtained by emitting electromagnetic waves into the soft and hard tissues of the oral cavity and receiving them after transmission or reflection.

[0017] In some embodiments, the electromagnetic wave data includes X-ray absorption data of different parts of the oral tissues obtained by emitting electromagnetic waves into the soft and hard tissues of the oral cavity.

[0018] In some embodiments, the sound data includes data generated by the patient’s active sound source and data generated by a passive sound source, wherein the data generated by the active sound source includes sound data generated during mandibular movement and the data generated by the passive sound source includes sound data generated by the contact between the diagnostic tool and the oral cavity structure.

[0019] In some embodiments, enhancing the detection samples and obtaining training samples includes: when the detection data includes sound data, processing the sound data through a filter to obtain a first spectrogram sample; and processing the first spectrogram sample through at least one of waveform shifting, harmonic distortion, or resampling to obtain a second spectrogram sample, wherein the training samples include the first spectrogram sample and the second spectrogram sample.

[0020] In some embodiments, the label corresponding to the second spectrogram sample is the same as that of the corresponding first spectrogram sample.
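As a rough illustration of the augmentation described in [0019] and [0020], the following Python sketch derives additional samples from one labelled sample by shifting and resampling, and copies the label unchanged. For brevity it operates on a plain one-dimensional sequence rather than on a spectrogram. The names `Sample`, `shift`, `resample_linear`, and `augment`, and the choice of a circular shift and linear interpolation, are assumptions made for this example and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sample:
    """Hypothetical container: examination data plus its label."""
    signal: List[float]
    label: str


def shift(signal: List[float], offset: int) -> List[float]:
    """Circularly shift the sequence to the right by `offset` positions."""
    if not signal:
        return []
    k = offset % len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def resample_linear(signal: List[float], new_length: int) -> List[float]:
    """Resample to `new_length` points using linear interpolation."""
    if not signal or new_length < 1:
        raise ValueError("signal must be non-empty and new_length >= 1")
    if len(signal) == 1 or new_length == 1:
        return [signal[0]] * new_length
    scale = (len(signal) - 1) / (new_length - 1)
    out = []
    for i in range(new_length):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return out


def augment(sample: Sample, offset: int, new_length: int) -> List[Sample]:
    """Return the original sample plus two augmented copies.

    Every augmented copy keeps the label of the sample it came from.
    """
    return [
        sample,
        Sample(shift(sample.signal, offset), sample.label),
        Sample(resample_linear(sample.signal, new_length), sample.label),
    ]
```

Calling `augment(Sample([0.0, 1.0, 2.0, 3.0], "premature contact"), 1, 7)` yields three samples that all carry the label "premature contact", so the training set grows without any additional manual labelling.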

[0021] In some embodiments, processing sound data with a filter to obtain a first spectrogram sample includes: processing sound data with a Mel filter to obtain a Mel spectrogram as the first spectrogram sample.
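A Mel filter bank spaces its filters evenly on the Mel scale rather than in hertz, which gives finer resolution at low frequencies. The sketch below uses one widely used conversion, m = 2595 * log10(1 + f / 700), to compute filter centre frequencies. The disclosure does not state which Mel formula, how many filters, or what frequency range is used, so the function names and all parameter values here are illustrative assumptions.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """Convert hertz to Mel using the common 2595 * log10(1 + f / 700) form."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters: int, f_min: float, f_max: float) -> List[float]:
    """Centre frequencies (Hz) of n_filters filters equally spaced in Mel.

    The band edges f_min and f_max are not centres themselves; the centres
    are the n_filters interior points of n_filters + 2 equally spaced points.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * i) for i in range(1, n_filters + 1)]
```

With these definitions, the spacing between neighbouring centres in hertz grows with frequency, which is the property that distinguishes a Mel spectrogram from a linear-frequency spectrogram.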

[0022] In some embodiments, processing the first spectrogram sample by at least one of waveform shifting, harmonic distortion, or resampling to obtain the second spectrogram sample includes: extracting time-domain features and frequency-domain features from the first spectrogram sample; obtaining time-frequency fusion features from the time-domain features and frequency-domain features; extracting local features from the time-frequency fusion features and obtaining dynamic position codes from the time-frequency fusion features; extracting context-aware features based on a self-attention mechanism from the local features and dynamic position codes; and fusing the context-aware features and the time-frequency fusion features to obtain the second spectrogram sample.
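The last two steps of [0022], extracting context-aware features with self-attention and fusing them with the time-frequency fusion features, can be sketched in plain Python as follows. This is a minimal sketch under strong assumptions: it uses scaled dot-product attention with no learned projections (queries, keys, and values are all the input rows), it omits the local-feature extraction and the dynamic position codes, and it fuses by element-wise addition. None of these specifics are stated in the disclosure, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features: Matrix) -> Matrix:
    """Scaled dot-product self-attention with identity projections.

    Each output row is a weighted average of all input rows, where the
    weights come from the softmax of the scaled dot products.
    """
    dim = len(features[0])
    output = []
    for query in features:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        weights = softmax(scores)
        output.append([
            sum(w * row[i] for w, row in zip(weights, features))
            for i in range(dim)
        ])
    return output


def fuse(context: Matrix, fused: Matrix) -> Matrix:
    """Assumed fusion rule: element-wise (residual) addition."""
    return [[c + f for c, f in zip(c_row, f_row)]
            for c_row, f_row in zip(context, fused)]
```

A call such as `fuse(self_attention(x), x)` then stands in for the final fusion step, with `x` playing the role of the time-frequency fusion features.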

[0023] In some embodiments, the self-supervised learning model is the Vision Transformer model.

[0024] According to one aspect of some embodiments of this disclosure, a method for processing oral diagnosis and treatment data is proposed, comprising: acquiring detection data of a target patient during oral diagnosis and treatment; and determining a diagnostic result based on the detection data and a machine learning model, wherein the machine learning model is generated by training according to any of the model training methods mentioned above.

[0025] In some embodiments, the oral diagnosis and treatment data processing method further includes: when the detection data includes sound data, converting the sound data of the target patient in oral diagnosis and treatment into a sound spectrogram; wherein, the machine learning model determines the diagnostic result by processing the sound spectrogram.
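To make the conversion from sound data to a sound spectrogram in [0025] concrete, the following sketch splits a signal into frames and takes the magnitude of a discrete Fourier transform of each frame. It is only an illustration: it uses a direct DFT with no window function and no Mel filtering, and the frame size, hop size, and function names are assumptions rather than details from the disclosure.

```python
import cmath
import math
from typing import List


def frames(signal: List[float], size: int, hop: int) -> List[List[float]]:
    """Split the signal into full frames of `size` samples, `hop` apart."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def magnitude_spectrum(frame: List[float]) -> List[float]:
    """Magnitudes of DFT bins 0 .. n/2 of one frame (direct O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal: List[float], size: int, hop: int) -> List[List[float]]:
    """One magnitude spectrum per frame: rows are time, columns are frequency."""
    return [magnitude_spectrum(f) for f in frames(signal, size, hop)]
```

In practice a fast Fourier transform and a window function would normally replace the direct DFT; the direct form is used here only to keep the example free of dependencies.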

[0026] According to one aspect of some embodiments of this disclosure, an oral examination method is proposed, comprising: collecting examination data of a target patient during oral diagnosis and treatment; and determining a diagnostic result based on the examination data using a machine learning model, wherein the machine learning model is generated by training based on examination samples from oral medical diagnosis and treatment.

[0027] In some embodiments, the detection data includes one or more of optical data, acoustic data, terahertz data, or electromagnetic wave data.

[0028] In some embodiments, the detection data includes one or more of optical data, acoustic data, terahertz data, or electromagnetic wave data.

[0029] In some embodiments, the optical data includes a three-dimensional image obtained by emitting a predetermined type of light onto the surface of the soft and hard oral tissues, receiving the resulting optical signals with a sensor, and reconstructing the image from differences in reflection intensity.

[0030] In some embodiments, terahertz wave data includes transmission and reflection data obtained by emitting electromagnetic waves into the soft and hard tissues of the oral cavity and receiving them after transmission or reflection.

[0031] In some embodiments, the electromagnetic wave data includes X-ray absorption data of different parts of the oral tissues obtained by emitting electromagnetic waves into the soft and hard tissues of the oral cavity.

[0032] In some embodiments, the sound data includes at least one of sound data from an active sound source and sound data from a passive sound source. The sound data from the active sound source includes sound data generated by the interlacing and sliding friction of the cusps and fossae of the upper and lower jaws during mandibular movement. The sound data from the passive sound source includes sound data generated by the contact between the diagnostic tool and the internal structures of the oral cavity.

[0033] In some embodiments, the diagnostic results include the condition of the prosthesis, and the test samples include test samples of the prosthesis in different conditions, wherein the condition of the prosthesis includes at least one of the occlusal contact condition of the prosthesis, the degree of osseointegration of the implant, or the condition of the internal preparation.

[0034] In some embodiments, oral health status includes health, or a diagnosis of an oral disease, including temporomandibular joint disorder, dental disease, malocclusion, periodontal disease, prosthesis problems, and maxillofacial tumors.

[0035] In some embodiments, the machine learning model is generated by training according to any of the model training methods mentioned above.

[0036] According to one aspect of some embodiments of this disclosure, a model training apparatus is proposed, comprising: a sample acquisition unit configured to acquire detection samples, the detection samples including detection data and labels; a sample enhancement unit configured to enhance the detection samples and acquire training samples; and a training unit configured to process the training samples through a self-supervised learning model until training is completed, wherein the parameters of the self-supervised learning model are adjusted according to the output diagnostic results and labels.

[0037] According to one aspect of some embodiments of this disclosure, an oral sound data processing apparatus is proposed, comprising: a data acquisition unit configured to acquire detection data of a target patient during oral diagnosis and treatment; and a state determination unit configured to determine a diagnostic result based on the detection data and a machine learning model, wherein the machine learning model is trained and generated according to any of the model training methods mentioned above.

[0038] According to one aspect of some embodiments of the present disclosure, a data processing apparatus is provided, comprising: a memory; and a processor coupled to the memory, the processor being configured to execute any of the methods mentioned above based on instructions stored in the memory.

[0039] According to one aspect of some embodiments of this disclosure, a non-transitory computer-readable storage medium is proposed, having stored thereon computer program instructions that, when executed by a processor, implement any of the methods mentioned above.

[0040] According to one aspect of some embodiments of this disclosure, a computer program product is proposed, including a computer program or instructions that, when executed by a processor, implement any of the methods mentioned above.

[0041] According to one aspect of some embodiments of this disclosure, a computer program is provided for causing a processor to perform any of the methods mentioned above.

[0042] According to one aspect of some embodiments of the present disclosure, an electronic device is provided, comprising: a data acquisition device configured to acquire detection data of a target patient during oral diagnosis and treatment; and a data processor configured to determine a diagnostic result based on the detection data and a machine learning model, wherein the machine learning model is generated by training based on the detection samples.

[0043] In some embodiments, the data acquisition device includes at least one combination of the following: a laser light source, a laser detector and a laser scanning system; an electromagnetic wave source and an electromagnetic wave receiver; a sound sensor; or a terahertz wave source and a terahertz wave detector.

Attached Figure Description

[0044] The accompanying drawings, which are included to provide a further understanding of this disclosure and form part of this disclosure, illustrate exemplary embodiments of this disclosure and are used to explain this disclosure, but do not constitute an undue limitation of this disclosure.

[0045] Figure 1 is a flowchart of some embodiments of the oral cavity examination method of this disclosure.

[0046] Figure 2 is a flowchart of some embodiments of the model training method disclosed herein.

[0047] Figure 3 is a schematic diagram of some embodiments of the model training method disclosed herein.

[0048] Figure 4 is a flowchart of some embodiments of sample augmentation in the model training method of this disclosure.

[0049] Figure 5 is a flowchart of some embodiments of the oral diagnosis and treatment data processing method disclosed herein.

[0050] Figure 6 is a schematic diagram of some embodiments of the model training apparatus of this disclosure.

[0051] Figure 7 is a schematic diagram of some embodiments of the oral sound data processing device of this disclosure.

[0052] Figure 8 is a schematic diagram of some embodiments of the data processing device of this disclosure.

[0053] Figure 9 is a schematic diagram of some other embodiments of the data processing apparatus of this disclosure.

[0054] Figure 10 is a schematic diagram of some embodiments of the electronic device disclosed herein.

Detailed Implementation

[0055] The technical solutions of this disclosure will be further described in detail below with reference to the accompanying drawings and embodiments.

[0056] With the increasing specialization of dentistry, limitations in theoretical knowledge and clinical skills often lead to a parochial mindset in diagnosis and treatment, lacking a comprehensive approach to examination and treatment. When patients seek treatment, dentists tend to prioritize examination and treatment plans within their own subspecialty, easily leading to missed diagnoses, misdiagnoses, or an inability to address issues outside their area of expertise, thus failing to meet patients' needs for holistic, systematic, and comprehensive treatment. Furthermore, differences in clinical skills and treatment philosophies among dentists in different departments can result in conflicting treatment plans, reducing both clinical efficiency and patient satisfaction and compliance. Therefore, a patient-centered, disease-focused, and collaborative model of general practice and specialty-based dentistry has been proposed. However, this model demands higher levels of comprehensive skills from dentists, requiring long-term professional training and skills assessments to achieve a comprehensive grasp of various specialty treatment methods. To address these issues, this disclosure proposes a model training, data processing, and oral examination method, device, and electronic equipment to improve clinical efficiency, patient satisfaction, and compliance.

[0057] The method proposed in this disclosure uses one or more detection methods to collect detection data and performs oral medical diagnosis through data analysis. This can pinpoint the specific type and location of oral problems, improve diagnostic efficiency, reduce reliance on the doctor's subspecialty, experience, and manual operation, and improve the accuracy and comprehensiveness of diagnostic results.

[0058] Furthermore, most patients only seek medical attention for their chief complaint, easily overlooking other potential oral health issues. They often wait until these issues cause significant discomfort, affecting daily chewing or aesthetics, such as deep cavities causing pain, grade II or III tooth loosening significantly impacting chewing efficiency, or temporomandibular joint disorder causing significant pain and difficulty opening the mouth, before actively seeking treatment, often missing the optimal treatment window. Applying the method proposed in this disclosure to early intelligent detection and diagnosis of oral diseases, supplementing the traditional departmental dental treatment model, will help optimize the treatment process, improve the efficiency and comprehensive treatment level of dental institutions, increase patient satisfaction and compliance, and achieve early screening and diagnosis.

[0059] There are various oral disease detection technologies, such as X-rays, CT scans, magnetic resonance imaging (MRI), and ultrasound. Each technology has its own advantages and can provide some diagnostic evidence, but these methods usually have certain limitations, such as radiation hazards, limited imaging accuracy, cumbersome testing processes, and the inability to provide real-time feedback.

[0060] With the development of new technologies such as acoustics and terahertz waves, the structural detection and disease diagnosis of oral soft and hard tissues will become more comprehensive and detailed, featuring non-invasiveness, high resolution, high sensitivity, real-time monitoring, and rapid detection. The detection technologies disclosed herein include optical diagnostics, electromagnetic wave diagnostics, acoustic diagnostics, and terahertz wave diagnostics.

[0061] Optical diagnostics: A scanner emits specific types of light that shines onto the surface of the soft and hard tissues of the oral cavity, causing reflection or scattering. Different tooth tissues have different optical properties; for example, carious tissues have higher organic components and water content, resulting in stronger absorption of light and reduced reflectivity. After the sensor receives and analyzes the optical signals, the computer reconstructs high-precision three-dimensional images based on the differences in reflection intensity of different tissues and regions. This allows for the acquisition of surface models of the soft and hard tissues of the oral cavity, and even internal tissue structures, thereby enabling the detection of common oral problems such as ulcers, tartar, and interproximal caries. This detection technology is characterized by being non-invasive, free of ionizing radiation, and convenient, but its penetration is limited and may be affected by surface contaminants.

[0062] Electromagnetic wave diagnosis: The conduction characteristics of electromagnetic waves can reflect the conductivity and impedance of different tissues. When electromagnetic waves are used to examine oral tissues, hard tissues (such as enamel and dentin) absorb more X-rays, appearing as brighter areas on the image, while soft tissues (such as gingiva and pulp) absorb less X-rays, appearing as darker areas on the image. Analyzing these changes in characteristics can further determine the location of lesions and can be used for the early detection of oral tumors, inflammation, and soft tissue lesions. This detection technique is characterized by its ability to clearly and accurately display the internal structure of teeth, but it carries a certain risk of radiation exposure.

[0063] Acoustic diagnosis: Experienced clinicians will determine whether there are abnormalities in occlusal contact based on the sound characteristics of the patient's mandibular movement and teeth tapping. They will use diagnostic tools in contact with the treatment site to assist in the diagnosis; for example, they may use metal instruments for percussion to determine the pain and degree at the root apex, or use dental floss to check the tightness of interproximal contact. Different diseases or different degrees of disease present different acoustic characteristics. This testing technology is characterized by its speed, convenience, non-invasiveness, low technical sensitivity, and comprehensiveness, but it does not generate visual images and directly outputs the test results.

[0064] Terahertz wave diagnosis: Terahertz waves refer to electromagnetic waves in the range of 0.1–10 THz. Imaging can be divided into two types: transmission and reflection. The transmission path can be used to determine the refractive index and absorption coefficient of tooth enamel, dentin, and caries; the reflection path can effectively detect early caries and other tissue types. Terahertz waves have strong penetrating power in biological tissues, effectively detecting structural changes within the soft and hard tissues of the oral cavity. Furthermore, terahertz waves are highly sensitive to tissue hydration, and can be used to detect minute lesions, tumors, and inflammation in teeth, gums, and bones. This detection technology has the advantages of high resolution, non-destructive operation, no ionizing damage, and the ability to identify molecular fingerprint spectra.
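For a sense of scale, the 0.1–10 THz band in [0064] can be converted to free-space wavelengths with λ = c / f. The short calculation below is an illustrative aside, not part of the disclosure; the function name is hypothetical, and wavelengths inside tissue are shorter than these free-space values because the refractive index of tissue is greater than 1.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact, by definition of the metre


def wavelength_m(frequency_hz: float) -> float:
    """Free-space wavelength in metres for a given frequency in hertz."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return SPEED_OF_LIGHT_M_PER_S / frequency_hz


longest = wavelength_m(0.1e12)   # lower band edge, roughly 3 mm
shortest = wavelength_m(10e12)   # upper band edge, roughly 30 micrometres
```

The band therefore spans free-space wavelengths from about 3 mm down to about 30 µm.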

[0065] The detection technology used to acquire detection data in the embodiments shown below can be one or more of the optical diagnostics, electromagnetic wave diagnostics, acoustic diagnostics, and terahertz wave diagnostics mentioned above. The embodiments primarily use acoustic diagnostics to acquire sound data and apply it as an example to describe the model training, data processing, and oral cavity detection methods, devices, and electronic equipment of this disclosure. The types of detection data and detection technologies used in the embodiments are merely examples of feasible implementations and do not constitute an undue limitation on this disclosure. Based on the solutions in the embodiments of this disclosure, the model training, data processing, and oral cavity detection methods of this disclosure can be applied to scenarios using optical diagnostics, electromagnetic wave diagnostics, and terahertz wave diagnostics, and are also within the protection scope of this disclosure.

[0066] A flowchart of some embodiments of the oral cavity examination method disclosed herein is shown in Figure 1.

[0067] In step S11, detection data of the target patient is collected during the patient's oral diagnosis and treatment. In some embodiments, sound data of the target patient during oral diagnosis and treatment can be collected by a sound sensor. The acoustic information in the oral medical diagnosis and treatment process comes from the patient's active sound source and from passive sound sources. Active sound is mostly generated during mandibular movement, when the cusps and fossae of the upper and lower teeth interlock, slide, and rub against each other. Mandibular movement is accomplished through the coordinated action of the various components of the stomatognathic system. Through the cusp-fossa contact between the upper and lower teeth, and under the repeated adjustment of proprioception in the periodontium, temporomandibular joint, and masticatory muscles, it is memorized by nerves and muscles, gradually forming a movement pattern with individual characteristics, including opening and closing movements, lateral movements, and protrusion and retraction movements. Abnormal changes in any factor related to mandibular movement, such as the temporomandibular joint, the teeth, or other structures, can affect mandibular movement and thus the patient's oral health. Experienced clinicians use visual inspection, palpation, and auscultation to assess mandibular movement and to screen for temporomandibular joint disorders (such as clicking sounds), dental abnormalities (such as premature contact and interference), and related diseases and their severity. The specific manifestations of mandibular movement can also be used to infer whether the occlusal contact of restorations (inlays, full crowns, complete dentures, etc.) is normal, guiding clinicians to make targeted occlusal adjustments and thereby improving the quality of restorations and the efficiency of diagnosis and treatment.

[0068] Passive sound sources primarily originate from the sounds produced by dentists using diagnostic tools. For example, percussion with metal instruments to determine the pain and degree at the root apex, or the use of dental floss to check the tightness of interproximal contact, all produce corresponding sounds. These acoustic signals contain and reflect a wealth of basic oral health information. However, typically only experienced specialists can fully utilize this acoustic information to assist in the diagnosis of relevant diseases, requiring a high degree of clinical technical sensitivity.

[0069] Sound data generated by an active sound source during the patient's jaw movements is collected using a sound sensor. In some embodiments, the patient can be guided to perform jaw movements freely, and sound data during these movements can be collected using a sound sensor within a predetermined distance range in the patient's mouth. This method reduces the requirements on the patient's movement patterns, improving user-friendliness and feasibility.

[0070] In some embodiments, during the sound data acquisition process, the patient can be guided to perform specified mandibular movements, such as clenching, lateral movement, and protrusion, one at a time. Sound data for each movement mode is collected and labeled with the movement type. This method can increase the amount of effective information in the sound data for diagnosing oral diseases and helps to reveal more of the patient's oral problems more clearly; it also reduces the difficulty of subsequent data processing and helps improve diagnostic accuracy.

[0071] Sound sensors are used to collect sound data generated by passive sound sources from patients. This includes sound data produced when a physician uses diagnostic tools to contact the internal structures of the oral cavity, such as the sound data produced when a physician taps teeth with instruments or uses dental floss to examine adjacent surfaces. This method can extract rich sound data, providing more data support for subsequent diagnoses and helping to improve the accuracy of diagnostic results.

[0072] In step S12, the sound data is processed using a machine learning model to determine the diagnostic result. The machine learning model is generated by training on sound samples.

[0073] In some embodiments, based on the machine learning model architecture for processing sound data in related technologies, the model can be trained using sound samples with labeled diagnostic results to obtain a machine learning model that can be used in step S12. This method allows for model training based on an existing machine learning model for processing sound data, improving convenience and efficiency.

[0074] In some embodiments, the aforementioned machine learning model can be a deep learning model. Deep learning, with its powerful and efficient learning algorithms and end-to-end feature extraction capabilities, is a subset of artificial intelligence. It uses hierarchical algorithms similar to human neural networks within a computer system to simulate human cognitive abilities, possessing the ability to process data including audio, images, and text. Deep learning models can assist doctors and researchers in achieving high-efficiency and high-quality medical image recognition, electronic medical record analysis, drug development, genomics analysis, and acoustic information detection. Deep learning-based acoustic information detection methods are particularly valuable due to their high accuracy and non-invasiveness.

[0075] In some embodiments, the diagnostic results include the oral health status. Oral health status includes being healthy or a diagnostic result for an oral disease. Oral diseases mentioned in this disclosure include temporomandibular joint disorders, dental diseases (such as supernumerary teeth, caries, cracked teeth, pulpitis, periapical periodontitis, impacted teeth, etc.), occlusal disorders (such as premature contact, interference, etc.), and periodontal diseases (such as alveolar bone resorption, loose teeth, food impaction, etc.). In some embodiments, the diagnostic results can be specific to the type of disease, such as caries, periapical periodontitis, etc. In some embodiments, the diagnostic results may also include the location, number, and severity of the disease, such as the target tooth position, the presence of several diseases or of disease in several locations, and the severity of the disease, thereby helping the dentist locate the disease and plan subsequent diagnostic and treatment procedures, and avoiding missed diagnoses.

[0076] In some embodiments, the diagnostic results include the occlusal contact status of a restoration (e.g., inlays, crowns, complete dentures, etc.), that is, whether the occlusal contact is normal after the patient uses the restoration. For example, a patient with complete dentures is asked to perform occlusal movements while sound data is collected. Based on the sound, it can be determined whether the occlusal contact of the denture is uniform, which tells the dentist whether further grinding is needed and where the grinding points are. This method can assist dentists in adjusting the restoration, improve adjustment efficiency, avoid adjustment errors caused by unclear patient descriptions or inaccurate perceptions, and improve the accuracy of restoration adjustments.

[0077] The method described in the above embodiments can be used to determine the presence of temporomandibular joint disorder, dental-related diseases (such as supernumerary teeth, caries, cracked teeth, pulpitis, periapical periodontitis, impacted teeth, etc.), occlusal contact conditions (such as premature contact, interference, etc.), periodontal diseases (alveolar bone resorption, loose teeth, food impaction, etc.), and restoration conditions (implant osseointegration, internal preparation). The system can accurately analyze the specific location, number, type, and extent of such oral problems, in order to improve the efficiency of early diagnosis through comprehensive systemic analysis.

[0078] In some embodiments, the sound data can first be converted into other types of data, such as text data or image data. Then, based on the machine learning model architecture for processing text and image data in related technologies, the model is trained using the text and image data corresponding to the labeled sound samples to obtain a machine learning model that can be used in step S12. This method expands the range of selectable machine learning model architectures, making it easier to select a more suitable and accurate model for training, thereby improving the accuracy of oral cavity detection.

[0079] In some embodiments, the aforementioned sound samples include sound samples in a healthy oral cavity state and sound samples in at least one oral disease state. For example, they include sound samples in a healthy oral cavity state, and at least one of the following: sound samples in a temporomandibular joint disorder state, sound samples in various dental disease states (e.g., sound samples in a loose tooth state, sound samples in a carious tooth state), sound samples in various occlusal disorder states (e.g., sound samples with premature contact, sound samples under interference conditions), and sound samples under various periodontal disease conditions. These are merely examples; additional sound sample data can be collected based on the specific oral diseases requiring examination, and the sound sample set can be expanded. These examples do not constitute an undue limitation on this disclosure. The number of sound samples for each condition is greater than a predetermined value (e.g., more than 10) to ensure the model fully learns the sound features of the corresponding condition and to improve the model's accuracy.

[0080] In some embodiments, the aforementioned sound samples include sound samples of the prosthesis under normal occlusal contact conditions, and sound samples of at least one prosthesis under abnormal occlusal contact conditions.

[0081] The sound sample consists of two parts: sound sample data and a label. The label includes the oral health status of the subject generating the sound sample data or the occlusal contact status of the restoration. Oral health status can be described as healthy or as a diagnosis of oral disease. Specifically, oral diseases include temporomandibular joint disorder, dental diseases (such as supernumerary teeth, caries, cracked teeth, pulpitis, periapical periodontitis, impacted teeth, etc.), occlusal disorders (such as premature contact, etc.), tooth decay, periodontal diseases (e.g., alveolar bone resorption, loose teeth, food impaction), restoration conditions (implant osseointegration, internal preparation), and oral and maxillofacial tumors (ameloblastoma, pleomorphic adenoma, etc.). The occlusal contact status of the restoration is labeled as normal or abnormal. Machine learning models trained on these samples can not only determine whether the patient has oral diseases and whether the occlusal contact status of the restoration is normal, but also pinpoint the specific type of oral disease, thus improving the efficiency of oral examinations.

[0082] In some embodiments, the label also includes the severity of the disease; for example, multiple severity levels can be set. Using such sound samples makes it easier to include the severity of the disease in the output diagnostic results, providing doctors with more information and further improving the efficiency of oral examinations.

[0083] In some embodiments, the label may also include at least one of the location, number, and severity of the disease, such as specific to the target tooth position, several diseases or diseases in several locations, and the severity of the disease, so that the trained model has the ability to diagnose the type, location, and severity of the disease, improve the richness of the diagnostic results, and further improve the efficiency of oral examination.
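
As an illustration of the sample structure described above (sound sample data plus a label carrying disease type, location, number, and severity), the following is a minimal sketch. All class and field names are hypothetical, and the tooth position string "36" assumes FDI notation, which this disclosure does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiseaseFinding:
    """One labeled finding. Field names are hypothetical."""
    disease_type: str                      # e.g. "caries", "premature contact"
    tooth_position: Optional[str] = None   # target tooth position, if known
    severity_level: Optional[int] = None   # one of several preset severity levels


@dataclass
class SoundSample:
    """A sound sample: sound sample data plus its label."""
    waveform: List[float]                  # raw sound sample data
    movement_type: str                     # e.g. "clenching", "lateral", "protrusion"
    findings: List[DiseaseFinding] = field(default_factory=list)

    @property
    def is_healthy(self) -> bool:
        # An empty findings list stands for the "healthy" label.
        return not self.findings

    @property
    def disease_count(self) -> int:
        # Number of labeled diseases or diseased locations.
        return len(self.findings)


sample = SoundSample(
    waveform=[0.0, 0.1, -0.1],
    movement_type="clenching",
    findings=[DiseaseFinding("caries", tooth_position="36", severity_level=2)],
)
```

Keeping each finding as its own record lets one sample carry several diseases, or one disease at several locations, without changing the structure.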

[0084] In some embodiments, the above machine learning model can be trained and generated according to any of the model training methods described below.

[0085] Based on the methods in the above embodiments of this disclosure, it is possible to collect test data generated by the examinee during oral diagnosis and treatment, process the test data using a machine learning model, identify the examinee's oral health status or restoration status, improve diagnostic efficiency, reduce the dependence of diagnostic operations on the doctor's subspecialty, experience and human operation, and improve the accuracy and comprehensiveness of diagnostic results.

[0086] The oral examination methods described in the above embodiments can provide patients with timely and comprehensive preventative treatment advice, optimize the treatment process, shorten treatment time, and improve treatment efficiency. In some embodiments, the diagnostic results for the presence of various common oral diseases or abnormal occlusal contact of prostheses can be accurate to the specific location, number, and severity of the disease, reducing the workload and comprehensive skill requirements for doctors and improving the comprehensiveness of the diagnosis.

[0087] In some embodiments, the detection data in the embodiment shown in FIG. 1 can be one or more of optical data, acoustic data, terahertz data, or electromagnetic wave data. Optical data includes a three-dimensional image reconstructed from differences in reflection intensity: a predetermined type of light is emitted onto the surface of the soft and hard tissues of the oral cavity, and the reflected optical signals are received by a sensor. Terahertz wave data includes transmission and reflection data obtained by emitting electromagnetic waves into the soft and hard tissues of the oral cavity and receiving them. Electromagnetic wave data includes X-ray absorption data of different parts of the oral tissues obtained by emitting electromagnetic waves into the soft and hard tissues of the oral cavity.

[0088] In some embodiments, multiple detection technologies can be employed to obtain detection results based on multiple detection data. As mentioned above, each detection technology has its own applicable scope, as well as its own advantages and disadvantages. For example, optical diagnostics has the advantages of being non-invasive, non-ionizing, and convenient, but its penetration is limited and it may be affected by surface contaminants; electromagnetic wave diagnostics has the advantage of clearly and accurately displaying the internal structure of teeth, but it has certain radiation risks; acoustic diagnostics has the advantages of being fast, convenient, non-invasive, low in technical sensitivity, and comprehensive, but it does not generate visual images and directly outputs detection results; terahertz wave diagnostics is suitable for detecting minute lesions, tumors, and inflammations in teeth, gums, and bones, and has the advantages of high resolution, non-destructive, non-ionizing damage, and the ability to identify molecular fingerprint spectra.

[0089] Different imaging technologies have varying applications and characteristics. By fusing these technologies, information at different levels and dimensions can be obtained simultaneously, such as anatomical structures, functional states, and blood flow dynamics. This clearly displays the extent, size, and location of lesions, improving diagnostic accuracy and early diagnosis rates, enabling precise and personalized medicine, enhancing the patient experience and treatment outcomes, and reducing trauma. With continuous technological advancements, multimodal biomedical imaging technology will become more precise, convenient, and cost-effective. Integrated and intelligent imaging systems will provide more comprehensive, real-time, and dynamic diagnostic and treatment support for oral medicine.

[0090] In some embodiments, one or more of optical data, terahertz data, or electromagnetic wave data can be processed using the model training and data processing methods of this disclosure, or the processing results obtained using methods and apparatus in related technologies can be combined with the model training and data processing methods of this disclosure based on acoustic data. For example, after acquiring high-resolution optical images of oral soft and hard tissues, spectral analysis algorithms are used to analyze the scattering and absorption characteristics of light, thereby diagnosing the health status of the tissues; after acquiring electromagnetic response data of oral tissues, parameters such as tissue conductivity and impedance are obtained through calculation and analysis to conduct a health assessment of oral tissue structures, such as the degree of alveolar bone resorption and the degree of implant osseointegration; after acquiring high-resolution terahertz images, the physical properties of oral tissues are assessed through the propagation characteristics of terahertz waves, and potential lesions inside the oral cavity, such as lichen planus and squamous cell carcinoma, are detected and identified from a pathological perspective.

[0091] The method described in this embodiment can reduce the burden of early model training, improve diagnostic accuracy and early diagnosis rate, enable precise and personalized medicine, improve patient experience and treatment effectiveness, reduce trauma, and improve system deployment efficiency.

[0092] In some embodiments, the oral examination method disclosed herein can be used in the early stages of a patient's medical visit to assist clinicians in conducting a comprehensive examination of the oral and maxillofacial system, thereby achieving early detection and early intervention and treatment, and thus improving the oral health level of the entire population.

[0093] To improve the ability of machine learning models to process detection data during oral diagnosis and treatment, and to enhance the accuracy of diagnostic results, this disclosure also proposes a model training method, as shown in Figure 2 in some embodiments. In this embodiment, data processing is again described primarily using sound data as an example. For optical data, terahertz data, or electromagnetic wave data, the data processing can be implemented based on the technical concepts of this disclosure, combined with related technologies for processing the corresponding types of data.

[0094] In step S21, a test sample is obtained for oral medical diagnosis and treatment. The test sample includes test data and labels. In some embodiments, the test sample may be collected and labeled during routine work; in some embodiments, volunteers may be recruited to collect sound data, and the data may be labeled according to the diagnosis results of the volunteers' oral health status or the status of their restorations to form a test sample.

[0095] In some embodiments, a sample database can be constructed to store the test samples.

[0096] The test samples include test samples in a healthy oral cavity state and test samples in at least one oral disease state. In some embodiments, the test samples include test samples in different states of the prosthesis, wherein the state of the prosthesis includes at least one of the occlusal contact state of the prosthesis, the degree of implant osseointegration, or the state of the internal preparation.

[0097] Oral health status includes being healthy or having a diagnosis of oral disease. Oral diseases include temporomandibular joint disorders, dental diseases (such as supernumerary teeth, caries, cracked teeth, pulpitis, periapical periodontitis, impacted teeth, etc.), and malocclusion (such as premature contact, etc.), as well as tooth erosion, periodontal diseases (such as alveolar bone resorption, loose teeth, food impaction, etc.), restoration conditions (implant osseointegration, internal preparation), and oral and maxillofacial tumors (ameloblastoma, pleomorphic adenoma, etc.). The above-mentioned oral diseases are merely examples and can be extended to all kinds of oral diseases known in the art. The above examples do not constitute an undue limitation on this disclosure. The diagnostic results specify the name of the disease, such as supernumerary teeth, premature contact, alveolar bone resorption, etc., thus providing sufficient information.

[0098] The occlusal contact status of the prosthesis includes a normal occlusal contact status or an abnormal occlusal contact status. For example, an abnormal occlusal contact status includes uneven occlusal contact. In some embodiments, when the occlusal contact is uneven, the occlusal contact status of the prosthesis also includes information on the grinding points.

[0099] In step S22, the detection samples are enhanced to obtain training samples.

[0100] In some embodiments, enhancing the detection samples may include increasing the number of samples. This is because collecting detection samples is time-consuming, while model training requires massive amounts of sample data. Sample augmentation can increase the number of samples and improve the accuracy of model training. In some embodiments, sample augmentation can be performed through at least one of waveform shifting, harmonic distortion, or resampling.
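
The three augmentation operations named above can be sketched on a raw waveform as follows. This is a minimal illustration, not the disclosed implementation: the function names, the circular shift, the tanh soft-clipping used as a stand-in for harmonic distortion, the linear-interpolation resampling, and all parameter values are assumptions.

```python
import numpy as np


def shift_waveform(y: np.ndarray, shift: int) -> np.ndarray:
    """Circularly shift the waveform by `shift` samples (assumed shifting scheme)."""
    return np.roll(y, shift)


def harmonic_distortion(y: np.ndarray, drive: float = 2.0) -> np.ndarray:
    """Soft-clip with tanh, which adds odd harmonics; output is rescaled to the input peak."""
    peak = np.max(np.abs(y))
    if peak == 0:
        return y.copy()
    distorted = np.tanh(drive * y / peak)
    return distorted / np.max(np.abs(distorted)) * peak


def resample(y: np.ndarray, rate: float) -> np.ndarray:
    """Resample by linear interpolation; rate > 1 shortens the signal."""
    n_out = max(1, int(round(len(y) / rate)))
    old_t = np.arange(len(y))
    new_t = np.linspace(0, len(y) - 1, n_out)
    return np.interp(new_t, old_t, y)


def augment(y: np.ndarray) -> list:
    """Return the original waveform plus three augmented copies."""
    return [y, shift_waveform(y, 1600), harmonic_distortion(y), resample(y, 1.1)]
```

Each collected sample then yields several training samples that share the original label.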

[0101] In some embodiments, enhancing the detection samples may include processing such as noise reduction on the detection sample data to reduce the amount of interference information and improve the accuracy of model training.

[0102] In some embodiments, enhancing the detection sample may also include converting the detection sample data into text or image data, for example, converting sound sample data into a spectrogram to visualize the audio data and reduce the difficulty of subsequent processing.

[0103] In some embodiments, taking a sound sample as an example, the method for enhancing a sound sample is shown in FIG4, including steps 421 and 422.

[0104] In step 421, the sound data is processed by a filter to obtain a first spectrogram sample.

[0105] The filter can be a Mel filter, as shown in Figure 3. The sound data is passed through the Mel filter and the output Mel spectrogram is used as the first spectrogram sample.

[0106] In some embodiments, the process of converting audio data into a spectrogram includes pre-emphasis, framing, windowing, STFT (Short-Time Fourier Transform), and conversion according to the correspondence between frequency and Mel values. In some embodiments, pre-emphasis can be regarded as passing the audio data through a high-pass filter, which can not only reduce some of the amplitude-frequency distortion and frequency response changes, but also reduce low-frequency noise in the audio signal. Framing, windowing, and STFT are performed on the pre-emphasized audio waveform to obtain the spectrogram matrix. This process is shown below:

$$x(f, r) = \int_{-\infty}^{\infty} y(t)\, w(t - r)\, e^{-j 2\pi f t}\, dt \quad (1)$$

[0107] Where y(t) is the time-domain signal, x(f, r) is the frequency-domain signal, w(t − r) represents the Hamming window centered at r, f is the frequency, and r is the time position of the frame.

[0108] In some embodiments, the window hop size is 256 samples, and the STFT window size is 512 samples.
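
The pre-emphasis, framing, windowing, and STFT steps can be sketched as below, using the window size of 512 and hop size of 256 stated above. The pre-emphasis coefficient of 0.97 and the reflect padding that centers each frame are assumptions not taken from this disclosure; the function names are hypothetical.

```python
import numpy as np


def pre_emphasis(y: np.ndarray, coeff: float = 0.97) -> np.ndarray:
    """First-order high-pass filter: out[n] = y[n] - coeff * y[n-1] (coeff is assumed)."""
    return np.append(y[0], y[1:] - coeff * y[:-1])


def stft_magnitude(y: np.ndarray, n_fft: int = 512, hop: int = 256) -> np.ndarray:
    """Frame, apply a Hamming window, and FFT. Returns shape (n_fft // 2 + 1, n_frames)."""
    # Reflect-pad so that frame r is centered on sample r * hop (an assumption).
    padded = np.pad(y, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(padded) - n_fft) // hop
    window = np.hamming(n_fft)
    frames = np.stack(
        [padded[i * hop: i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T
```

Under these assumptions, a 4-second clip at 16000 Hz (64000 samples) gives 257 frequency bins by 251 frames.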

[0109] Mel filters are designed based on the characteristics of the human ear to better capture the lower frequencies in sound. In some embodiments, the number of Mel filters applied to an audio clip is set to 128. The formula for the Mel scale is as follows:

$$f_{\mathrm{mel}} = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right) \quad (2)$$

[0110] Where $f_{\mathrm{mel}}$ is the calculated Mel-scale frequency and f is the frequency in Hertz.
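
For illustration, the Hertz-to-Mel conversion can be sketched as below. The constants 2595 and 700 are those of the commonly used Mel-scale formula and are assumed here; the function names are hypothetical.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hertz to the Mel scale (commonly used formula, assumed)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(f_mel: float) -> float:
    """Inverse conversion, from Mel back to Hertz."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)
```

With these constants, 1000 Hz maps to approximately 1000 Mel, and equal steps on the Mel scale correspond to progressively wider steps in Hertz.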

[0111] The Mel filter bank mimics the human ear's filtering of speech, placing M (e.g., 512) triangular filters within the frequency range of an audio segment, with the filters laid out less densely toward higher frequencies. The filter width increases with increasing Hertz frequency. Adjacent filters overlap by 50% to prevent information loss. On the Mel scale, these filters have equal width. Finally, the output matrix is converted into a spectrogram.
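
A triangular Mel filter bank of the kind described above can be sketched as follows. The sketch uses 128 filters, a 512-point FFT, and a 16000 Hz sampling rate as in the embodiments; the Mel-scale constants, the placement of filter edges at equal Mel spacing from 0 Hz to half the sampling rate, and the function names are assumptions.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_mels: int = 128, n_fft: int = 512, sr: int = 16000) -> np.ndarray:
    """Return an (n_mels, n_fft // 2 + 1) matrix of triangular filters."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 edge points, equally spaced on the Mel scale, so that
    # neighboring triangles overlap by half their width.
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        left, center, right = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank
```

Multiplying this matrix by a magnitude spectrogram of shape (257, n_frames) gives a Mel spectrogram of shape (128, n_frames).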

[0112] In some embodiments, the sampling rate of the sound data is set to 16000 Hz. Due to the varying data sizes, the sound sample data is standardized to 4 seconds, resulting in a spectrogram size of 128*251. This spectrogram is then scaled proportionally to 224*224, which facilitates subsequent processing of spectrogram data of the same size and reduces the difficulty of data processing.
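
The sizes quoted above can be checked with a short sketch: 4 seconds at 16000 Hz is 64000 samples, and with a hop size of 256 a centered STFT yields 1 + 64000 // 256 = 251 frames, giving a 128*251 Mel spectrogram. Zero-padding short clips, centered framing, and nearest-neighbor resizing are assumptions, and the function names are hypothetical.

```python
import numpy as np

SR, SECONDS, HOP = 16000, 4, 256


def fix_length(y: np.ndarray, n: int = SR * SECONDS) -> np.ndarray:
    """Zero-pad or truncate the waveform to exactly n samples."""
    if len(y) >= n:
        return y[:n]
    return np.pad(y, (0, n - len(y)))


def n_frames(n_samples: int, hop: int = HOP) -> int:
    """Frame count of a centered STFT (assumed framing convention)."""
    return 1 + n_samples // hop


def resize_nearest(img: np.ndarray, out_h: int = 224, out_w: int = 224) -> np.ndarray:
    """Nearest-neighbor resize of a 2-D array to (out_h, out_w)."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows][:, cols]
```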

[0113] Considering the unique characteristics of the sound generated by tooth collision and the significant changes in signal over time, directly performing STFT on the sound data would lead to distortion of some audio waveforms. The method described in the above embodiment can reduce waveform loss and increase the amount of effective information in the converted spectrum, thereby improving the accuracy of the model.

[0114] In step 422, the first spectrogram sample is processed by at least one of waveform shifting, harmonic distortion, or resampling to obtain the second spectrogram sample. The first and second spectrogram samples are combined to obtain the training sample. In this way, the sample is expanded on the basis of realizing audio data visualization, thereby improving the accuracy of model training.

[0115] In some embodiments, a time-frequency feature enhancement module (TFAM) can be set up, as shown in Figure 3, to expand more sample data based on the spectrum output of the Mel filter as subsequent model input.

[0116] In some embodiments, the Time-Frequency Feature Enhancement (TFAM) module is used to extract feature information from the spectrogram in both the time and frequency domains, obtaining time-domain and frequency-domain features. Then, time-frequency fusion features are obtained based on these features. For example, horizontal and vertical convolution operations are performed on the spectrogram, extracting frequency-domain features through 1x3 convolutional blocks and time-domain features through 3x1 convolutional blocks. The frequency-domain and time-domain features are then summed to obtain the fusion features in the time-frequency domain. This time-frequency domain information guides the learning of the attention matrix, enhancing the fusion and representation capabilities of feature information and distinguishing between valid and invalid sound data (e.g., jaw movement sounds and non-jaw movement sounds), thereby reducing interference from irrelevant information and improving the learning of valid information.
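
The horizontal and vertical convolutions and their summation can be sketched as below for a single-channel spectrogram. In practice the kernel weights would be learned; here they are passed in as plain arrays. The zero padding and the function names are assumptions, and which kernel corresponds to the time or frequency axis depends on how the spectrogram is oriented.

```python
import numpy as np


def conv2d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Zero-padded 'same' 2-D cross-correlation for a single channel."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + x.shape[0], j:j + x.shape[1]]
    return out


def time_frequency_fusion(spec: np.ndarray, k_1x3: np.ndarray, k_3x1: np.ndarray) -> np.ndarray:
    """Sum of a 1x3 branch and a 3x1 branch, giving the time-frequency fusion features."""
    branch_1x3 = conv2d_same(spec, k_1x3.reshape(1, 3))
    branch_3x1 = conv2d_same(spec, k_3x1.reshape(3, 1))
    return branch_1x3 + branch_3x1
```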

[0117] Furthermore, local features are extracted from the fused features, and dynamic positional encoding is obtained based on the time-frequency fusion features. For example, convolutional layers are used to extract local features, enhancing the model's spatial feature extraction capability. Let the input features be $X \in \mathbb{R}^{C \times H \times W}$, where C is the number of channels, and H and W are the height and width, respectively. Local feature aggregation can be represented as: $X_{\mathrm{local}} = \mathrm{Conv}(X)$ (3)

[0118] Here, Conv represents the convolution operation.

[0119] Dynamic positional encoding can be implemented using learnable convolutional kernels, expressed as: $P = \mathrm{Conv}_{\mathrm{pos}}(X)$ (4)

[0120] Where $\mathrm{Conv}_{\mathrm{pos}}$ is a convolution operation used to generate positional encodings.

[0121] Context-aware features are extracted based on local features and dynamic positional encoding using a self-attention mechanism. For example, this step combines local features and positional encoding to extract contextual information via a self-attention mechanism. First, the local features and positional encoding are merged to obtain the merged features: $X_{\mathrm{merged}} = X_{\mathrm{local}} + P$ (5)

[0122] Then, a self-attention mechanism is applied to obtain context-aware features:

$$X_{\mathrm{cot}} = \mathrm{softmax}\!\left(\frac{Q(X_{\mathrm{merged}})\, K(X_{\mathrm{merged}})^{\top}}{\sqrt{d}}\right) V(X_{\mathrm{merged}}) \quad (6)$$

[0123] Where Q, K, and V are the transformation functions for query, key, and value, respectively, and d is the dimension of the key.

[0124] A second spectrogram sample is obtained by fusing context-aware features and time-frequency fusion features. The context-aware features are fused with the original features, and the feature representation is: $X_{\mathrm{out}} = X + \mathrm{FFN}(X_{\mathrm{cot}})$ (7)

[0125] Here, FFN stands for Feedforward Neural Network, which is used to further transform contextual features.
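
The sequence of operations in formulas (3), (4), (5), and (7), together with the self-attention step, can be sketched as below. For brevity the feature map is assumed to be flattened into N positions with C channels, the convolutions are depthwise 3-tap filters along the position axis, Q, K, and V are linear maps, and the FFN is a two-layer ReLU network without biases. These simplifications and all parameter names are assumptions.

```python
import numpy as np


def softmax(a: np.ndarray, axis: int = -1) -> np.ndarray:
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def conv1d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Depthwise 'same' convolution along the position axis of an (N, C) array."""
    k = len(kernel)
    padded = np.pad(x, ((k // 2, k // 2), (0, 0)))
    return sum(kernel[i] * padded[i:i + x.shape[0]] for i in range(k))


def contextual_block(X: np.ndarray, p: dict) -> np.ndarray:
    X_local = conv1d_same(X, p["k_local"])           # formula (3): local features
    P = conv1d_same(X, p["k_pos"])                   # formula (4): positional encoding
    X_merged = X_local + P                           # formula (5): merged features
    Q, K, V = X_merged @ p["Wq"], X_merged @ p["Wk"], X_merged @ p["Wv"]
    d = K.shape[-1]                                  # dimension of the key
    X_cot = softmax(Q @ K.T / np.sqrt(d)) @ V        # scaled dot-product self-attention
    ffn = np.maximum(0.0, X_cot @ p["W1"]) @ p["W2"]
    return X + ffn                                   # formula (7): residual fusion


def init_params(C: int, hidden: int, rng: np.random.Generator) -> dict:
    """Random placeholder parameters; in training these would be learned."""
    return {
        "k_local": np.array([0.25, 0.5, 0.25]),
        "k_pos": np.array([0.0, 1.0, 0.0]),
        "Wq": rng.standard_normal((C, C)),
        "Wk": rng.standard_normal((C, C)),
        "Wv": rng.standard_normal((C, C)),
        "W1": rng.standard_normal((C, hidden)),
        "W2": rng.standard_normal((hidden, C)),
    }
```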

[0126] Different convolutional layers are combined in parallel, and the matrices resulting from the processing of different convolutional layers are concatenated along the depth dimension to form a deeper matrix.

[0127] This approach effectively processes input images and extracts richer and more diverse features. It can efficiently expand the depth and width of the network, improving the accuracy of deep learning networks while preventing overfitting.

[0128] In some embodiments, the above operations can be implemented using a machine learning model with relevant functions (such as CotNet) to improve implementation efficiency.

[0129] In step S23, the training samples are processed by a self-supervised learning model until training is complete. During the training process, the parameters of the self-supervised learning model are adjusted according to the output diagnostic results and labels.

[0130] In some embodiments, a self-supervised learning model can be pre-deployed, as shown in Figure 3. The input end receives training sample data, which is encoded and decoded, and the output end outputs diagnostic results. By comparing the results with the labels corresponding to the training sample data, the parameters of the self-supervised learning model are adjusted to achieve the learning of the self-supervised learning model.
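
The loop of producing an output, comparing it with the label, and adjusting the parameters can be sketched as below. A single linear layer with softmax stands in for the encoder-decoder model purely to keep the example short; the cross-entropy loss, plain gradient descent, the learning rate, and the function names are all assumptions rather than the disclosed training procedure.

```python
import numpy as np


def softmax(a: np.ndarray) -> np.ndarray:
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(probs: np.ndarray, labels: np.ndarray) -> float:
    """Mean negative log-likelihood of the labeled class."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))


def train(X: np.ndarray, labels: np.ndarray, n_classes: int,
          lr: float = 0.1, epochs: int = 200):
    """Gradient descent on a linear softmax classifier (stand-in model)."""
    W = np.zeros((X.shape[1], n_classes))
    onehot = np.eye(n_classes)[labels]
    history = []
    for _ in range(epochs):
        probs = softmax(X @ W)                        # model output
        history.append(cross_entropy(probs, labels))  # compare output with labels
        W -= lr * X.T @ (probs - onehot) / len(X)     # adjust parameters
    return W, history
```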

[0131] In some embodiments, the self-supervised learning model can be a ViT (Vision Transformer). This model, built on the Transformer architecture, is capable of detecting complex patterns in images and understanding global context. After the processing described above, the audio data has been transformed into a spectrogram, i.e., image data. Using a ViT model designed for image processing reduces the difficulty of processing audio data.

[0132] For example, suppose the input image has dimensions H×W×C, where H and W represent the height and width of the image, respectively, and C represents the number of color channels. ViT first divides the image into N small patches of size P×P, where N and P are positive integers and $N = HW / P^2$. Each patch can be considered as a "word". Each patch is then flattened and transformed into a D-dimensional vector. Specifically, each flattened patch is treated as a vector $x_p \in \mathbb{R}^{P^2 C}$; through a learnable linear projection (weight matrix) $E \in \mathbb{R}^{(P^2 C) \times D}$, the embedding representation of each patch is obtained as $z_p = x_p E$.

[0133] Furthermore, ViT introduces position embeddings to encode the relative or absolute positions of image patches within the original image. Considering that the Transformer itself does not have the ability to process sequence positional information, a position embedding vector is added to each image patch embedding. This yields the final embedded input $z_p' = z_p + \mathrm{pos}$.
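
The patch splitting, linear projection, and position embedding can be sketched as below. The projection matrix `E` and the position embeddings `pos` would be learned in a real model; here they are plain arrays supplied by the caller, and the function names are hypothetical.

```python
import numpy as np


def patchify(img: np.ndarray, P: int) -> np.ndarray:
    """Split an (H, W, C) image into N = HW / P^2 flattened patches of length P*P*C."""
    H, W, C = img.shape
    assert H % P == 0 and W % P == 0, "image size must be divisible by the patch size"
    patches = img.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, P * P * C)


def embed(img: np.ndarray, P: int, E: np.ndarray, pos: np.ndarray) -> np.ndarray:
    """Project each flattened patch with E and add its position embedding."""
    return patchify(img, P) @ E + pos
```

For a 224×224×3 input with P = 16, this gives N = 196 patches, each flattened to a vector of length 768 before projection.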

[0134] The embedded image patches are fed into a series of Transformer encoders for processing. Each encoder contains two main parts: MSA (Multi-Head Self-Attention) and FFN (Feed-Forward Network). For each layer in the encoder, the input is a matrix Z of embedded patches, and the output is a matrix of the same shape. This process can be represented as:

$$\mathrm{MSA}(Z) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h)\, W^O \quad (8)$$

[0135] Where each $\mathrm{head}_i$ implements a self-attention mechanism, the per-head projection matrices and $W^O$ are learnable parameter matrices, h is the number of heads, and $d_k$ is the dimension of each head.

$$\mathrm{FFN}(Z) = \max(0,\, Z W_1 + b_1)\, W_2 + b_2 \quad (9)$$

[0136] Where $W_1$, $b_1$, $W_2$, $b_2$ are the parameters of the FFN layer.
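
Formulas (8) and (9) can be sketched as below. Splitting a single set of query, key, and value projections into h column slices, and taking the head dimension as D / h, are common conventions assumed here; the function and parameter names are hypothetical, and layer normalization and residual connections are omitted.

```python
import numpy as np


def softmax(a: np.ndarray, axis: int = -1) -> np.ndarray:
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(Z, Wq, Wk, Wv, Wo, h):
    """Formula (8): concatenate h attention heads and project with Wo."""
    N, D = Z.shape
    d_k = D // h                                   # dimension of each head
    Q, K, V = Z @ Wq, Z @ Wk, Z @ Wv
    heads = []
    for i in range(h):
        s = slice(i * d_k, (i + 1) * d_k)          # columns belonging to head i
        weights = softmax(Q[:, s] @ K[:, s].T / np.sqrt(d_k))
        heads.append(weights @ V[:, s])
    return np.concatenate(heads, axis=1) @ Wo


def ffn(Z, W1, b1, W2, b2):
    """Formula (9): two linear layers with a ReLU in between."""
    return np.maximum(0.0, Z @ W1 + b1) @ W2 + b2
```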

[0137] After processing the embedded image patches, ViT uses the embedding of a class (CLS) token as the output for the image classification task. The final CLS token embedding is passed through a fully connected layer (linear layer) to predict the image category.

[0138] In some embodiments, a ViT with a small backbone, such as ViT_B16, is employed. The ViT_B16 model divides the input image into smaller patches and transforms each patch into a vector representation using linear embeddings. These patch embeddings, along with the position embeddings that provide spatial information, are then passed through multiple layers of a Transformer encoder. Within these layers, the model uses a self-attention mechanism to measure the importance of each patch to other patches, effectively capturing long-range dependencies. The resulting transformed feature representations are then used for tasks such as image classification and other visual processing objectives. This approach improves computational efficiency while maintaining performance.

[0139] The methods described in the above embodiments of this disclosure can reduce the dependence of model training on collected real sample data through sample augmentation, reduce the burden of sample collection, improve model training efficiency, and improve the accuracy of oral cavity detection of the trained model.

[0140] Furthermore, this disclosure also proposes a method for processing oral diagnosis and treatment data. This method, based on the model training method mentioned above, processes the detection data from the oral medical diagnosis of the examinee. In some embodiments, as shown in FIG5, the method includes steps S51 and S52.

[0141] In step S51, detection data of the target patient during oral diagnosis and treatment is acquired. The detection data includes one or more of optical data, sound data, terahertz data, or electromagnetic wave data. In some embodiments, sound data may come from a sound sensor; optical data may come from a laser scanning system; terahertz data may come from a terahertz wave detector; and electromagnetic wave data may come from an electromagnetic wave receiver. In some embodiments, detection data generated remotely can be acquired via a network; in some embodiments, detection data pre-stored in memory can be extracted.

[0142] In step S52, the diagnostic result is determined based on the detection data and a machine learning model, which is generated by training according to any of the model training methods mentioned above.

[0143] In some embodiments, the machine learning model can process the detection data and obtain corresponding diagnostic results.

[0144] In some embodiments, between steps S51 and S52, the detection data can be converted into a detection spectrogram. The machine learning model can process image data and obtain corresponding oral cavity state information by processing the sound spectrogram.

[0145] The methods described in the above embodiments use trained machine learning models to process the detection data acquired during the patient's oral diagnosis and treatment, thereby obtaining the patient's oral health status or the status of the restorations. This reduces the reliance of diagnostic operations on the doctor's subspecialty, experience, and human intervention, and improves the efficiency, accuracy, and comprehensiveness of the diagnosis.

[0146] In some embodiments, multiple detection technologies can be used, corresponding devices can be used to obtain multiple detection data, and corresponding detection results can be obtained based on the detection data. The detection results are then combined to obtain the final result of oral diagnosis and treatment.
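
This disclosure does not specify how the per-technology detection results are combined. One simple possibility, shown here only as an assumed illustration, is a weighted average of the per-diagnosis probabilities reported for each detection technology. The function name, the dictionary layout, and the example values are hypothetical.

```python
from typing import Dict, Optional


def fuse_results(
    per_modality: Dict[str, Dict[str, float]],
    weights: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Weighted average of per-diagnosis probabilities across detection technologies."""
    weights = weights or {m: 1.0 for m in per_modality}
    total = sum(weights[m] for m in per_modality)
    fused: Dict[str, float] = {}
    for modality, probs in per_modality.items():
        for diagnosis, p in probs.items():
            fused[diagnosis] = fused.get(diagnosis, 0.0) + weights[modality] * p / total
    return fused


# Illustrative values only, not measured results.
results = {
    "sound": {"healthy": 0.2, "caries": 0.8},
    "optical": {"healthy": 0.4, "caries": 0.6},
}
fused = fuse_results(results)
final_diagnosis = max(fused, key=fused.get)
```

Per-technology weights could reflect how reliable each technology is for a given condition; equal weights are used when none are supplied.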

[0147] The method described in this embodiment can simultaneously obtain information from different levels and dimensions, thereby improving diagnostic accuracy and early diagnosis rate, enabling precise and personalized medicine, enhancing patient experience and treatment outcomes, and reducing trauma.

[0148] Schematic diagrams of some embodiments of the model training apparatus 61 disclosed herein are shown in Figure 6.

[0149] The sample acquisition unit 611 is capable of acquiring test samples in oral medical diagnosis and treatment, the test samples including test data and labels. In some embodiments, the sample acquisition unit 611 can perform the method in any embodiment of step S21 above.

[0150] The sample enhancement unit 612 can enhance the detection samples to obtain training samples. In some embodiments, the sample enhancement unit 612 can perform the method in any embodiment of step S22 above.

[0151] The training unit 613 can process training samples through a self-supervised learning model until training is complete, wherein the parameters of the self-supervised learning model are adjusted according to the output diagnostic results and labels. In some embodiments, the training unit 613 can perform the method of any embodiment of step S23 above.

[0152] The model training apparatus in the above embodiments of this disclosure can reduce the dependence of model training on collected real sample data through sample augmentation, reduce the burden of sample collection, improve model training efficiency, and improve the accuracy of oral cavity detection of the trained model.

[0153] Schematic diagrams of some embodiments of the oral diagnosis and treatment data processing device 72 disclosed herein are shown in FIG7.

[0154] The data acquisition unit 721 is capable of acquiring the test data of the target patient during oral diagnosis and treatment. In some embodiments, the data acquisition unit 721 can perform the method in any embodiment of step S51 above.

[0155] The state determination unit 722 can determine a diagnostic result based on the detection data and a machine learning model, wherein the machine learning model is generated by training according to any of the model training methods described above. The diagnostic result includes oral health status or restoration status. In some embodiments, the state determination unit 722 can perform the method in any embodiment of step S52 described above.

[0156] In some embodiments, the oral diagnosis and treatment data processing device 72 further includes a conversion unit 723, which can convert the detection data obtained by the data acquisition unit 721 into a spectrogram. The state determination unit 722 can then process the spectrogram through the machine learning model to obtain a diagnostic result.
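As a non-limiting illustration, the conversion performed by the conversion unit 723 might be sketched as a short-time Fourier transform of a one-dimensional signal. The frame length, hop size, and Hann window are assumptions made for this sketch; the disclosure does not specify them. The transform is written with the standard library only so that the sketch is self-contained; a practical implementation would normally use a fast Fourier transform.

```python
import cmath
import math

def spectrogram(samples, frame_len=64, hop=32):
    """Return one row per frame; each row holds magnitudes of bins 0..frame_len//2."""
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Direct discrete Fourier transform of the windowed frame at bin k.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        rows.append(mags)
    return rows
```

The resulting two-dimensional array (time frames by frequency bins) can be treated as an image, which is the form an image-oriented model would take as input.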

[0157] The oral diagnosis and treatment data processing device in the above embodiments of this disclosure can use a trained machine learning model to detect the diagnosis and treatment data of the patient during the oral diagnosis and treatment process, obtain the patient's oral health status or restoration status, reduce the reliance of diagnostic operations on the doctor's sub-specialty fields, experience and human operation, and improve the efficiency, accuracy and comprehensiveness of the detection.

[0158] A schematic diagram of one embodiment of the data processing device disclosed herein is shown in Figure 8. The data processing device includes a memory 801 and a processor 802. The memory 801 can be a disk, flash memory, or any other non-volatile storage medium. The memory is used to store instructions in the corresponding embodiments of the model training method or oral diagnosis data processing method described above. The processor 802 is coupled to the memory 801 and can be implemented as one or more integrated circuits, such as a microprocessor or microcontroller. The processor 802 is used to execute the instructions stored in the memory, which can improve the accuracy of oral diagnosis using machine learning models.

[0159] In one embodiment, as shown in Figure 9, the data processing device 900 may include a memory 901 and a processor 902. The processor 902 is coupled to the memory 901 via a bus 903. The data processing device 900 may also be connected to an external storage device 905 via a storage interface 904 to access external data, and may also be connected to a network or another computer system (not shown) via a network interface 906. Further details are omitted here.

[0160] In this embodiment, storing the instructions in a memory and executing them with a processor can improve the accuracy of oral medical diagnosis and treatment.

[0161] In another embodiment, a computer-readable storage medium stores computer program instructions that, when executed by a processor, implement the steps of the method in the corresponding embodiment of the model training method or oral diagnostic data processing method. Those skilled in the art will understand that embodiments of this disclosure can be provided as methods, apparatus, or computer program products. Therefore, this disclosure can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this disclosure can take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0162] A schematic diagram of some embodiments of the electronic device disclosed herein is shown in Figure 10. The electronic device includes a data acquisition device 1010 and a data processor 1020.

[0163] The data acquisition device 1010 is capable of collecting detection data of a target patient during oral diagnosis and treatment. In some embodiments, the data acquisition process of the data acquisition device 1010 may be as shown in any embodiment of step S11 above.

[0164] The data processor 1020 is capable of processing detection data through a machine learning model to determine diagnostic results. The machine learning model is generated based on detection samples from oral medical diagnosis and treatment. In some embodiments, the data processor 1020 is capable of performing the method in any of the embodiments of step S12 above.

[0165] The electronic device in the above embodiments of this disclosure can collect the detection data generated by the examinee during oral diagnosis and treatment, process the detection data using a machine learning model, and identify the oral health status of the examinee or the occlusal contact status of the restoration. This improves diagnostic efficiency, reduces the dependence of diagnostic operations on the doctor's subspecialty, experience, and manual operation, and improves the accuracy and comprehensiveness of diagnostic results.

[0166] Using the electronic devices described in the above embodiments, patients can be given timely and comprehensive preventative treatment advice, optimizing the treatment process, shortening treatment time, and improving treatment efficiency. In some embodiments, the diagnostic results for the presence of various common oral diseases or abnormal occlusal contact of prostheses can be accurate to the specific location, number, and severity of the disease.

[0167] In some embodiments, the data acquisition device 1010 may include a laser light source, a laser detector, and a laser scanning system, capable of acquiring high-resolution optical images of oral soft and hard tissues. The data processor 1020 employs spectral analysis algorithms to analyze the scattering and absorption characteristics of light, thereby diagnosing the health status of the tissues.

[0168] In some embodiments, the data acquisition device 1010 may include an electromagnetic wave source and an electromagnetic wave receiver. The data processor 1020 calculates and analyzes parameters such as tissue conductivity and impedance to perform a health assessment of oral tissue structures, such as the degree of alveolar bone resorption and the degree of implant osseointegration.

[0169] In some embodiments, the data acquisition device 1010 may include a sound sensor. After converting the acoustic signal into a spectrogram, the data processor 1020 uses a self-supervised neural network to extract and learn feature information in both the time domain and the frequency domain.
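As a non-limiting illustration, the claims name a Mel filter for producing the spectrogram from sound data. A triangular Mel filterbank applied to one frame of linear-frequency magnitudes might be sketched as follows. The conversion formula is the widely used 2595·log10(1 + f/700) form; the number of filters, the number of bins, the sample rate, and the function names are assumptions made for this sketch.

```python
import math

def hz_to_mel(f):
    """Convert frequency in Hz to the Mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Convert a Mel-scale value back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Build triangular filters over n_bins bins spanning 0..sample_rate/2."""
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, equally spaced on the Mel scale.
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [sample_rate / 2.0 * b / (n_bins - 1) for b in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if left < f <= centre:
                row.append((f - left) / (centre - left))    # rising slope
            elif centre < f < right:
                row.append((right - f) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def apply_filterbank(bank, magnitudes):
    """Weight one frame of magnitudes by each filter and sum per filter."""
    return [sum(w * x for w, x in zip(row, magnitudes)) for row in bank]
```

Applying the filterbank to every frame of a spectrogram gives a Mel spectrogram, in which low frequencies are resolved more finely than high frequencies.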

[0170] In some embodiments, the data acquisition device 1010 may include a terahertz wave source and a terahertz wave detector. The data processor 1020 assesses the physical properties of oral tissues through the propagation characteristics of terahertz waves, and can even detect and identify potential lesions inside the oral cavity, such as lichen planus and squamous cell carcinoma, from a pathological perspective.

[0171] Based on the electronic devices in the above embodiments, data processing can be performed using one or more of optical, electromagnetic wave, and terahertz wave detection technologies, thereby reducing the burden of early model training. This improves diagnostic accuracy and early diagnosis rate, enables precise and personalized medicine, enhances patient experience and treatment outcomes, reduces trauma, and improves system deployment efficiency.

[0172] In some embodiments, the electronic device disclosed herein can be used by patients at the beginning of their medical visit to assist clinicians in conducting a comprehensive examination of the oral and maxillofacial system, thereby achieving early detection and early intervention and treatment, and thus improving the oral health level of the entire population.

[0173] In some embodiments, the data acquisition device 1010 can be connected to the data processor 1020 via wired or wireless means, so that the acquired detection data can reach the data processor in a timely manner to obtain diagnostic results, thereby improving processing efficiency. In some embodiments, the data processor 1020 can be centrally located on the server side. The detection data is transmitted to the server for processing via a network, and the server sends the processing results to the client bound to the data acquisition device via the network, where the diagnostic results are displayed. Such electronic devices facilitate the effective use of server resources, improve computing efficiency, and reduce the cost of purchasing equipment for clients (e.g., hospitals, doctors), which is conducive to the promotion and application of the technology.
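As a non-limiting illustration, the server-side arrangement described above might be sketched as a request and a response exchanged as JSON text. The device identifier, the client identifier, the message fields, and the placeholder model are all hypothetical and are introduced only for this sketch; the network transport itself is omitted.

```python
import json

# Hypothetical binding between acquisition devices and display clients.
DEVICE_TO_CLIENT = {"device-1010": "client-A"}

def placeholder_model(detection_data):
    """Stand-in for the trained model; NOT a diagnostic rule."""
    mean_abs = sum(abs(v) for v in detection_data) / len(detection_data)
    return {"score": mean_abs}

def build_request(device_id, detection_data):
    """Client side: package the detection data for transmission."""
    return json.dumps({"device_id": device_id, "data": detection_data})

def handle_request(raw_request, model=placeholder_model):
    """Server side: decode, run the model, and address the bound client."""
    request = json.loads(raw_request)
    client_id = DEVICE_TO_CLIENT[request["device_id"]]
    result = model(request["data"])
    return json.dumps({"client_id": client_id, "result": result})
```

Keeping the model on the server in this way means the client only needs to send data and display the returned result.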

[0174] In some embodiments, the data acquisition device 1010 and the data processor 1020 are housed in the same housing. The data processor 1020 is a computing chip that can process sound data using a machine learning model to determine diagnostic results, thereby reducing the size of the device and improving the ease of use of the electronic device.

[0175] This disclosure is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It will be understood that each block of the flowchart illustrations and / or block diagrams, as well as combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flowchart illustrations and / or one or more block diagrams.

[0176] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flowcharts and / or one or more block diagrams.

[0177] These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions, which execute on the computer or other programmable apparatus, provide steps for implementing the functions specified in one or more flowcharts and / or one or more block diagrams.

[0178] This concludes the detailed description of the present disclosure. To avoid obscuring the concept of the disclosure, some details known in the art have not been described. Those skilled in the art will fully understand how to implement the technical solutions disclosed herein based on the above description.

[0179] The methods and apparatus of this disclosure may be implemented in many ways. For example, they may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order of steps for the methods is for illustrative purposes only, and the steps of the methods of this disclosure are not limited to the order specifically described above unless otherwise specifically stated. Furthermore, in some embodiments, this disclosure may also be implemented as a program recorded on a recording medium, the program including machine-readable instructions for implementing the methods according to this disclosure. Thus, this disclosure also covers recording media storing programs for performing the methods according to this disclosure.

[0180] It should be noted that the terms "first," "second," etc., used in the specification, claims, and drawings of this disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this disclosure described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0181] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this disclosure and not to limit them; although this disclosure has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications can still be made to the specific implementation of this disclosure or equivalent substitutions can be made to some technical features without departing from the spirit of the technical solutions of this disclosure, and all such modifications and substitutions should be covered within the scope of the technical solutions claimed in this disclosure.

Claims

1. A model training method, comprising: acquiring detection samples in oral medical diagnosis and treatment, the detection samples including detection data and labels; enhancing the detection samples to obtain training samples; and processing the training samples by a self-supervised learning model until training is complete, wherein the parameters of the self-supervised learning model are adjusted according to the output diagnostic results and the labels.

2. The model training method according to claim 1, wherein the method meets at least one of the following: the diagnostic results include oral health status, and the detection samples include detection samples in a healthy oral state and detection samples in at least one oral disease state; or the diagnostic results include the condition of the prosthesis, and the detection samples include detection samples of the prosthesis in different conditions, wherein the condition of the prosthesis includes at least one of the following: the occlusal contact condition of the prosthesis, the degree of osseointegration of the implant, or the condition of the internal preparation.

3. The model training method according to claim 2, wherein the oral health status includes being healthy or a diagnosis of an oral disease, the oral diseases including temporomandibular joint disorders, dental diseases, malocclusion, periodontal diseases, prosthesis problems, and oral and maxillofacial tumors.

4. The model training method according to claim 3, wherein the diagnostic results further include at least one of the following: the location, number, or severity of the disease.

5. The model training method according to any one of claims 1 to 4, wherein the detection data includes one or more of the following: optical data, acoustic data, terahertz data, or electromagnetic wave data.

6. The model training method according to claim 5, wherein at least one of the following is met: the optical data includes a three-dimensional image reconstructed based on differences in reflection intensity after emitting a predetermined type of light onto the surface of the soft and hard tissues of the oral cavity and receiving the optical signals through a sensor; the terahertz wave data includes transmission and reflection data obtained by emitting electromagnetic waves into the soft and hard tissues of the oral cavity; the electromagnetic wave data includes X-ray absorption data of different parts of the oral tissue obtained by emitting electromagnetic waves into the soft and hard tissues of the oral cavity; or the sound data includes data generated by active sound sources of the patient and data generated by passive sound sources, wherein the data generated by the active sound source includes sound data generated during mandibular movement, and the data generated by the passive sound source includes sound data generated when the diagnostic tool comes into contact with the internal structures of the oral cavity.

7. The model training method of claim 5 or 6, wherein enhancing the detection samples to obtain the training samples includes: in the case that the detection data includes the sound data, processing the sound data by a filter to obtain a first spectrogram sample; and processing the first spectrogram sample by at least one of waveform shifting, harmonic distortion, or resampling to obtain a second spectrogram sample, wherein the training samples include the first spectrogram sample and the second spectrogram sample.

8. The model training method of claim 7, wherein the label corresponding to the second spectrogram sample is the same as that of the corresponding first spectrogram sample.

9. The model training method according to claim 7 or 8, wherein processing the sound data by a filter to obtain the first spectrogram sample includes: processing the sound data by a Mel filter to obtain a Mel spectrogram as the first spectrogram sample.

10. The model training method of claim 7, 8, or 9, wherein processing the first spectrogram sample by at least one of waveform shifting, harmonic distortion, or resampling to obtain the second spectrogram sample includes: extracting time-domain features and frequency-domain features based on the first spectrogram sample; obtaining time-frequency fusion features based on the time-domain features and the frequency-domain features; extracting local features from the time-frequency fusion features, and obtaining dynamic position encoding based on the time-frequency fusion features; extracting context-aware features using a self-attention mechanism based on the local features and the dynamic position encoding; and obtaining the second spectrogram sample by fusing the context-aware features and the time-frequency fusion features.

11. The model training method of any one of claims 1 to 10, wherein the self-supervised learning model is a Vision Transformer model.

12. A method for processing oral diagnosis and treatment data, comprising: obtaining detection data of a target patient during oral diagnosis and treatment; and determining, based on the detection data, a diagnostic result using a machine learning model, wherein the machine learning model is generated by training according to the method of any one of claims 1 to 11.

13. The method for processing oral diagnosis and treatment data of claim 12, further comprising: when the detection data includes sound data, converting the sound data of the target patient during oral diagnosis and treatment into a sound spectrogram, wherein the machine learning model determines the diagnostic result by processing the sound spectrogram.

14. An oral cavity examination method, comprising: collecting detection data of a target patient during oral diagnosis and treatment; and determining, based on the detection data, a diagnostic result using a machine learning model, wherein the machine learning model is generated by training on detection samples from oral medical diagnosis and treatment.

15. The oral cavity examination method of claim 14, wherein the detection data includes one or more of the following: optical data, acoustic data, terahertz data, or electromagnetic wave data.

16. The oral cavity examination method of claim 14 or 15, wherein the machine learning model is generated by training according to the model training method of any one of claims 1 to 11.

17. A model training device, comprising: a sample acquisition unit configured to acquire detection samples, the detection samples including detection data and labels; a sample enhancement unit configured to enhance the detection samples to obtain training samples; and a training unit configured to process the training samples through a self-supervised learning model until training is complete, wherein the parameters of the self-supervised learning model are adjusted based on the output diagnostic results and the labels.

18. An oral diagnosis and treatment data processing device, comprising: a data acquisition unit configured to acquire detection data of a target patient during oral diagnosis and treatment; and a state determination unit configured to determine a diagnostic result based on the detection data and a machine learning model, wherein the machine learning model is generated by training according to the model training method of any one of claims 1 to 11.

19. A data processing apparatus, comprising: a memory; and a processor coupled to the memory, the processor being configured to perform the method of any one of claims 1 to 13 based on instructions stored in the memory.

20. A non-transitory computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement the method of any one of claims 1 to 13.

21. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the method of any one of claims 1 to 13.

22. An electronic device, comprising: a data acquisition device configured to collect detection data of a target patient during oral diagnosis and treatment; and a data processor configured to determine a diagnostic result based on the detection data and a machine learning model, wherein the machine learning model is generated by training on detection samples from oral medical diagnosis and treatment.

23. The electronic device of claim 22, wherein the data acquisition device includes at least one of the following combinations: a laser light source, a laser detector, and a laser scanning system; an electromagnetic wave source and an electromagnetic wave receiver; a sound sensor; or a terahertz wave source and a terahertz wave detector.

24. A computer program for causing a processor to perform the method according to any one of claims 1-13.