Construction and application of music score training database
By extracting features from music materials using machine learning and knowledge graph technologies, and combining these features with reference features from professional musicians, the hierarchical labels of the sheet music training database are optimized. User feedback is then used to make adjustments, which solves the problem of inaccurate hierarchical classification in the sheet music training database and achieves personalized and efficient sheet music training results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- EDTECH PLUS PTE LTD
- Filing Date
- 2022-06-28
- Publication Date
- 2026-06-16
AI Technical Summary
In existing technologies, the construction and application of music score training databases lack effective hierarchical labeling and optimization methods, resulting in insufficiently accurate and personalized training effects.
Features are extracted from music materials using machine learning and knowledge graph technologies and combined with reference features from professional musicians to optimize the hierarchical label settings; the score training database is then further adjusted using user feedback.
It enables precise grading and personalized training of the music score training database, improving user training effectiveness and the adaptability of the music score database.
Figure CN117349257B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of databases, and in particular relates to the construction and application of a music score training database.
Background Technology
[0002] Instrumental performance is a comprehensive activity that combines logical and visual thinking, mental and physical exertion, and technical skill with artistry. In the training, practice, and learning of instrumental performance, it is often necessary to gradually learn, master, and even improve instrumental performance skills based on appropriate musical materials, especially sheet music. In particular, building a sheet music training database is essential for optimizing sheet music training, practice, and even learning.
[0003] The rapid development of modern science and technology, especially the widespread application of computer technology and database technology, has provided advanced means for the construction of databases for instrumental performance and opened up broad prospects.
[0004] Unless otherwise stated, no method described in this section should be assumed to be prior art simply because it is included in this section. Similarly, unless otherwise stated, no problem recognized with respect to one or more methods should be assumed to be recognized in any prior art based on this section.
Summary of the Invention
[0005] This disclosure proposes an improved method for constructing and applying a music score training database. Specifically, this disclosure constructs an improved music score training database by utilizing music-related features to obtain hierarchical labels for music data and by adjusting the settings of these labels through verification. Additionally, this disclosure further proposes applications of the music score training data, wherein the constructed music score training database can be further optimized based on user feedback in its application.
[0006] One aspect of this disclosure relates to a method for constructing a music score training database, wherein the music score training database has at least one level and a corresponding level label for each level. The method includes the following steps: for each level, obtaining a first label (L1) generated by machine learning based on a pre-set music feature set for the music score training database to be constructed; obtaining a second label (L1H) generated based on the pre-set music feature set for the music score training database to be constructed, and at least one second sub-label (L2H) contained in the second label (L1H), wherein each second label and sub-label specifies a combination of one or more music features in the music feature set and their specific value range; generating candidate music samples according to the music feature combination and music feature value range specified by the second sub-label (L2H); verifying at least one of the first label (L1) and the second label (L1H) using the generated candidate music samples, wherein the specification of at least one of the first label (L1) and the second label (L1H) is adjusted based on the verification result; and constructing the music score training database based on the verified candidate music samples and their corresponding level labels.
[0007] Another aspect of this disclosure relates to an apparatus for constructing a music score training database, wherein the music score training database has at least one level and a corresponding level label for each level. The construction apparatus includes a processing circuit configured to: for each level, acquire a first label (L1) generated by machine learning based on a pre-set music feature set for the music score training database to be constructed; acquire a second label (L1H) generated based on the pre-set music feature set for the music score training database to be constructed, and at least one second sub-label (L2H) contained in the second label (L1H), wherein each second label and sub-label specifies a combination of one or more music features in the music feature set and a specific value range thereof; generate candidate music samples according to the music feature combination and music feature value range specified by the second sub-label (L2H); verify at least one of the first label (L1) and the second label (L1H) using the generated candidate music samples, wherein the specification of at least one of the first label (L1) and the second label (L1H) is adjusted based on the verification result; and construct the music score training database based on the verified candidate music samples and their corresponding level labels.
[0008] Another aspect of this disclosure relates to a non-transitory computer-readable storage medium for storing executable instructions that, when executed, implement the methods described in embodiments of this disclosure.
[0009] Another aspect of this disclosure relates to an electronic device. According to one embodiment, the electronic device includes a processor and a storage device storing executable instructions that, when executed, implement the methods described in the embodiments of this disclosure.
[0010] Another aspect of this disclosure relates to a computer program product comprising a computer program / instructions that, when executed by a processor, implement the methods described in the embodiments of this disclosure.
[0011] Another aspect of this disclosure relates to a computer program comprising program code that, when executed by a computer, causes the computer to perform the methods described in the embodiments of this disclosure.
[0012] Another aspect of this disclosure relates to an apparatus including components for performing the methods described in the embodiments of this disclosure.
[0013] This disclosure is provided to introduce some concepts in a simplified form, which will be further described in the detailed description below. This disclosure is not intended to identify key or essential features of the claimed subject matter, nor is it intended to limit the scope of the claimed subject matter. Other aspects and advantages of this technology will become apparent from the following detailed description of embodiments and the accompanying drawings.
Attached Figure Description
[0014] The above and other objects and advantages of this disclosure will be further described below with reference to specific embodiments and the accompanying drawings. In the drawings, the same or corresponding technical features or components will be represented by the same or corresponding reference numerals.
[0015] Figure 1 shows a schematic conceptual diagram of a system for constructing and applying a music score training database according to embodiments of the present disclosure.
[0016] Figure 2 shows a flowchart of a method for constructing a music score training database according to an embodiment of the present disclosure.
[0017] Figures 3A and 3B show a pre-set music feature set for a music score training database to be constructed according to an embodiment of the present disclosure. Figure 3C shows exemplary labels according to embodiments of this disclosure.
[0018] Figure 4 shows an exemplary flowchart of the construction of a music score training database according to an embodiment of the present disclosure.
[0019] Figure 5 shows an exemplary flowchart of the construction of a music score training database according to an embodiment of the present disclosure.
[0020] Figure 6A shows a flowchart of the application of a music score training database according to an embodiment of the present disclosure.
[0021] Figure 6B shows an overall flowchart of the construction and application of a music score training database according to embodiments of this disclosure.
[0022] Figures 7A to 7C show an exemplary application scenario of a music score training database according to embodiments of the present disclosure.
[0023] Figure 8 shows an exemplary classification of sheet music libraries according to embodiments of the present disclosure.
[0024] Figure 9 shows a block diagram of an apparatus for constructing a music score training database according to an embodiment of the present disclosure.
[0025] Figure 10 shows a schematic diagram of a computer system in which embodiments of the present disclosure may be implemented.
[0026] While the embodiments described in this disclosure may be readily modified and alternatively implemented, specific embodiments thereof are shown by way of example in the accompanying drawings and are described in detail herein. However, it should be understood that the drawings and the detailed description thereof are not intended to limit the embodiments to the specific forms disclosed, but rather are intended to cover all modifications, equivalents, and alternatives that fall within the spirit and scope of the claims.
Detailed Implementation
[0027] Exemplary embodiments of this disclosure will be described below with reference to the accompanying drawings. However, it is obvious that the described embodiments are merely some, not all, of the embodiments of this disclosure. The following description of the embodiments is also illustrative in nature and is in no way intended to limit this disclosure or its application or use. It should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. For clarity and brevity, not all features of the embodiments are described in the specification. It should also be noted that, to avoid obscuring this disclosure with unnecessary detail, only processing steps and / or device structures closely related to at least the solutions according to this disclosure are shown in the drawings, while other details less relevant to this disclosure are omitted.
[0028] However, it should be understood that many implementation-specific settings must be made during the implementation of the embodiments in order to achieve the developer's specific goals, such as complying with constraints related to the device and business, and these constraints may vary depending on the implementation. Furthermore, it should be understood that while the development work may be very complex and time-consuming, such development work is merely a routine task for those skilled in the art who benefit from this disclosure.
[0029] It should be understood that the various steps described in the method embodiments of this disclosure may be performed in different orders and / or in parallel. Furthermore, method embodiments may include additional steps and / or omit the steps shown. The scope of this disclosure is not limited in this respect. Unless otherwise specifically stated, the relative arrangement, numerical expressions, and values of components and steps set forth in these embodiments should be interpreted as merely exemplary and do not limit the scope of this disclosure.
[0030] As used in this disclosure, the terms "comprising" and "including" and their variations are open-ended terms that cover at least the listed elements / features but do not exclude other elements / features, i.e., "including but not limited to". "Comprising" and "including" are therefore synonymous. The term "based on" means "at least partially based on".
[0031] Throughout this specification, the terms "one embodiment," "some embodiments," or "embodiment" mean that a specific feature, structure, or characteristic described in connection with an embodiment is included in at least one embodiment of the invention. For example, the term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; and the term "some embodiments" means "at least some embodiments." Furthermore, the appearance of the phrases "in one embodiment," "in some embodiments," or "in an embodiment" in various places throughout the specification does not necessarily refer to the same embodiment, but may refer to the same embodiment.
[0032] It should be noted that the concepts of "first," "second," etc., used in this disclosure are used only to distinguish different devices, modules, or units, and are not intended to define the order of functions performed by these devices, modules, or units or their interdependencies. Unless otherwise specified, the concepts of "first," "second," etc., are not intended to imply that the objects described herein must be in a given temporal, spatial, rank, or any other given order.
[0033] It should be noted that the terms "a" and "a plurality of" used in this disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".
[0034] Building and applying an appropriate music score training database is crucial for music score training, practice, and even learning. Specifically, for music score training, practice, and even learning, there is a desire to provide an efficient system centered on the user, especially those engaged in music training or learning. In particular, there is a desire to accurately build a music score training database and provide it to users for training, practice, and even learning. Furthermore, there is a desire to obtain user feedback on the database's application, such as the results of training, practice, and even learning using the database, which can then be used to further optimize the construction of the music score training database.
[0035] Figure 1 shows a schematic conceptual diagram of a system for constructing and applying a music score training database according to embodiments of the present disclosure, schematically illustrating the interaction between the music score training database and a user or user-end device. Specifically, in the system according to the present disclosure, a music score training database (backend) is created based on musical materials, and the user (frontend) interacts with the generated database, which is, for example, presented to the user or a client device. The user uses the music score training database, and the user's usage data is fed back to the backend to optimize the database. This interaction can be performed in various suitable ways, such as through interactive software, transceivers, various types of presentation devices, etc.
[0036] According to embodiments of this disclosure, a music score training database can be constructed in particular by using technologies such as machine learning and knowledge graphs. Specifically, music-related features are extracted from music materials / samples / data (e.g., music score data, especially music score data that may have a predefined specific difficulty level) using technologies such as machine learning and knowledge graphs. This allows for the setting of hierarchical labels for the music materials / samples / data, achieving appropriate and accurate hierarchical classification of the music materials / samples / data. Here, the hierarchical classification / level specifically corresponds to the possible difficulty level of the music data during performance, thereby constructing a training database that is appropriately hierarchical.
[0037] It should be noted that the machine learning techniques used to extract music-related features can include a variety of appropriate techniques, such as deep learning, image recognition, and audio recognition. Music-related features can be, for example, features found in a musical score, such as musical attribute features and performance-related features, and can be obtained in various appropriate ways. As one example, they can be extracted from MusicXML files of the music materials / samples / data, for example through deep learning; MusicXML files contain the overall musical elements and attribute information. As another example, music features can be extracted from images of the music materials / samples / data through image recognition, for example by extracting sub-images related to music features and comparing them with reference images to determine the features. As yet another example, music features can be obtained from the audio content of the music materials / samples / data through audio recognition, specifically through sound signal processing.
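The MusicXML route mentioned above can be illustrated with a minimal sketch. It uses plain XML parsing rather than the deep learning the disclosure mentions, and the feature names (`time_signature`, `key_fifths`, `pitch_range_semitones`, `rest_count`) and the sample fragment are assumptions chosen for illustration, not features defined by the disclosure. The element names (`time`, `beats`, `beat-type`, `key`, `fifths`, `pitch`, `step`, `octave`, `alter`, `rest`) are standard MusicXML.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written MusicXML fragment, used only for illustration.
SAMPLE_MUSICXML = """<score-partwise version="3.1">
  <part id="P1">
    <measure number="1">
      <attributes>
        <key><fifths>0</fifths></key>
        <time><beats>3</beats><beat-type>8</beat-type></time>
      </attributes>
      <note><pitch><step>C</step><octave>4</octave></pitch><duration>1</duration></note>
      <note><pitch><step>G</step><octave>4</octave></pitch><duration>1</duration></note>
      <note><rest/><duration>1</duration></note>
    </measure>
  </part>
</score-partwise>"""

# Semitone offsets of the natural note names within one octave.
STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def extract_features(musicxml_text):
    """Return a small dict of score features (hypothetical feature names)."""
    root = ET.fromstring(musicxml_text)

    # First time signature found in the score, e.g. "3/8".
    time_el = root.find(".//time")
    time_signature = None
    if time_el is not None:
        time_signature = f"{time_el.findtext('beats')}/{time_el.findtext('beat-type')}"

    # Key signature as a count of sharps (positive) or flats (negative).
    fifths = root.findtext(".//key/fifths")

    # Convert every pitched note to a MIDI-style number (C4 -> 60).
    pitches = []
    for pitch in root.iter("pitch"):
        step = pitch.findtext("step")
        octave = int(pitch.findtext("octave"))
        alter = int(float(pitch.findtext("alter") or 0))
        pitches.append(12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter)

    rest_count = sum(1 for note in root.iter("note") if note.find("rest") is not None)

    return {
        "time_signature": time_signature,
        "key_fifths": int(fifths) if fifths is not None else None,
        "pitch_range_semitones": max(pitches) - min(pitches) if pitches else 0,
        "rest_count": rest_count,
    }


features = extract_features(SAMPLE_MUSICXML)
```

A real pipeline would read complete score files and extract many more dimensions (rhythmic patterns, hand position, harmony, and so on); the point here is only that each extracted value becomes one dimension of the feature set.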
[0038] As examples, techniques for extracting musical features from musical material may include, but are not limited to: Optical Music Recognition (OMR) techniques, e.g., Score Transformer; standardized music score encodings, e.g., MIDI, MusicXML, MuseScore; music difficulty grading models, e.g., multilayer neural network (MLP), XGBoost, Support Vector Machine (SVM), Random Forest (RF); music score feature analysis tools, e.g., NumPy, Pandas, SciPy; and graph database tools, e.g., PyKEEN, Neo4j, Nebula.
[0039] According to embodiments of this disclosure, preferably, music-related features are further considered as references in the music data classification, and the classification of music data is optimized based on both such reference features and the extracted music-related features. In particular, the extracted music-related features are verified based on the reference features, thereby further improving the accuracy of the classification, such as adjusting / optimizing the level classification labels, thereby further improving the accuracy of the construction of the music score training database.
[0040] It should be noted that the music-related features used for reference can be obtained in various appropriate ways. For example, they can be obtained from music materials / samples / data by extraction from MusicXML files, image recognition, sound processing, etc., as mentioned above; these operations can be performed by various appropriate devices (e.g., reference extraction devices). Alternatively, they can be derived from the experience of professional musicians or experienced users who read the music materials / samples / data or extract features from them.
[0041] Preferably, the reference features can be refined difficulty level labels set based on experience. These can be used to adjust the level divisions, such as the composition of features within each level, or to further refine the features of each level, such as through subcategorization. This allows for further optimization of the database's features, resulting in more appropriate data grading to build a more suitable database and provide / recommend more accurate music data to users.
[0042] The following briefly describes an exemplary implementation of constructing a music score training library using machine learning, specifically using machine learning to obtain grade labels and their associated features.
Multi-label difficulty grading: Based on music scores with publicly available difficulty levels and their corresponding score features, multi-label difficulty grading can first be performed. By statistically analyzing the public difficulty levels of the scores that exhibit a given musical feature value, the possible difficulty range of each musical feature value (i.e., label) can be obtained (e.g., a 3/8 time signature appears only in scores above a certain difficulty level; double-dotted quarter rests appear only in scores above a certain difficulty level). Let $x_i$ denote the $i$-th feature dimension, and suppose a music score $j$ takes the feature value $a$ in that dimension and has difficulty level $g_j$. By statistically analyzing the difficulty levels $g_j$ of all scores that satisfy $x_i = a$, the possible difficulty range $f(x_i = a)$ corresponding to that feature value is obtained. Once $f(x_i = a)$ is known for each musical feature value, the minimum possible difficulty of any music score $j$ can be estimated by considering the frequency distribution of its feature values across multiple feature dimensions $x_i$.
Musical section difficulty prediction: A machine learning model (such as a multi-layer neural network or a decision forest) learns the mapping between the musical features in a score and the corresponding public difficulty level. This yields the coarsest-grained, first-layer difficulty prediction model, which represents the relatively coarse musical features corresponding to a specific difficulty level and thus provides a label for that level. Within each public difficulty level, a second-layer classification model for predicting sub-level difficulty is trained on the crowdsourced consensus of piano teaching experts regarding the subdivision of musical section difficulty. This yields more refined musical features for that difficulty level and thus a second-level label. In this way, the musical features corresponding to the various difficulty levels of the scores can be obtained. Furthermore, adjustments can be made during application, particularly across different application modes (such as sight-reading and ear training): accumulated user training data (such as error rate, final pass rate, and the average number of practice sessions required for a first pass) allows adaptive, refined difficulty prediction and ranking under each application mode.
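The multi-label grading step can be sketched as follows. The toy corpus, the feature names, and the representation of the possible difficulty range f(x_i = a) as a (minimum, maximum) pair are illustrative assumptions. The disclosure estimates the minimum possible difficulty from the frequency distribution of feature values; this sketch substitutes a simpler rule, taking the largest lower bound among the score's feature values.

```python
from collections import defaultdict

# Hypothetical toy corpus: (feature dict, public difficulty grade g_j).
SCORES = [
    ({"time_signature": "4/4", "rest": "quarter"}, 1),
    ({"time_signature": "4/4", "rest": "quarter"}, 2),
    ({"time_signature": "3/8", "rest": "quarter"}, 3),
    ({"time_signature": "3/8", "rest": "double_dotted_quarter"}, 5),
    ({"time_signature": "4/4", "rest": "double_dotted_quarter"}, 4),
]


def difficulty_ranges(scores):
    """Map (feature dimension x_i, value a) -> (min grade, max grade) observed."""
    grades = defaultdict(list)
    for features, grade in scores:
        for dim, value in features.items():
            grades[(dim, value)].append(grade)
    return {key: (min(g), max(g)) for key, g in grades.items()}


def min_possible_difficulty(features, ranges):
    """Simplified lower bound: a score is at least as hard as the most
    demanding of its feature values. Returns None if nothing is known."""
    lows = [ranges[(d, v)][0] for d, v in features.items() if (d, v) in ranges]
    return max(lows) if lows else None


RANGES = difficulty_ranges(SCORES)
estimate = min_possible_difficulty(
    {"time_signature": "3/8", "rest": "double_dotted_quarter"}, RANGES
)
```

In this toy corpus a 3/8 time signature never appears below grade 3 and a double-dotted quarter rest never appears below grade 4, so a score containing both is estimated to be at least grade 4.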
[0043] According to embodiments of this disclosure, the constructed music score training database is provided to users in an appropriate manner. In some embodiments, the constructed music score training database can serve as a music sample training library for users to train with. For example, corresponding samples can be provided to users level by level for training, practice, or even learning. The data in the music score training database can be provided to users in various appropriate forms, such as games, and through various appropriate devices, such as display devices, acoustic devices, etc. In some implementations, the data in the music score training database can be converted into game-based exercises through a game converter. The system automatically judges and corrects the child's practice, promotes self-directed learning, makes the learning content visible, and makes after-class practice traceable.
[0044] In another example, the constructed music score training database can serve as a single-song training library. Specifically, a single song can be graded, for example, by difficulty level, thus generating various graded samples from that single song. These single songs can then form a single-song training library, which can be provided to users for training. The single-song training library can utilize melody waveform extraction and comparison recognition tools to identify and crawl training materials (videos, audio) related to a single song. Based on the results of extensive user material selection and usage, the system's recommendation function can be optimized, continuously improving the accuracy and conciseness of the single-song explanations. Furthermore, the single-song training library can extract relevant features of the single song to form specialized training for that song.
[0045] According to embodiments of this disclosure, the music score training database can be further adjusted based on user training, practice, and even learning results as feedback. In some embodiments, the sorting and arrangement of data / materials at each level in the music score training database can be optimized based on user feedback, thereby helping to provide users with a more appropriate and accurate music score training database. In other embodiments, the music score training database can also be optimized based on user feedback, particularly optimizing and updating the level labels and associated features of the music score training database. In one example, when the content in the music sample training library is presented to the user for training through interactive games, these interactive games can record, judge, and evaluate user feedback. Through a large number of user interaction records, the difficulty ranking of the training content in the training library can be optimized, and even the difficulty levels and associated features in the training library can be optimized.
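One way to picture the feedback-driven reordering within a level is the sketch below. The three signals (error rate, final pass rate, average number of attempts before a first pass) come from the description; the `UsageStats` record, the weights 0.4 / 0.3 / 0.3, and the cap of 10 attempts are purely hypothetical choices, not values given in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class UsageStats:
    item_id: str
    error_rate: float            # fraction of wrong responses, 0..1
    final_pass_rate: float       # fraction of users who eventually passed, 0..1
    avg_attempts_to_pass: float  # mean practice sessions before the first pass


def empirical_difficulty(stats, max_attempts=10.0):
    """Combine the three feedback signals into one 0..1 score.
    The weights are illustrative assumptions only."""
    attempts_term = min(stats.avg_attempts_to_pass, max_attempts) / max_attempts
    return (
        0.4 * stats.error_rate
        + 0.3 * (1.0 - stats.final_pass_rate)
        + 0.3 * attempts_term
    )


def reorder_level(items):
    """Return item ids of one level, easiest first, based on user feedback."""
    return [s.item_id for s in sorted(items, key=empirical_difficulty)]


LEVEL_3 = [
    UsageStats("etude_a", 0.30, 0.70, 4.0),
    UsageStats("etude_b", 0.10, 0.95, 1.5),
    UsageStats("etude_c", 0.20, 0.85, 2.5),
]
ordered = reorder_level(LEVEL_3)
```

With enough interaction records, an item whose empirical score falls consistently outside the range of its level could also be flagged as a candidate for relabeling, which corresponds to the optimization of level labels described above.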
[0046] Therefore, this disclosure proposes an improved method for constructing a music score training database. Specifically, this disclosure utilizes information acquisition techniques such as machine learning, knowledge graphs, and sound separation to extract musical features from musical materials, improving the efficiency and accuracy of feature acquisition and thus accurately setting the grade labels for the musical materials. Furthermore, this disclosure further adjusts and optimizes the machine learning grade labels using music-related features that can be used as a reference, thereby further improving the accuracy of machine learning grade labeling and enabling the construction of a more accurate and appropriate database. Moreover, this disclosure further utilizes user feedback on the application of the database to further optimize the construction and application of the music score training database.
[0047] The construction and application of the music score training database according to this disclosure will be described below with reference to the accompanying drawings. However, it should be noted that the concept of database construction and application of this disclosure can be equally applied to other types of music training databases, and in particular to the construction and application of databases of other types of music materials for user training and learning.
[0048] Figure 2 shows a flowchart of a method for constructing a music score training database according to an embodiment of this disclosure. The music score training database has at least one level, and each level has a corresponding level label. Here, the level may correspond to the classification of musical data, particularly the difficulty classification. The difficulty classification may correspond to a combination of music-related features contained in the musical data. Different difficulty classifications may correspond to different feature combinations, which will be described in detail below.
[0049] The method 200 for constructing the music score training database disclosed herein can be performed for each data level, and for each data level, it can include the following steps:
[0050] In step S201, the first label (L1) generated by machine learning is obtained;
[0051] In step S202, a second label (L1H) generated based on a music feature set pre-set for the music score training database to be constructed and at least one second sub-label (L2H) contained in the second label (L1H) are obtained, wherein each second label and sub-label specifies a combination of one or more music features in the music feature set and its specific value range.
[0052] In step S203, candidate music samples are generated based on the music feature combination and music feature value range specified by the second sub-label (L2H);
[0053] In step S204, at least one of the first label (L1) and the second label (L1H) is verified using the generated candidate music samples, wherein the definition of at least one of the first label (L1) and the second label (L1H) can be adjusted based on the verification result;
[0054] In step S205, a music score training database is constructed based on the verified candidate music samples and their corresponding labels.
[0055] The operations performed in the above method will be described in further detail below.
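Steps S201 to S205 can be pictured with the following minimal sketch for a single level. Everything concrete in it is an assumption made for illustration: the dictionary layout of the second label (L1H) and its sub-labels (L2H), the feature names, and the rule inside `predict_level`, which merely stands in for a trained machine learning model producing the first label (L1). The sketch only flags disagreements; how the label specifications are then adjusted is left open.

```python
from itertools import product

# Hypothetical second label (L1H) for level 2, containing two sub-labels (L2H).
# Each sub-label lists the allowed values per feature dimension.
L1H_LEVEL_2 = {
    "level": 2,
    "sub_labels": [
        {"name": "2a", "ranges": {"time_signature": ["2/4", "4/4"], "max_interval": [3, 4]}},
        {"name": "2b", "ranges": {"time_signature": ["3/4"], "max_interval": [4, 5]}},
    ],
}


def generate_candidates(sub_label):
    """S203: enumerate every feature combination allowed by one sub-label."""
    dims = sorted(sub_label["ranges"])
    for values in product(*(sub_label["ranges"][d] for d in dims)):
        yield dict(zip(dims, values))


def predict_level(sample):
    """Stand-in for the machine-learned first label (L1).
    A real system would call a trained model here."""
    return 2 if sample["max_interval"] <= 4 else 3


def build_level(l1h):
    """S204 and S205: keep candidates on which L1 and L1H agree and
    report the rest, which indicate that a label may need adjusting."""
    verified, disputed = [], []
    for sub in l1h["sub_labels"]:
        for sample in generate_candidates(sub):
            record = {"features": sample, "level": l1h["level"], "sub_label": sub["name"]}
            if predict_level(sample) == l1h["level"]:
                verified.append(record)
            else:
                disputed.append(record)
    return verified, disputed


verified, disputed = build_level(L1H_LEVEL_2)
```

Here the single disputed candidate (a maximum interval of 5 under sub-label "2b") suggests that either the value range of that sub-label or the first label's boundary should be revisited before the level is added to the database.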
[0056] According to some embodiments of this disclosure, the first label can be generated in various suitable ways. In some implementations, the grade labels can be generated from a provided music score training sample library through machine learning, especially deep learning. In particular, machine learning can be used to obtain the mapping relationship between the musical features of the music score and its corresponding public difficulty level, thereby obtaining the labels for each difficulty level and the associated features. As an example, through deep machine learning, the music score and / or audio are directly used as input, and an encoding method for the input data is learned through supervised or self-supervised methods to obtain a fixed-length feature code (e.g., extracting 1024-dimensional features using TripNet or Transformer). Difficulty classification is then performed based on this feature code, thereby enabling the extraction of grade labels and corresponding feature combinations from the music score. In other examples, a deep representation of the music score can be learned from audio files or standardized music score encoding files to compare music score similarity and analyze its correlation with music score difficulty. The distance between music score features of different difficulty levels can be expanded through metric learning and other methods to achieve better difficulty level classification.
[0057] In another implementation, labels and their corresponding feature combinations can be learned from the sheet music or samples based on a pre-set feature set. Specifically, the first label can be generated with reference to the music feature set pre-set for the music score training database to be constructed. As an example, through feature engineering, given a pre-set feature set, multiple feature dimensions are combined as the input and the difficulty level is the output, so that the correspondence between them is learned.
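The feature-engineering variant, with several feature dimensions as input and a difficulty level as output, can be sketched with a nearest-centroid classifier. This is a deliberately simple substitute for the models named elsewhere in the description (multilayer neural network, XGBoost, SVM, random forest), and the three feature dimensions and training rows are invented for illustration.

```python
import math
from collections import defaultdict

# Hypothetical training rows:
# (pitch range in semitones, notes per beat, accidentals per bar) -> grade
TRAIN = [
    ((5.0, 1.0, 0.0), 1),
    ((7.0, 1.0, 0.0), 1),
    ((12.0, 2.0, 1.0), 2),
    ((14.0, 2.0, 1.0), 2),
    ((24.0, 4.0, 3.0), 3),
    ((22.0, 4.0, 2.0), 3),
]


def fit_centroids(rows):
    """Average the feature vectors of each grade into one centroid."""
    sums = {}
    counts = defaultdict(int)
    for vec, grade in rows:
        acc = sums.setdefault(grade, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[grade] += 1
    return {g: tuple(s / counts[g] for s in acc) for g, acc in sums.items()}


def predict(centroids, vec):
    """Return the grade whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda g: math.dist(centroids[g], vec))


CENTROIDS = fit_centroids(TRAIN)
predicted = predict(CENTROIDS, (13.0, 2.0, 1.0))
```

Because the dimensions are not rescaled, the pitch range dominates the distance in this toy example; a practical model would normalize the features or learn their weights.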
[0058] According to some embodiments of this disclosure, a pre-set music feature set for the music score training database to be constructed includes music features associated with the music score training database and the possible value range of each music feature. Specifically, in some embodiments, the music features associated with the music score training database to be constructed may be music performance features that depend on the type or application attributes of the music score training database to be constructed, which can be obtained by extracting music features from electronic music scores.
[0059] In some embodiments, the pre-defined music feature set can be multidimensional, where each dimension can be considered to correspond to a music feature. In some embodiments, the music feature can be any suitable feature. In particular, it can be extracted from existing or widely available music data based on basic elements related to the score (e.g., especially musical attribute elements of the score, performance-related features, etc.). As an example, features can be extracted based on the key, meter, rhythmic pattern, hand position, musical notation, harmony, playing style, intervals, and melody / musical pattern classification as basic elements related to the score. The music feature can be, for example, at least one of the features including interval features, range features, time signature features, rhythmic features, etc., all of which can serve as feature dimensions in the music feature set.
[0060] According to some embodiments of this disclosure, the musical features can be obtained by extracting them from musical data. In some embodiments, the extraction can use techniques such as neural networks and other machine learning methods, as described above for feature extraction. Musical data can be data obtained from appropriate sources, such as electronic sheet music in various appropriate formats, sheet music images, music, etc. The musical data is not particularly limited, as long as music-related features can be extracted from it. Musical data can also be data submitted by users of an application database, such as usage / learning feedback, etc. In one example, musical features can be obtained by collecting features from sheet music in a known music database, and then constructing the value range of each feature from the values observed for it. For example, each feature takes at least one value, and all possible values of a feature constitute its value range.
[0061] The following describes an exemplary method for obtaining a pre-set music feature set. In one example, a standardized music score library (musical score images and electronic musical score encodings) is first constructed, which can be done in various ways. One method is to scan physical musical scores from music textbooks or batch-crawl publicly available musical score images, and then generate musical score encoding files in MIDI, MusicXML, or MuseScore format through optical music recognition. Another method is to obtain standardized musical score encoding files in batches from public websites such as MuseScore and render them into corresponding musical score images. The accuracy of the musical score files is ensured through multi-source cross-validation and manual verification. Then, musical score features (such as tonality, meter, rhythmic pattern, hand position, musical notation, harmony, playing technique, intervals, melody lines, etc.) are extracted from the standardized musical score library: the distribution of each feature dimension is computed from the standardized MIDI, MusicXML, or MuseScore encoding files, and the multi-dimensional feature frequencies are used as the features of the musical score. A musical score feature set can thus be obtained as a pre-set feature set.
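The frequency statistics described above can be sketched for a single feature dimension. The example assumes the encoded score has already been parsed into a list of MIDI note numbers; parsing MIDI or MusicXML files is omitted. The function names and the melody are hypothetical.

```python
from collections import Counter

def interval_histogram(pitches):
    """Relative frequency of each absolute melodic interval, in semitones."""
    steps = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    if not steps:
        return {}
    counts = Counter(steps)
    return {interval: n / len(steps) for interval, n in sorted(counts.items())}

def pitch_range(pitches):
    """Lowest and highest MIDI note number in the melody."""
    return (min(pitches), max(pitches))

melody = [60, 62, 64, 62, 60, 67, 60]  # invented melody, MIDI note numbers
histogram = interval_histogram(melody)
span = pitch_range(melody)
```

Collecting such histograms over every score in the library, and over every feature dimension, gives both the per-score feature frequencies and the overall value range of each feature.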
[0062] In another example, the pre-defined music feature set can also be obtained from a dedicated music score encyclopedia knowledge graph. For instance, the attributes in this knowledge graph can include score name, composer, period, genre, style, mode / form, ABRSM rating, CM / MPFA rating, Shanghai Conservatory of Music rating, etc. The graph can be constructed by using web crawlers to collect large amounts of music score encyclopedia information and corresponding difficulty levels from public data sources as score attribute values. With scores as nodes and the correspondence between scores and attribute values as edges, the music score encyclopedia knowledge graph can be built. Various features and their corresponding value ranges can then be extracted from this graph as a pre-defined feature set.
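A graph with scores as nodes and score-to-attribute correspondences as edges can be sketched as a plain edge list. This is a toy representation for illustration only; the score identifiers, attribute names, and values are invented placeholders, and a production system would more likely use a graph database.

```python
from collections import defaultdict

# Each edge links a score node to one attribute value (all entries invented).
edges = [
    ("score_a", "period", "Baroque"),
    ("score_a", "grade", 3),
    ("score_b", "period", "Classical"),
    ("score_b", "grade", 5),
    ("score_c", "period", "Baroque"),
    ("score_c", "grade", 4),
]

def value_ranges(edge_list):
    """Collect, for each attribute, the set of values seen in the graph."""
    ranges = defaultdict(set)
    for _score, attribute, value in edge_list:
        ranges[attribute].add(value)
    return dict(ranges)

def scores_with(edge_list, attribute, value):
    """Score nodes connected to the given attribute value."""
    return sorted(s for s, a, v in edge_list if a == attribute and v == value)

ranges = value_ranges(edges)
baroque = scores_with(edges, "period", "Baroque")
```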
[0063] As an example, the pre-defined set of feature categories may include, but is not limited to, at least one of the following features, and these features and the possible values for each feature may be as follows:
[0064] Range: 88 notes, which can refer to the audio parameters corresponding to the 88 piano keys.
[0065] Interval: An interval refers to the distance between two notes, including but not limited to a second, a third, a unison and a third, a fourth, a fifth, a fifth (excluding a fourth), a sixth, a seventh, and an octave. It should be noted that the pitches that define intervals can be derived using various appropriate methods. Specifically, one method takes the A above middle C (A4) as 440 Hz and multiplies by 1.059463 for each semitone upward. For example, the frequency of A#, the semitone directly above A4, is 440 × 1.059463 = 466.16372 Hz, rounded to 466.16 Hz. This is the twelve-tone equal temperament calculation, which is the standard tuning for a piano.
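The equal-temperament calculation can be sketched as follows. The example assumes the common convention that keys on an 88-key piano are numbered 1 to 88 and that key 49 (A4, the A above middle C) is tuned to 440 Hz. The function name is hypothetical.

```python
SEMITONE_RATIO = 2 ** (1 / 12)  # about 1.059463, one equal-tempered semitone

def key_frequency(key_number, a4_hz=440.0):
    """Frequency in Hz of piano key 1..88 in twelve-tone equal temperament."""
    if not 1 <= key_number <= 88:
        raise ValueError("an 88-key piano has keys numbered 1 to 88")
    # Each key above or below A4 multiplies or divides the frequency by 2^(1/12).
    return a4_hz * 2 ** ((key_number - 49) / 12)

a4 = key_frequency(49)        # reference pitch
a_sharp4 = key_frequency(50)  # one semitone above A4
middle_c = key_frequency(40)  # nine semitones below A4
```

Under this convention middle C comes out near 261.63 Hz, and twelve semitone steps multiply the frequency by exactly 2, which is one octave.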
[0066] Time signature: 2 / 4, 3 / 4, 4 / 4, 3 / 8, 6 / 8, 9 / 8, 12 / 8, 2 / 2, 5 / 4, 7 / 4, 5 / 8, 7 / 8, 3 / 16
[0067] Note and rest duration
[0068] Rhythmic pattern: Rhythmic pattern essentially refers to a fixed sequence of combinations of different note and rest characteristics.
[0069] Tonality: Specifically, each key consists of 7 notes, and the differences between keys lie in the spacing between these 7 notes and in the starting tonic. As examples, keys may include, but are not limited to, C major, G major, F major, A minor, D minor, D major, E minor, G minor, A major, B minor, Bb major, Eb major, F# minor, C minor, E major, C# minor, F minor, Ab major, B major, G# minor, Db major, Bb minor, Gb major, and Eb minor.
[0070] Fingering and hand positions: This involves fingering, left and right hands, pitch and range, and may include, but is not limited to, single-hand five-finger positions, two-hand five-finger positions for alternation, two-hand five-finger positions for ensemble, single-hand five-finger positions for alternation (hand-changing position), two-hand five-finger positions for ensemble (hand-changing position), finger crossing (finger turning), finger expansion, finger contraction, finger changing on the same note, hand position changes (occurring only in one hand), and hand position changes (occurring simultaneously in both hands);
[0071] Accompaniment type: In essence, it is a combination of different pitches, intervals and tonal characteristics, such as, but not limited to, short arpeggios, broken chords, Alberti bass, and chords;
[0072] Musical type: In essence, it is a combination of different pitch intervals and tonal characteristics, such as, but not limited to, scales, arpeggios, and chromatic scales;
[0073] Performance expression markings
[0074] Performance markings
[0075] Accidentals (temporary sharp / flat signs)
[0076] It should be noted that the above features are merely exemplary and not restrictive. The preset set of musical features may also include other appropriate musical features. Figures 3A and 3B show examples of music feature sets.
[0077] According to some embodiments of this disclosure, the first label (L1) may be generated by extracting music features from one or more music samples with corresponding levels contained in the music score training sample library (also known as the music candidate database) through machine learning, in order to determine the combination of music features corresponding to each level and the range of music feature values to generate the first label (L1).
[0078] In some examples, feature combinations and labels can be obtained directly from samples using machine learning (e.g., deep learning), as described above. In other examples, a first label can be generated based on the music features that correspond to each level in a pre-defined music feature set. Specifically, for each level, music features are extracted from the corresponding data in the provided music score training sample library, and the features for that level and their values are set based on the pre-defined feature set mentioned above. That is, at least some of the features contained in the pre-defined feature set, together with their values, are extracted from the corresponding database, thereby obtaining the features for that level and their value ranges, where each value range can be considered to fall within the range given by the pre-defined feature category set.
[0079] In some embodiments of this disclosure, the "features" or "feature combinations" used to define labels essentially refer to combinations of features extracted from or defined from the material, as well as the values (e.g., ranges) of each feature in the combination, specifically including which types of features are included and the values of each type of feature. For example, if the features include measures and time signatures, then when defining a label, it is also necessary to indicate the number of measures, the value of the time signature, etc.
[0080] In some embodiments of this disclosure, "level" here may specifically refer to difficulty level / grade. "Difficulty level" can refer to a pre-defined or pre-set difficulty level of musical materials, particularly the difficulty level specified in existing grading / classification materials, such as each level in the ABRSM (Associated Board of the Royal Schools of Music) exams, or a subjectively perceived difficulty level. Note that the ABRSM exams are just one example; the same method can be applied to other grading systems. Furthermore, the system can be made more complex by comparing the difficulty identification of various grading systems, such as further refining it into the ABRSM system, the Trinity College London system, the China Conservatory of Music system, the Shanghai Conservatory of Music system, etc. The advantages of this complexity are: establishing a standard modular grading database that can determine which level of which grading system corresponds to the difficulty attribute of each future training material; furthermore, it can serve as a student ability assessment model, allowing users to interact with the system to determine what standards students can achieve; and it can become a simulation grading system for various grading systems, helping to determine students' pass rates.
[0081] As an example, difficulty levels can be divided into several levels, such as from the lowest level to the highest level, such as from level 0 to level N. The lower the level, the fewer features it may contain, and / or the narrower the range of values for the features.
[0082] The following are some examples of difficulty levels, which can include preparatory level and level one. The characteristics of each difficulty level and the corresponding value range of the characteristics are as follows:
[0083] Preparatory level
[0084] Pitch range: middle C-E, middle C-G, middle C-A, middle C-F, middle A-E, middle F-G, low C-G, low C to high C
[0085] Interval: within a second (stepwise); within a unison and a third (leap); within a third (stepwise and leap); within a fourth; within a fifth (excluding the fourth); within a fifth
[0086] Time signature: 2 / 4, 3 / 4, 4 / 4
[0087] Note and rest values
[0088] Key: C major, G major, F major
[0089] Hand position: single-hand five-finger position; two-hand alternating five-finger position
[0090] Performance expression markings
[0091] Performance markings
[0092] Level one
[0093] Pitch range: middle A-E, middle C-G, middle C-F, low C-G, low C to high C, high C-G, low C-F, low F to high G, 5C
[0094] Interval: within a second (stepwise); within a unison and a third (leap); within a third (stepwise and leap); within a fourth; within a fifth (excluding the fourth); within a fifth
[0095] Time signature: 2 / 4, 3 / 4, 4 / 4
[0096] Note and rest values
[0097] Key: C major, G major, F major, A minor, D minor
[0098] Hand position: single-hand five-finger position; two-hand alternating five-finger position
[0099] Performance expression markings
[0100] Performance technique markings
[0101] Accidentals (temporary sharp or flat signs)
[0102] It should be noted that the above examples are merely illustrative, and the difficulty levels can also include other levels, such as level two to the highest level, such as level eight. In this case, the highest level (level eight) can correspond to the pre-set set of feature categories described above.
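The two example levels above can be encoded as mappings from feature to allowed values, which makes it easy to see what a higher level adds. The sketch below uses the key and time-signature values listed above. The dictionary layout, the function name, and the encoding of accidentals as the values "sharp" and "flat" are assumptions for illustration.

```python
PREPARATORY = {
    "time_signature": {"2/4", "3/4", "4/4"},
    "key": {"C major", "G major", "F major"},
}
LEVEL_ONE = {
    "time_signature": {"2/4", "3/4", "4/4"},
    "key": {"C major", "G major", "F major", "A minor", "D minor"},
    "accidentals": {"sharp", "flat"},  # hypothetical value encoding
}

def added_material(lower, higher):
    """Per feature, the values that `higher` allows beyond `lower`.
    Features with nothing new are left out of the result."""
    result = {}
    for feature, values in higher.items():
        new_values = values - lower.get(feature, set())
        if new_values:
            result[feature] = new_values
    return result

new_at_level_one = added_material(PREPARATORY, LEVEL_ONE)
```

Here level one adds two keys and one whole feature, while the time signatures are unchanged, matching the idea that lower levels contain fewer features and narrower value ranges.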
[0103] In some embodiments of this disclosure, "tags" are generated for "difficulty levels" and correspond to "features and feature combinations". For example, a tag is generated for a grade or difficulty level in an ABRSM exam. This tag indicates the musical features that the level should include, as well as the values of various features. If there are multiple levels in the ABRSM exam, then multiple tags are generated, with one tag per level. A "tag" can be considered simply an identifier to indicate one or more feature combinations corresponding to that tag, as well as the values of each feature in the combination. To some extent, it can be understood as a rule / regulation / standard that specifies one or more feature combinations corresponding to that tag, as well as the values of each feature in the combination.
[0104] In some embodiments of this disclosure, the labels can be represented in any suitable way, such as by numbers, characters, symbols, etc., corresponding to each level. For example, levels 0 to 8 can be represented by the numbers 0 to 8, the letters A to I, etc. The form of the labels is not particularly limited, as long as they can be distinguished from each other to indicate different levels.
[0105] According to some embodiments of this disclosure, at least one of the second tag (L1H) and at least one second sub-tag (L2H) is generated by selecting a specific combination of music features and a corresponding range of feature values based on the music features corresponding to the corresponding level in a pre-set music feature set. Specifically, each second tag and sub-tag specifies a combination of one or more music features in the music feature set and its specific range of values.
[0106] In some embodiments of this disclosure, the second label includes at least one second sub-label, and the feature corresponding to each second sub-label is included within all the features of the second label, and / or the value range of each feature corresponding to each second sub-label is within the value range of the feature corresponding to the second label.
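The containment relationship between a second label and its sub-labels can be checked mechanically when value ranges are numeric. The following is a minimal sketch under that assumption. Pitch is encoded as MIDI note numbers and interval size in semitones, and all names and numbers are hypothetical.

```python
def within(sub_range, parent_range):
    """True if the closed interval sub_range lies inside parent_range."""
    return parent_range[0] <= sub_range[0] and sub_range[1] <= parent_range[1]

def violations(label, sub_labels):
    """Return (sub_label_name, feature) pairs that fall outside the label."""
    problems = []
    for name, sub_label in sub_labels.items():
        for feature, sub_range in sub_label.items():
            if feature not in label or not within(sub_range, label[feature]):
                problems.append((name, feature))
    return problems

label = {"pitch": (48, 72), "interval": (0, 7)}
sub_labels = {
    "unit_1": {"pitch": (60, 64), "interval": (0, 2)},
    "unit_2": {"pitch": (60, 67), "interval": (0, 4)},
    "unit_3": {"pitch": (48, 76), "interval": (0, 7)},  # deliberately too wide
}
found = violations(label, sub_labels)
```

An empty result means every sub-label is covered by its parent label; any reported pair points to the sub-label and feature that need to be narrowed, or to a parent range that needs to be widened.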
[0107] In some embodiments of this disclosure, a second label and at least one second sub-label can be selected. For example, for a certain level, music-related technical personnel or experienced music users can use their experience to analyze the data of that level in the music score training sample library to label and obtain at least some features included in the feature set preset above, and the corresponding value range of the features; or a benchmark feature extraction device can extract features from the music data of that level, for example, through machine learning, deep learning, etc., to obtain at least some features included in the feature set preset above, and the corresponding value range of the features, as the second label and at least one second sub-label.
[0108] As an example, music-related technical personnel or experienced music users, including piano education experts, can estimate the difficulty of each musical passage and use data analysis to derive the musical characteristics and the possible difficulty range corresponding to each characteristic value. Of course, before accumulating sufficient labeled data, a standard operating procedure (SOP) for difficulty labeling can be summarized during the labeling process. This includes designing the priority consideration relationships of different musical feature dimensions and decision tree conditions for difficulty classification.
[0109] In some embodiments of this disclosure, the generation of the first tag and the generation of the second tag and at least one second sub-tag described above can be performed independently of each other, can be performed in various suitable orders, or can be performed in parallel.
[0110] The following will continue to use the sight-reading library as an example to illustrate tags (corresponding to the first-level tags) and sub-tags (corresponding to the second-level tags):
[0111] The dimensions of tags include range, interval, rhythmic pattern, tonality, etc.
[0112] The sub-tags have the same tag dimensions as the tags, including range, interval, rhythm pattern, tonality, etc.
[0113] However, the tag dimensions of tags and sub-tags have different granularities, and the granularity of tag dimensions is larger than or can cover the granularity of sub-tag dimensions. For example, tags can be considered to correspond to grades, with a broad range of dimensions, while sub-tags correspond to units further subdivided below the grade; typically, 10-20 units are needed to form a grade. The following explanation still uses a sight-reading library as an example:
[0114] Assuming one of the dimensions in the tag layer is a pitch range (low C to high C), the sub-tag layer subdivides this pitch range dimension into:
[0115] Middle C-D-E
[0116] Middle C-G
[0117] Middle C-A
[0118] Middle C-F
[0119] Middle A-E
[0120] Middle F-G
[0121] Low C-G
[0122] Low C to high C (this is the maximum range, equivalent to the range definition of the tag layer in this dimension)
[0123] Figure 3C also illustrates manually set labels, namely a second label and at least one sub-label thereof, according to embodiments of the present disclosure.
[0124] According to embodiments of this disclosure, a second label and / or a second sub-label can be used as a reference or benchmark to verify a first label, thereby improving the first label and making the extraction of features corresponding to the first label more accurate. In some embodiments, the second sub-label is specifically used to verify both the first and second labels. Specifically, on the one hand, the second sub-label can be used as a benchmark to verify and / or improve the first label, as described above; on the other hand, the second sub-label can be used to improve the second label, thereby optimizing the acquisition of the second label, improving the accuracy of the second label, and achieving more accurate verification when the second label is used as a benchmark to verify the first label.
[0125] It should be noted that optimizing or improving the level labels (e.g., the aforementioned first and / or second labels) is essential for achieving training data cleaning, hard sample mining, feature engineering, and data annotation. First, the standards for grading the difficulty of musical scores are not static, which can lead to noisy data; label optimization or improvement can remove the influence of this noisy data. Second, during model training, hard samples can be identified, such as samples with high training loss, samples whose predicted values do not match the labels, or samples far from the cluster center. These samples can be labeled by an expert team to ensure data accuracy, and the labels can be further optimized using them. Third, before model training, feature correlation analysis can be performed using correlation coefficients, Shapley values, etc., to select highly correlated features as input. This improves the model's interpretability while reducing the number of model parameters and the need for labeled data. Furthermore, the number of sheet music samples varies across levels, and this data imbalance can lead to model bias. For levels with fewer samples, additional sheet music can be found and annotated by an expert team to supplement the training data. The labels can then be optimized or improved based on the labels of the supplementary training data, thereby increasing label accuracy. Weighted loss functions such as focal loss can also be used to balance sample weights.
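Focal loss and loss-based hard sample selection can be sketched together. The example below computes the per-sample focal loss in its usual form, -alpha * (1 - p)^gamma * log(p), where p is the probability the model assigns to the true level. The default gamma and alpha, the helper names, and the probabilities are assumptions for illustration.

```python
from math import log

def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one sample; p_true is the predicted probability of the
    true class. With gamma=0 and alpha=1 it reduces to cross-entropy."""
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must be in (0, 1]")
    return -alpha * (1.0 - p_true) ** gamma * log(p_true)

def hardest(probabilities, k):
    """Indices of the k samples with the highest focal loss."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: focal_loss(probabilities[i]),
        reverse=True,
    )
    return ranked[:k]

# Invented predicted probabilities of the true level for three scores.
hard_indices = hardest([0.9, 0.2, 0.6], 2)
```

Because the factor (1 - p)^gamma shrinks the loss of well-classified samples, levels with few or difficult samples contribute relatively more to training, and the highest-loss samples are natural candidates for expert re-annotation.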
[0126] According to embodiments of this disclosure, various suitable methods can be employed to utilize the second sub-tag for verification. In some embodiments, a music sample is generated using the second sub-tag, thereby verifying the first tag and / or the second tag based on the generated music sample. Here, the generated music sample can be used as a reference music sample for verifying the first tag and / or the second tag.
[0127] According to some embodiments of this disclosure, music samples can be generated in various suitable ways. According to some embodiments of this disclosure, without limitation, generating candidate music samples based on a second sub-tag (L2H) includes at least one of the following:
[0128] Obtain the music fragments corresponding to each music feature that meets the second sub-label (L2H) definition, and combine the obtained music fragments to generate candidate music samples, or
[0129] Music samples that meet the music feature combinations and value ranges specified by the second sub-label (L2H) are extracted from the music score training database as candidate music samples.
[0130] In some embodiments of this disclosure, music fragments corresponding to each music feature conforming to the second sub-tag (L2H) specification can be obtained from an existing music resource library and combined. This can be achieved through various suitable methods. In some examples, music samples, i.e., electronic music melodies, can be synthesized using the music fragments corresponding to each music feature through electronic music synthesis methods. In particular, the music fragments corresponding to the music features can be in various suitable forms, such as signals containing information related to the music features, such as sounds corresponding to range, intervals, and beats, or music data saved in a specific format. The form of the music fragment is not limited as long as it can reflect its corresponding music feature. In particular, in some embodiments of this disclosure, the obtained features and their combinations are preferably randomly arranged and combined to automatically generate training materials.
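The random arrangement and combination of fragments can be sketched as follows. The example assumes each fragment is a short list of MIDI note numbers and that fragments are grouped by the feature they exhibit. The feature names, fragments, and function name are hypothetical, and real synthesis of audio or score files is omitted.

```python
import random

def generate_sample(fragments_by_feature, rng):
    """Pick one fragment per feature, shuffle their order, and concatenate."""
    chosen = [rng.choice(options) for options in fragments_by_feature.values()]
    rng.shuffle(chosen)
    return [note for fragment in chosen for note in fragment]

# Invented fragments, grouped by the feature each one demonstrates.
fragments_by_feature = {
    "stepwise_second": [[60, 62, 64], [64, 62, 60]],
    "leap_third": [[60, 64, 60], [62, 65, 62]],
    "repeated_note": [[67, 67, 67]],
}
sample = generate_sample(fragments_by_feature, random.Random(0))
```

Passing an explicit `random.Random` instance keeps generation reproducible, which helps when a generated sample later has to be traced back during verification.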
[0131] In other embodiments of this disclosure, musical features can be obtained from known or pre-defined feature libraries or sets, such as the pre-defined feature libraries mentioned above. In some embodiments, musical samples that conform to the musical feature combinations and value ranges specified by the second sub-label (L2H) can be extracted from the music score training sample library as candidate musical samples. The music score training sample library here can be any known sample library, such as various existing music score libraries. An "existing music score library" refers to unclassified music scores uploaded manually, including but not limited to music scores from previous years provided to machine learning grading systems; music scores uploaded by users (teachers or students) to music score converters; music scores in open-source digital music score libraries (such as the Giant MIDI music score library); any music scores that can be crawled on the Internet, etc. Of course, the music score training database here can also be the aforementioned sample library used to generate the first label.
[0132] As an example, the extraction here can be performed using machine learning methods to extract features from the aforementioned sample library and music score library, and then compare these features with the features and feature range of the second sub-label. If the comparison is consistent, the data is used as the music sample for verification. That is, a feature matching method is used to find candidate music samples corresponding to the second sub-label from the music score library for verification.
[0133] In some embodiments, candidate music samples can be formatted after acquisition to facilitate verification. For example, after generating music samples as described above, the format of the generated music samples can be adjusted according to the format requirements of the database application scenario, modular training, or application.
[0134] According to some embodiments of this disclosure, verification based on candidate music samples generated according to a second sub-tag (L2H) may include verifying whether the candidate music samples conform to the musical feature combination and feature value range specified by the first tag (L1). In some embodiments, verification may be performed in various suitable ways, particularly by comparing the musical features in the generated candidate music samples with the features corresponding to the first tag.
[0135] According to some embodiments of this disclosure, verifying whether a candidate music sample generated based on a second sub-label (L2H) conforms to the requirements of a first label (L1) includes: extracting music features from the candidate music sample through machine learning; and comparing the extracted music features with the music feature combination and its value range specified by the first-level label to verify whether the candidate music sample conforms to the requirements of the first-level label.
[0136] In some embodiments of this disclosure, a candidate music sample is considered to conform to the first label if both the types and the values of the extracted music features fall within the music feature combination and value ranges specified by the first label. Conversely, it is considered not to conform if at least one extracted feature type or value falls outside them. For example, if the feature types extracted from the music sample are no more numerous than those specified by the label and are all covered by them, and the value of each feature type lies within the value range the label specifies for that feature, the sample can be considered to conform to the first label. Conversely, the sample is considered not to conform if at least one extracted feature type differs from the types specified by the label, or if all extracted types are covered but the value range of at least one of them differs from the range the label specifies for that feature; a partial overlap of ranges still counts as a difference.
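The conformance rule can be sketched as a check that reports why a sample fails. The example assumes labels and extracted features are both mappings from feature type to a set of values. The function name, the message wording, and the sample data are hypothetical.

```python
def check_conformance(label, sample_features):
    """Return the reasons a sample fails the label; an empty list means the
    sample conforms. Partial overlap of values still counts as a failure."""
    reasons = []
    for feature, values in sample_features.items():
        if feature not in label:
            reasons.append(f"feature type not in label: {feature}")
        elif not values <= label[feature]:
            extra = sorted(values - label[feature])
            reasons.append(f"{feature} values outside label: {extra}")
    return reasons

first_label = {
    "time_signature": {"2/4", "3/4", "4/4"},
    "key": {"C major", "G major", "F major"},
}
ok = check_conformance(first_label, {"key": {"C major"}})
bad = check_conformance(
    first_label,
    {"key": {"C major", "D major"}, "tempo": {"allegro"}},
)
```

Returning the reasons rather than a bare boolean makes it easier to present a non-conforming sample to a reviewer together with the specific feature that caused the conflict.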
[0137] According to embodiments of this disclosure, the labels are further adjusted or optimized based on the verification results, especially when the verification results indicate differences or conflicts between the labels, thereby improving accuracy.
[0138] Specifically, in some embodiments, adjusting the definition of at least one of the first label (L1) and the second label (L1H) based on the verification result includes: if the candidate music sample generated according to the second sub-label (L2H) does not conform to the definition of the first label (L1), further presenting the candidate music sample to the user for verification; and adjusting the definition of at least one of the first label (L1) and the second label (L1H) according to the user's verification result. Thus, by using music professionals or others as users to verify differences or conflicts, the type or cause of this inconsistency or difference—such as the candidate music sample generated by the second sub-label (L2H) not conforming to the definition of the first label (L1)—can be analyzed, and the label or feature can be adjusted accordingly based on the analyzed type or cause.
[0139] As an example, inconsistencies or discrepancies can arise for several reasons. The first reason is that the actual difficulty level of the sheet music deviates from the prescribed difficulty level. For instance, a piece of sheet music that is supposed to be of lower difficulty may actually be more difficult, such as "a Grade 5 piece might be more difficult than a Grade 6 piece the following year." In this case, the difficulty level labels of the sheet music need to be optimized. The second reason is that the reference labels or manual labels are not clear enough, leading to recognition errors. For example, the boundaries of the difficulty level definitions in manual labels may be unclear, overlapping, or different from the labels generated by machine learning from the examination materials. In this case, the boundaries of the manual labels or reference labels need to be optimized and clarified. The third reason is related to the dimensions of the pre-set feature library. For example, if the feature category dimensions of the pre-set feature library are not refined enough, it may be necessary to adjust the dimensions of the feature categories.
[0140] According to some embodiments of this disclosure, when the feature category or feature dimension involved in the inconsistency or difference is still included in the previously acquired music feature set, such as the feature set previously obtained from music samples by machine learning or a pre-set music feature set, the above-described adjustment process can be performed, such as the generation or adjustment of the first label (L1) or the second label (L1H). In some embodiments, when the user verification result indicates that the candidate music sample conforms to the requirements of the first label (L1), the candidate music sample is used as a training sample to optimize the generation of the first label (L1) by machine learning, so that a more comprehensive set of samples is used to obtain the first label and the accuracy of the generated first label improves. This may correspond, for example, to the first reason for the difference mentioned above. In particular, the machine learning scheme according to this disclosure can be used to optimize the first label.
[0141] In other embodiments, if the user verification result indicates that the candidate music sample does not conform to the requirements of the first label (L1), the second label (L1H) is adjusted. In some embodiments of this disclosure, adjusting the setting of the second label includes adjusting the combination of music features and / or the value range of each music feature specified by the second label with reference to the first label, so that the candidate music sample generated based on the adjusted second label conforms to the requirements of the first-level label. This may correspond, for example, to the second reason for difference mentioned above.
[0142] According to embodiments of this disclosure, additionally or alternatively, adjustments can be made to the previously acquired music feature set, such as a feature set previously obtained from music samples using machine learning or a pre-set music feature set, based on the user verification results. Specifically, if it is determined that the feature category or feature dimension involved in the inconsistency or difference is not included in the previously acquired music feature set (this could correspond to, for example, the third reason for difference mentioned above), then the accuracy of the previously acquired music feature set is considered insufficient. Therefore, the feature categories in the previously acquired music feature set can be further adjusted, for example, by adding new feature categories or further subdividing existing feature categories. Here, for example, the music samples to be verified can be used as new training samples to retrain or learn the features, as described above, thereby further optimizing the feature labels.
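The adjustment paths described above can be summarized as a small decision function. This is one possible reading, sketched for illustration. The text allows the feature-set adjustment to be applied additionally or alternatively, so the precedence used here (checking the feature set before the user verdict) is an assumption, as are all names.

```python
from enum import Enum

class Action(Enum):
    NONE = "no adjustment"
    RETRAIN_FIRST_LABEL = "use sample as training data for the first label (L1)"
    ADJUST_SECOND_LABEL = "adjust the second label (L1H)"
    EXTEND_FEATURE_SET = "add or subdivide feature categories"

def choose_action(auto_conforms, user_conforms, feature_in_set):
    """Map the verification outcome to an adjustment.

    auto_conforms:  automatic check says the sample conforms to L1
    user_conforms:  the reviewing user says the sample conforms to L1
    feature_in_set: the conflicting feature exists in the acquired feature set
    """
    if auto_conforms:
        return Action.NONE  # no conflict, so nothing is shown to the user
    if not feature_in_set:
        return Action.EXTEND_FEATURE_SET  # third reason: feature set too coarse
    if user_conforms:
        return Action.RETRAIN_FIRST_LABEL  # first reason: L1 needs more samples
    return Action.ADJUST_SECOND_LABEL  # second reason: manual label unclear

decision = choose_action(auto_conforms=False, user_conforms=True, feature_in_set=True)
```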
[0143] As an example, 20 feature categories can be pre-set as a feature set during the application recognition process. When generating the Level 1 difficulty label through machine learning, 4 feature categories from this feature set are used. When manually defining the Level 1 difficulty label, 4 feature categories from this feature set are also used. However, the 4 features generated when manually defining the Level 1 difficulty label may be different from the 4 features defined by machine learning. Therefore, in this case, the label can be adjusted or refined based on the verification results.
[0144] On the one hand, if the differences arise within the 20 features (for example, a manually defined feature D and a machine-learned feature D stand in an inclusion relationship), then the parent and child labels need to be adjusted by optimizing the label settings and value ranges. On the other hand, if the differences arise outside the 20 features, then it is necessary to add new feature recognition dimensions to the pre-set feature set and to refine the definitions of the new feature labels.
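The case distinction in the two preceding paragraphs can be sketched with plain set operations. This is only an illustration under stated assumptions, not the disclosed implementation: the feature names `F01` to `F20`, the function name `locate_difference`, and the returned action strings are all hypothetical.

```python
def locate_difference(preset, machine_features, manual_features, observed_features):
    """Classify where a machine/manual label disagreement comes from.

    preset: the pre-set feature set (e.g. 20 categories).
    machine_features / manual_features: categories used by each Level 1 label.
    observed_features: categories actually found in the sample under review.
    """
    preset = set(preset)
    outside = set(observed_features) - preset
    if outside:
        # The difference involves a category the preset does not cover.
        return "extend_feature_set", sorted(outside)
    differing = set(machine_features) ^ set(manual_features)
    if differing:
        # Both labels draw on the preset but pick different categories.
        return "adjust_labels", sorted(differing)
    return "consistent", []


preset = [f"F{i:02d}" for i in range(1, 21)]  # 20 hypothetical categories
machine = ["F01", "F02", "F03", "F04"]        # chosen by machine learning
manual = ["F01", "F02", "F03", "F05"]         # chosen by a music professional

print(locate_difference(preset, machine, manual, manual))
print(locate_difference(preset, machine, manual, manual + ["swing_feel"]))
```

The first call reports a difference inside the preset (labels and value ranges are adjusted); the second reports a category outside the preset (the feature set itself is extended).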
[0145] It should be noted that the pre-set feature set can also be adjusted or optimized in other appropriate ways. In some embodiments, a deep representation of the score can be learned from the supplementary audio files or standardized score encoding files to compare score similarity and analyze its correlation with score difficulty. The distance between score features of different difficulty levels can be expanded through metric learning and other methods to achieve better difficulty level classification.
[0146] Specifically, in embodiments of this disclosure, machine learning can be used to generate and optimize the labels corresponding to difficulty levels. As an example, machine learning is used to learn from musical score features and materials graded under different grading standards, to obtain feature probability predictions and value ranges for the levels (e.g., levels 1-10) of each grading standard. Sub-modules / sub-labels can then be subdivided so that they can be customized or adjusted for different training modes and modular training libraries. Such level label generation or optimization can be used at various stages of constructing the score training library in this disclosure, particularly those involving the generation and adjustment of level labels, such as the aforementioned adjustment of the first and second labels.
[0147] Figure 4 illustrates an example of creating a music score training database according to an embodiment of the present disclosure, and in particular shows the setting and optimization of labels during the creation of the music score training database.
[0148] First, obtain the pre-set music feature set. In particular, the features of materials in the libraries of different modules can be preset. The pre-set music feature set can be obtained as described above.
[0149] Then, based on the music features corresponding to the respective levels in the pre-set music feature set, machine learning is used to extract music features from one or more music samples of the corresponding level contained in the music score training sample library, generating the first label. As an example, the music score training sample library can contain past examination materials (e.g., ABRSM examinations), and machine learning methods are used to learn the difficulty grading rules with reference to the pre-set feature set, generating the first-level difficulty label as the first label.
[0150] Then, based on the music features corresponding to the respective levels in the pre-set music feature set, specific combinations of music features and corresponding feature value ranges are selected to generate a second label and at least one second sub-label contained within the second label. Specifically, the second label and the second sub-label can be generated manually, for example, by music professionals with reference to the difficulty level.
[0151] Next, candidate music samples are generated based on the generated second label to serve as training material. Candidate music samples can be generated in two ways. One way is to obtain music fragments corresponding to each music feature specified by the second sub-label and combine these fragments to generate candidate music samples. For example, features corresponding to the second sub-label and their combinations can be selected, randomly arranged, and automatically generated to produce training sample material. The other way is to extract music samples that meet the music feature combinations and value ranges specified by the second sub-label from a music score training database as candidate music samples. For example, features and their combinations that meet the conditions can be extracted from existing music scores (either from a categorized music score library or crawled from the internet) to automatically generate training sample material. Optionally, the generated training sample material can also be formatted. For example, the material can be standardized according to the format requirements of the training data for each module (e.g., the rhythm library material needs to have its pitch markings removed) to serve as training sample material.
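The first generation method described above (selecting features allowed by the second sub-label and arranging them randomly) can be sketched as follows. This is a minimal sketch under assumptions: the fragment table `FRAGMENTS`, the dictionary layout of `sub_label`, and the function name `generate_candidate` are hypothetical and are not taken from the disclosure.

```python
import random

# Hypothetical one-measure rhythm fragments in 4/4, as note durations in beats.
FRAGMENTS = {
    "quarter_notes": [1, 1, 1, 1],
    "half_notes": [2, 2],
    "eighth_pairs": [0.5, 0.5, 1, 0.5, 0.5, 1],
    "dotted_quarter": [1.5, 0.5, 2],
}


def generate_candidate(sub_label, seed=None):
    """Build one candidate sample from the features a second sub-label allows."""
    rng = random.Random(seed)
    lo, hi = sub_label["measures"]  # allowed range for the number of measures
    n_measures = rng.randint(lo, hi)
    # Pick one allowed fragment per measure, in random order.
    names = [rng.choice(sub_label["rhythm_patterns"]) for _ in range(n_measures)]
    return {
        "time_signature": sub_label["time_signature"],
        "patterns": names,
        "measures": [FRAGMENTS[name] for name in names],
    }


# A hypothetical second sub-label: feature combination plus value ranges.
sub_label = {
    "time_signature": "4/4",
    "rhythm_patterns": ["quarter_notes", "half_notes"],
    "measures": (2, 4),
}
sample = generate_candidate(sub_label, seed=0)
print(sample["patterns"])
```

Passing a fixed `seed` makes the generated material reproducible, which is convenient when the same candidate has to be re-examined during verification.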
[0152] Then, the first label is validated based on the generated training sample material. Specifically, it is verified whether the candidate music sample conforms to the requirements of the first label, such as whether the features of the candidate music sample are included in the feature combination specified by the first label. Based on the validation results, appropriate subsequent processing can be performed.
[0153] If included, the validated training sample material can be used to build a music score training database, meaning that all validated training sample material can be saved to the database along with its grade labels.
[0154] If not included, the candidate music samples are further verified. In particular, the candidate music samples can be verified manually to determine their relationship to the first label, the second label, or even the pre-set feature set.
[0155] On the one hand, if the candidate music sample is verified to meet the requirements of the first label, the first label can be adjusted. In particular, the material can be manually used as a training sample for machine learning of the first label, thereby improving the accuracy of the first label.
[0156] On the other hand, if a candidate music sample does not conform to the first label but the features related to the difference of the sample are still included in the previously determined feature set, the second label is adjusted. In particular, the second label can be adjusted, for example, by manual adjustment or by machine learning comparison, so that it is consistent with the result of machine judgment.
[0157] Finally, if the features related to the difference of the sample are not included in the pre-set feature set, the pre-set feature set can be further adjusted. In particular, features can be added to the pre-set feature set, or existing features refined, to improve the comprehensiveness and / or accuracy of the features in the feature set.
[0158] The specific implementation of the above-described scheme can be performed as described above, and will not be described in detail here. Furthermore, the operations described above can be performed iteratively on the samples used to construct the music score training library. Specifically, this can be performed iteratively for each level: for the samples of each level, the above-described difficulty label definition is executed sample by sample; a sample that passes verification can be used as a candidate sample for constructing the database, and a sample that fails can be used to adjust and optimize the difficulty labels.
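The verification step and its three follow-up branches can be summarized in a short sketch. It assumes, purely for illustration, that a label is a mapping from feature name to an allowed `(min, max)` range and that the manual check is supplied as a boolean; the names `differing_features` and `triage` and the returned action strings are hypothetical.

```python
def differing_features(sample_features, label):
    """Features of the sample that the label does not allow."""
    bad = []
    for name, value in sample_features.items():
        if name not in label or not (label[name][0] <= value <= label[name][1]):
            bad.append(name)
    return bad


def triage(sample_features, first_label, preset, manual_ok):
    """Return the follow-up action for one candidate sample.

    manual_ok: outcome of the manual check, i.e. whether a reviewer judges
    that the sample does belong under the first label.
    """
    diff = differing_features(sample_features, first_label)
    if not diff:
        return "store_in_database"    # sample conforms to the first label
    if manual_ok:
        return "retrain_first_label"  # feed the sample back into machine learning
    if set(diff) <= set(preset):
        return "adjust_second_label"  # difference lies inside the pre-set feature set
    return "extend_feature_set"       # difference lies outside the pre-set feature set


first_label = {"measures": (2, 4), "notes": (4, 16)}
preset = {"measures", "notes", "ties", "rests"}

print(triage({"measures": 3, "notes": 10}, first_label, preset, manual_ok=False))
print(triage({"measures": 6, "notes": 10}, first_label, preset, manual_ok=True))
print(triage({"measures": 6, "notes": 10}, first_label, preset, manual_ok=False))
print(triage({"measures": 3, "notes": 10, "syncopation": 1}, first_label, preset, manual_ok=False))
```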
[0159] According to some embodiments of this disclosure, the labels can be further refined to achieve more precise label settings and classifications, thereby enabling the construction of a more accurate music score training database. Figure 5 illustrates an exemplary implementation of the creation of a music score training database according to embodiments of the present disclosure, particularly the label setting and optimization during creation. In some embodiments of the present disclosure, in addition to generating a first label, a second label, and a second sub-label as described above, at least one first sub-label (L2) can be generated by machine learning, based on a pre-set music feature set, for the music score training database to be constructed. The first sub-label (L2) is included in the first label (L1). Furthermore, after verifying that candidate music samples generated according to the second sub-label (L2H) conform to the requirements of the first label (L1), it is further verified whether those candidate music samples conform to the requirements of the first sub-label (L2). The requirements of at least one of the first sub-label (L2) and the second sub-label (L2H) are adjusted based on the verification results, and the pre-set feature set can also be adjusted. The verification and adjustment operations for the first sub-label can be performed as described above for the first label, and will not be described in detail here.
[0160] According to embodiments of this disclosure, a music score training database can be constructed based on verified candidate music samples and their corresponding grade labels. Specifically, the verified candidate music samples and their corresponding grade labels are stored together in the training sample database, and the candidate music samples are sorted in the database. The sorting can follow various suitable sorting settings, including, but not limited to: difficulty sorting, knowledge point sorting, training module sorting, and key indicator collaborative recommendation sorting (here referring to associated data recommendation under collaborative filtering algorithms). As an example, recommendation sorting can involve analyzing the behavior sequences of different user samples to mine user behavior habits and then generating suitable recommendation lists for users, which covers training material generation, topic modeling, difficulty sequence construction, difficulty sequence analysis, generation of recommendation lists of training content, content recommendation, and learning path planning.
[0161] In some embodiments, the sorting settings may be generated based on user input, and candidate music samples that meet the same level label may be sorted based on the received sorting settings to construct a music score training database.
[0162] As an example, user input may include user input information, training data records, user habit preference records, etc. User input information may include, but is not limited to: age, gender, average training duration, average training frequency, cumulative training time, equipment used for training (piano / electric piano, etc.), learning goal (grade exam / interest cultivation / second instrument / professional training, etc.), and the grade exam system being used (ABRSM / CFM / Central Conservatory of Music / Shanghai Conservatory of Music, etc.). Training data records may include, but are not limited to: average training duration, average training frequency, cumulative training time, frequency of assisted / independent training, average attention span during training, training error rate, and the average score (strengths and weaknesses) for each training evaluation dimension. User habit preference records may include, but are not limited to: whether the user is willing to repeat practice as recommended by the system after failing; how long the user persists before giving up when encountering difficulties; under what training results or circumstances the user chooses additional reinforcement and supplementary practice; and under what game incentive conditions the user chooses additional reinforcement and supplementary practice. The data can therefore be sorted according to the user characteristics or needs reflected in such user input. For example, a multi-dimensional model of musical scores obtained by training on the data can be matched with the characteristics of user samples and their usage scenarios, such as the common error points of different user samples in different scenarios, so that the matching data is prioritized.
[0163] In other embodiments, the sorting settings can be based on priority, such as the priority of musical features. Specifically, the sorting settings include ranking according to the priority of individual musical features among the candidate music samples. The priority of musical features can take into account the application scenario of the database, the type of database, and so on. For example, for a certain type of application, certain features may be more important during the application; therefore, music samples containing these features can be prioritized and presented to the user first.
[0164] In one example, considering musical features such as measures, time signatures, and notes, the sorting rule for materials under the same difficulty label could be to prioritize sorting by the number of measures from fewest to most, then by the time signature for materials with the same number of measures, and finally by the number of notes from fewest to most if the first two rules are the same. Of course, other sorting methods are also possible.
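The three-level sorting rule in this example maps naturally onto a tuple sort key. The sketch below is illustrative only: the record layout and the function name `sort_key` are hypothetical, and because the text does not say how time signatures are ordered, the code assumes they are ordered by measure length (numerator divided by denominator) and then by numerator.

```python
from fractions import Fraction


def sort_key(material):
    """Sort key: measures, then time signature, then number of notes."""
    beats, unit = material["time_signature"].split("/")
    # Assumption: time signatures are compared by measure length, then numerator.
    return (
        material["measures"],
        Fraction(int(beats), int(unit)),
        int(beats),
        material["notes"],
    )


# Hypothetical materials that share the same difficulty label.
materials = [
    {"id": "a", "measures": 4, "time_signature": "3/4", "notes": 10},
    {"id": "b", "measures": 2, "time_signature": "4/4", "notes": 8},
    {"id": "c", "measures": 4, "time_signature": "2/4", "notes": 12},
    {"id": "d", "measures": 4, "time_signature": "3/4", "notes": 7},
]

ordered = sorted(materials, key=sort_key)
print([m["id"] for m in ordered])  # ['b', 'c', 'd', 'a']
```

Because tuples compare element by element, later criteria are consulted only when all earlier ones are equal, which is exactly the tie-breaking behavior the rule describes.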
[0165] As described above, by appropriately sorting the music score training database, it can be provided to the client in a suitable manner. This is equivalent to recommending training content to the user, and can be considered to be achieved by a recommendation system or recommendation module, which can be considered to be included in the device for generating or applying the music score training database disclosed herein. In particular, the recommendation module analyzes the behavior sequences of different user samples to mine user behavior habits and then generates a suitable recommendation list for the user, which covers training material generation, topic modeling, difficulty sequence construction, difficulty sequence analysis, generation of recommendation lists of training content, content recommendation, and learning path planning. The recommendation includes finding reference groups of similar user samples for collaborative recommendation based on the individual student's situation. The recommendation also includes recommending content related to modules, difficulty, and features based on the user's past training data records, targeting the user's weak points and unmastered knowledge points. The recommendation further includes strategy recommendations based on the user's training habit preferences, such as content difficulty matching, training rhythm, and duration.
[0166] According to embodiments of this disclosure, the application of a music score training database may include interaction between the music score training database and the user. On the one hand, music score training samples in the music score training database are provided to the user for use, such as for learning, training, etc. For example, if the music score training samples are sorted, they can be provided to the user according to the sorting method. On the other hand, the user can provide feedback on the application results, thereby optimizing the music score training database based on user feedback, including the sorting and presentation of the music score training database, and even the level labels and corresponding feature combinations of the music score training database, thereby obtaining a further improved music score training database.
[0167] According to embodiments of this disclosure, the constructed music score training database can be presented to the user through a presentation device, and the user can receive the level learning results based on the constructed music score training database; and the music sample configuration in the music score training database can be optimized based on the level learning results.
[0168] According to embodiments of this disclosure, before presenting the database to the user, the constructed music score training database can be further formatted to make it easier for the user to use. Specifically, the format can be adjusted according to the application scenario, user learning, training scenario, etc. In particular, format adjustment may include adjusting the format of music samples according to the desired score presentation format.
[0169] According to embodiments of this disclosure, the constructed music score training database can be presented to users in various suitable ways. As an example, it can be presented to users in a game-like manner, allowing them to learn and train in a similar way. Specifically, a converter can be used to recognize music scores from MusicXML files to visualize the digital music scores, and present them to users statically or dynamically, especially interactively. Accordingly, user training data tracking and feedback can be collected. For example, when users learn and train by playing games, the data or results obtained from playing the games can be provided as feedback to the music score training database. For instance, the pass times and accuracy of a large number of users on the same training track can be provided as feedback to the music score training database.
[0170] According to embodiments of this disclosure, the music score training database is adjusted and optimized based on user feedback data. In some embodiments, the difficulty ranking in the music score training database can be optimized based on the user training data feedback, such as optimizing the ranking based on user input as described above, thereby optimizing the samples presented to users in the music score training database application, making them more suitable for user training, etc.
[0171] In other embodiments, the level labels and associated features of the sheet music training database can be optimized based on user feedback data. Specifically, A / B testing can be considered: student feedback collected when sheet music with different feature values is presented is used to measure the impact of the corresponding features on difficulty, thereby reducing reliance on publicly available difficulty levels and developing a sheet music difficulty ranking better suited to students' ability development curves. As an example, user feedback samples can also be used as training samples to train the sheet music training database in the manner described above, thereby making the levels in the sheet music training database more accurate. Thus, as an example, under different application modes (such as sight-reading, ear training, etc.), adaptive, refined difficulty prediction and ranking can be performed by using accumulated user training data (such as error rate, final pass rate, average number of practice sessions required for first pass, etc.).
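The disclosure mentions A / B testing without fixing a statistical procedure. One common choice, used here only as an assumed illustration, is a two-proportion z-test on the pass rates of two groups of students who were shown scores differing in a single feature value. The function name and the numbers below are hypothetical.

```python
import math


def two_proportion_z(pass_a, n_a, pass_b, n_b):
    """z statistic for the difference between two pass rates (pooled variance)."""
    p_a, p_b = pass_a / n_a, pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


# Hypothetical data: variant A omits ties, variant B contains ties.
z = two_proportion_z(pass_a=160, n_a=200, pass_b=130, n_b=200)
print(round(z, 2))

# |z| > 1.96 corresponds to a two-sided 5% significance level.
feature_affects_difficulty = abs(z) > 1.96
print(feature_affects_difficulty)
```

A significant drop in pass rate for the variant containing the feature suggests that the feature raises difficulty and may deserve more weight in the corresponding level label.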
[0172] Figure 6 illustrates an exemplary flowchart of the construction and application of a music score training database according to an embodiment of the present disclosure. The construction of the music score training database can be implemented as described above, and will not be described in detail here. The constructed music score training database is then provided to the user for application.
[0173] First, candidate music samples that meet the same level label are sorted based on specific sorting settings to construct a music score training database. Specifically, materials that have passed the aforementioned verification are sorted and stored in association with difficulty level labels to obtain a database available to users. It should be noted that this step can also be included in the previous construction process of the music score training database.
[0174] As an example, sorting settings can be preset by the user or automatically set based on user input data, user usage data, etc., as described above. For instance, the preset sorting rules for materials under the same difficulty level label (the refined second-level difficulty label) can be any appropriate sorting rules, especially rules that combine multiple label dimensions. For example, sorting can follow any order of labels, such as an order based on priority. In one example, materials are sorted by the number of measures from fewest to most, then by time signature for materials with the same number of measures, and finally by the number of notes from fewest to most if the first two are the same.
[0175] Then, the constructed music score training database is presented to the user through a presentation device. This can be any suitable presentation device, such as a game converter, which converts the training samples in the database into interactive games for the user to play. It should be noted that the samples in the training database can also be formatted before being presented to the user. For example, by unifying the format of the materials in the resource library according to the standard score presentation format of each module's database, a question bank for each module's materials is generated. In this way, the scores in the question bank are converted into interactive games through a game converter for the user to play (train).
[0176] Then, the system receives the user's training results based on the constructed music score training database; and optimizes the arrangement of music samples in the database based on the training results, particularly optimizing the sorting of music samples in the database. Specifically, under different application modes (such as sight-reading, ear training, etc.), through accumulated user training data (such as error rate, final pass rate, average number of practice sessions required for first pass, etc.), it performs adaptive and refined difficulty prediction and sorting under different application modes, making the music score training samples recommended or provided to the user more suitable for the user's application. As an example, it collects user training data tracking and feedback (such as the pass time and accuracy of a large number of users on the same training track); then, it continuously optimizes the difficulty sorting under the second-level difficulty label and returns to update the material library sorting; this part of the operation can also be called the user interaction sorting optimization loop.
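The user interaction sorting optimization loop can be sketched as a re-ranking of materials by an empirical difficulty score computed from accumulated training data. The weighting scheme below is an assumption made for illustration; the disclosure names the input signals (error rate, final pass rate, average number of practice sessions required for first pass) but does not specify how they are combined.

```python
def empirical_difficulty(stats, w_error=0.5, w_fail=0.3, w_attempts=0.2, max_attempts=10):
    """Combine aggregated user data into a score in [0, 1]; higher means harder."""
    attempts = min(stats["avg_attempts_to_pass"], max_attempts) / max_attempts
    return (
        w_error * stats["error_rate"]
        + w_fail * (1.0 - stats["pass_rate"])
        + w_attempts * attempts
    )


def rerank(material_ids, usage):
    """Re-sort materials under one second-level label from easiest to hardest."""
    return sorted(material_ids, key=lambda m: empirical_difficulty(usage[m]))


# Hypothetical aggregated statistics for three materials under the same label.
usage = {
    "r1": {"error_rate": 0.30, "pass_rate": 0.70, "avg_attempts_to_pass": 3},
    "r2": {"error_rate": 0.10, "pass_rate": 0.95, "avg_attempts_to_pass": 1},
    "r3": {"error_rate": 0.20, "pass_rate": 0.80, "avg_attempts_to_pass": 2},
}

print(rerank(["r1", "r2", "r3"], usage))  # ['r2', 'r3', 'r1']
```

Running `rerank` periodically on fresh statistics and writing the result back to the material library corresponds to the "return and update" step of the loop.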
[0177] In other embodiments, the construction and optimization of the music score training database can be further optimized by utilizing user feedback on the database. For example, more refined difficulty predictions and adjustments to difficulty levels can be achieved based on user feedback. Specifically, if user feedback on training results is unsatisfactory, such user-trained samples can be used as difficult samples. Then, decisions based on these difficult samples, particularly through machine learning and deep learning, can be made to adjust the features corresponding to the difficulty levels, thereby improving the difficulty labels in the data. This adjustment can be performed in various appropriate ways; for example, feature alignment can be used to compare features analyzed purely by machine to find highly correlated features and add them to a pre-prepared feature set (refining the pre-prepared feature set).
[0178] Ultimately, this results in a continuously updated and optimized modular training resource library. This library can continuously generate and optimize the training content for the corresponding modules during machine learning, automatic content generation, and user feedback processes.
[0179] The following describes specific application scenarios of the music score training database according to embodiments of this disclosure. In one embodiment, the music score training database of embodiments of this disclosure can be a specialized training library, especially a rhythm library for music rhythm training. The purpose or objective of the materials in the rhythm library is to help users (also referred to as trainees or trained users) improve their ability to apply note values and rhythmic patterns in different time signatures, become familiar with different rhythmic patterns and styles, and quickly read the rhythm in the score, laying a good foundation for overcoming the difficulties of music or instrument learning, while cultivating a good sense of rhythm.
[0180] The present disclosure describes the construction and application of a music score training database in a rhythm library application scenario.
[0181] S1 Difficulty Levels, Difficulty Ranking Definition
[0182] The six feature categories to be extracted are: a) time signature, b) rhythm pattern, c) rest, d) tie, e) number of measures, and f) voice part (ensemble, round, canon). In other words, in this application scenario, the library or collection can include these six features, and selection from these six features may be necessary when generating the first and second level labels.
[0183] Based on past examination materials (using ABRSM examinations as an example), S2 extracts features in the above six categories from the existing sight-reading data for each level; using machine learning methods, it learns the difficulty level classification rules for the features preset in S1 and for their combinations, and generates the first-level difficulty label.
[0184] Based on the learning progression pattern of rhythm, S3 defines first-level and second-level difficulty level labels for the features and feature combinations preset by S1 (the second-level difficulty labels are more refined and placed under the corresponding first-level difficulty level labels).
[0185] S4 selects the corresponding features and their combinations based on the second-level difficulty level labels generated by S3, randomly arranges and combines them, and automatically generates training materials.
[0186] Based on the second-level difficulty level labels generated by S3, S5 extracts features and combinations that meet the conditions from existing musical scores (which can be extracted from the D-class music score library or crawled from the external network) and automatically generates materials to be used.
[0187] S6 standardizes the format of the materials generated by S5 according to the format requirements of the rhythm training library (removing pitch markings); and generates training materials.
[0188] S7 compares the training materials generated by S4 and S6 with the first-level difficulty label defined in S2 to verify whether there is an inclusion relationship (i.e. whether it can be attributed to the corresponding first-level difficulty label).
[0189] S8 If the result of S7 is that the training material cannot be attributed, the material is placed in the manual verification library, and it is manually judged whether the material can be attributed to the first-level difficulty label defined in S2.
[0190] S9-1 If the result of the manual judgment in S8 is that the material can be attributed, the material is manually fed into the machine learning of S2 to optimize the machine's definition of the first-level difficulty label.
[0191] S9-2 If the result of the manual judgment in S8 is that the material cannot be attributed, the difficulty label manually defined in S3 is manually adjusted and refined to match the result of the machine judgment.
[0192] (The above S2-S9 sections are the difficulty label definition loop of the rhythm training library)
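As an illustration of the format standardization in step S6 (removing pitch markings so that extracted material fits the rhythm training library), the sketch below assumes a simplified event representation in which each event is a `(pitch, duration)` pair and a rest has pitch `None`. This representation and the function name `to_rhythm_material` are hypothetical; a real system would operate on an actual score format.

```python
def to_rhythm_material(events):
    """Drop pitch from (pitch, duration) events, keeping durations and rests."""
    return [
        ("rest" if pitch is None else "note", duration)
        for pitch, duration in events
    ]


# Hypothetical fragment extracted in S5: C4 quarter, eighth rest, E4 eighth.
fragment = [("C4", 1.0), (None, 0.5), ("E4", 0.5)]
print(to_rhythm_material(fragment))
# [('note', 1.0), ('rest', 0.5), ('note', 0.5)]
```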
[0193] S10 presets the sorting rules for materials under the same difficulty level label (the refined second-level difficulty label), e.g., sorting by the number of measures from fewest to most, then by time signature for the same number of measures, and finally by the number of notes from fewest to most when the first two are the same.
[0194] S11 sorts the training materials from the S8 verification library that were judged attributable, according to the sorting rules preset in S10.
[0195] S12 inputs the sorted training materials from S11 into the material library.
[0196] (The following is a user interaction sorting optimization loop)
[0197] S13 processes the materials in the S12 material library according to the standard score presentation format of the rhythm-specific training library, and generates a question bank for rhythm-specific training.
[0198] S14 converts the sheet music in the question bank mentioned in S13 into an interactive rhythm training game using a game converter;
[0199] S15 users play games (for training);
[0200] S16 collects user training data tracking and feedback (such as the pass time and accuracy of a large number of users on the same training track);
[0201] Based on the user data collected in S16, S17 continuously optimizes the difficulty sorting under the second-level difficulty label and returns to S12 to update the material library sorting;
[0202] (The above S13-S17 sections are the user interaction sorting optimization loop)
[0203] According to another embodiment of this disclosure, the music score training database of this embodiment can be a specialized training database, especially a sight-singing database. The sight-singing database may include two parts: a first part is a pitch and intonation training database, and a second part is a single-part melody sight-singing training database. Furthermore, embodiments of this disclosure can be executed separately for each of the two parts of the sight-singing database.
[0204] In one embodiment, the purpose of the pitch and intonation training library is to improve a user's ability to read music in different clefs, familiarize the user with standard pitch and the distance between notes, enable the user to sing accurately, and gradually expand the user's vocal range. Its application scenarios are twofold. One is to use it directly as sheet music material for students to practice, as shown in Figure 7A. The other is to convert it into training content for sight-singing software using an appropriate converter, as shown in Figure 7B. The sight-singing software can compare and demonstrate pitch based on the microphone's sound pickup, intuitively transforming invisible sounds into visible melodic lines. This allows learners to see the direction in which the pitch moves, the distance between notes, and the difference between the standard pitch and their own singing, guiding them to correct mistakes and improve.
[0205] According to embodiments of this disclosure, the music score training database can be a pitch and intonation training database, and the construction and application of the music score training database in the application scenario of the pitch and intonation training database can be performed as follows.
[0206] The materials in the "Pitch and Intonation Training Library" have six feature categories (single-note training, no rhythm): a) pitch feature, b) interval feature, c) range feature, d) number of notes feature, e) clef feature, and f) tonality feature (key signature, accidentals, and tonic chord). In other words, in this application scenario, the pre-set feature categories can include the above six feature categories, and selection from these six features may be required when generating the first and second level labels. Subsequent generation and optimization of level labels, database construction, user interaction, etc., can be performed as described above, for example, as in steps S2-S17, and will not be described in detail here.
[0207] In another embodiment, the purpose of the materials in the "Monophonic Melody Sight-Singing Training Library" is to enhance the user's sight-reading ability in different clefs, enabling the user to sing accurately while controlling the rhythm, tempo, musicality, melodic beauty, phrasing, and breathing of a melody. Its application scenarios are twofold: one is to use it directly as sheet music for students to practice; the other is to convert it into training content within sight-singing software using an appropriate converter, as shown in Figure 7C. The sight-singing software can compare and demonstrate pitch and rhythm based on the microphone's sound pickup, similar to the singing scoring function of KTV or mobile "Changba" applications.
[0208] According to embodiments of this disclosure, the music score training database can be a monophonic melody sight-singing training database, and the construction and application of the music score training database in the application scenario of the monophonic melody sight-singing training database can be performed as follows.
[0209] The materials in the "Monophonic Melody Sight-Singing Training Library" have multiple feature categories, including but not limited to: a) interval features, b) range features, c) tonality features, d) musical terminology features, e) melodic line progressions (musical pattern features), etc. Among these, a, b, and c are mandatory, while d and e are options that are added progressively according to the level. That is, in this application scenario, the pre-set categories can include the above-mentioned feature categories, and selection from these categories may be required when generating first and second level labels. For example, the lower the level, the fewer feature categories can be included.
[0210] The subsequent generation and optimization of rating labels, database construction, user interaction, etc., can be performed as described above, such as steps S2-S17 mentioned above, and will not be described in detail here.
[0211] In one embodiment, the music score training database of this disclosure can be a specialized training library, particularly a sight-reading training library. The purpose or objective of the materials in the sight-reading training library is to enhance the user's ability to quickly read and play music, i.e., the ability to correctly play a new piece of music in the shortest possible time. Its application scenarios are twofold: one is to directly use the score material for students to practice; the other is to convert it into training content for sight-reading software through a converter in the E-Library. The sight-reading software can then compare, correct, and score the performance based on microphone recordings or data transmission from an electric piano connected to a computer. The application scenarios are illustrated in the figures.
[0212] In some embodiments of this disclosure, the construction and application of a sight-reading training library in an application scenario can be performed as follows.
[0213] The sight-reading library contains materials with multiple features, including but not limited to: a) interval features, b) range features, c) time signature features, d) rhythmic pattern features, e) rest features, f) tie features, etc. Features a, b, c, and d are mandatory, while e and f are options that can be added gradually as the level progresses.
[0214] In this application scenario, the pre-defined feature categories can include at least the aforementioned features a to f, and selection from these features may be necessary when generating the first and second level labels. For example, the lower the level, the fewer features are included. Specifically, features a, b, c, and d can constitute the lowest difficulty level, corresponding to the lowest-level difficulty label. As the difficulty label level increases, at least one of e and f can be gradually added. As an example, one of features e and f can be added for each level increase. Features e and f can be added in random order, or in an order determined by the application scenario, priority, etc., with higher-priority features added first.
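The progressive addition of features described above can be sketched as follows. This is only an illustrative sketch: the function name `features_for_level`, the snake_case feature names, and the assumed priority order (rests before ties) are not taken from the disclosure.

```python
from typing import List

# Features a-d are mandatory at every level (per the paragraph above).
MANDATORY: List[str] = ["interval", "range", "time_signature", "rhythmic_pattern"]
# Features e-f are optional; the priority order here is an assumption.
OPTIONAL_BY_PRIORITY: List[str] = ["rest", "tie"]


def features_for_level(level: int) -> List[str]:
    """Return the feature categories used at a 1-based difficulty level.

    Level 1 uses only the mandatory features; each further level adds one
    optional feature, highest priority first, until none are left.
    """
    if level < 1:
        raise ValueError("level must be >= 1")
    extra = min(level - 1, len(OPTIONAL_BY_PRIORITY))
    return MANDATORY + OPTIONAL_BY_PRIORITY[:extra]


for n in range(1, 4):
    print(n, features_for_level(n))
```

A random order of addition, which the paragraph also allows, could be obtained by shuffling `OPTIONAL_BY_PRIORITY` once before the levels are generated.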
[0215] The subsequent generation and optimization of rating labels, database construction, user interaction, etc., can be performed as described above, such as steps S2-S17 mentioned above, and will not be described in detail here.
[0216] In one embodiment, the music score training database of this disclosure can be a specialized training database, particularly an auditory training database. The purpose or objective of the materials in the auditory training database is to enhance the user's inner hearing, auditory discrimination ability, etc., and as an example, the auditory training database may include at least one sub-database, such as at least one of the following: a sound training database (Guess Key / Guess Interval / Guess Chord), a tonality database (Guess Scales / Guess Chord / Guess the Tonality), a rhythm training database (clap the rhythm), a beat training database (clap the time), a melody discrimination database (telling difference), and a melody analysis database (Music Analysis). Embodiments of this disclosure can be applied to the aforementioned auditory training database, particularly the various sub-databases included in the auditory training database.
[0217] According to some embodiments of this disclosure, the construction and application of each sub-library in the listening training library can be performed as follows in the application scenario of the listening training library.
[0218] For the listening training library, at least several features can be extracted, including but not limited to: a) pitch, b) range, c) clef, d) chord, etc.
[0219] For the tonality recognition database, at least several features can be extracted, including but not limited to: a) chord, b) interval, c) tonality features, etc.
[0220] For the rhythm training library, at least several features can be extracted, including but not limited to: a) rhythm pattern features, b) rest features, etc.
[0221] For the beat training library, at least several features can be extracted, including but not limited to a) rhythm pattern features and b) time signature features.
[0222] For the melody recognition database, at least several features can be extracted, including but not limited to a) rhythmic features and b) rest features.
[0223] For the melody analysis library, at least several features can be extracted, including but not limited to a) dynamics, b) playing technique, c) tempo, d) tonality, e) period.
[0224] In other words, in this application scenario, for each sub-library in the aforementioned listening training library, the pre-set feature categories can include the features that need to be extracted from each sub-library, and selection from such features may be necessary when generating the first and second level labels. Subsequent generation and optimization of level labels, database construction, user interaction, etc., can be performed as described above, for example, as in steps S2-S17, and will not be described in detail here.
[0225] In one embodiment, the music score training database of this disclosure may be a finger technique training library. The purpose of the finger technique training library is to improve the user's basic finger skills and playing techniques, enabling them to support the expression of musical works from a technical perspective.
[0226] S1: Define the difficulty levels and difficulty ranking.
[0227] The material categories, levels, and difficulty rankings are defined manually (this part requires neither machine learning nor difficulty label definition, because the content is clearly stratified by difficulty and can be handled through presets).
[0228] S2: Automatically generate training materials and store them in the database.
[0229] Based on the category, difficulty, and ranking definitions in S1, the materials are identified, sorted, and formatted according to templates before being stored in the database.
[0230] S3: Convert the materials into skills training software content.
[0231] The finished material is converted into a training piece in the skills training software using the converter in the E-Library.
[0232] S4: Record user operation data in the software, and re-optimize the difficulty ranking of the materials based on a large amount of user operation data (time to complete a given material, accuracy rate, etc.).
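Step S4 can be illustrated with a minimal sketch that re-ranks materials from recorded user attempts. The disclosure does not specify how completion time and accuracy are combined; the equal weighting, the 120-second cap, and the names `Attempt`, `observed_difficulty`, and `rerank` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Attempt:
    material_id: str
    seconds: float   # time the user took to complete the material
    accuracy: float  # fraction of correct notes, 0.0 to 1.0


def observed_difficulty(attempts: List[Attempt], max_seconds: float) -> float:
    """Hypothetical score in [0, 1]: slower and less accurate means harder."""
    slow = mean(min(a.seconds, max_seconds) / max_seconds for a in attempts)
    wrong = mean(1.0 - a.accuracy for a in attempts)
    return 0.5 * slow + 0.5 * wrong  # equal weights are an assumption


def rerank(attempts: List[Attempt], max_seconds: float = 120.0) -> List[str]:
    """Return material ids ordered from easiest to hardest observed difficulty."""
    by_material: Dict[str, List[Attempt]] = {}
    for a in attempts:
        by_material.setdefault(a.material_id, []).append(a)
    return sorted(
        by_material,
        key=lambda m: observed_difficulty(by_material[m], max_seconds),
    )


log = [
    Attempt("m1", 90, 0.6),
    Attempt("m2", 30, 0.95),
    Attempt("m1", 100, 0.7),
    Attempt("m3", 60, 0.8),
]
print(rerank(log))
```

With this sample log the material practiced quickly and accurately (`m2`) is ranked easiest, and the slow, error-prone one (`m1`) hardest.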
[0233] The scheme for constructing and applying a music score training database according to the embodiments of this disclosure can also be applied to the classification of musical scores.
[0234] The categorized sheet music library is a music score classification system based on feature extraction of different categories. It can perform score recognition and automatic classification, and also supports intelligent searching by users based on tags. Specifically, the tag categories of this categorized sheet music library include, but are not limited to, the following feature extraction categories: A1 categorized by key, meter, rhythmic pattern, hand position, musical notation, harmony, playing technique, intervals, and melody / musical form; A2 categorized by period, author, genre, style, and mode / form; A3 categorized by thematic features of the score.
[0235] Furthermore, the label categories of the categorized sheet music library disclosed herein may also include classification according to the difficulty labels of the sheet music. In particular, the interactive refinement algorithm described in this application is primarily aimed at implementing this classification according to the difficulty labels of the sheet music.
[0236] According to embodiments of this disclosure, features can be extracted based on the basic elements of music, thereby extracting features that characterize the basic attributes of music, including but not limited to interval features, range features, time signature features, and rhythm features. The specific implementation steps are similar to S1-S17 of a rhythm library, except that the extracted features are different.
[0237] According to embodiments of this disclosure, multi-dimensional sheet music labeling is supported by constructing a multi-dimensional model, so that each digital sheet music can be categorized and labeled from multiple dimensions that are different from each other. The number and selection of dimensions can be appropriately set.
[0238] As an example, digital sheet music can be classified and labeled from three perspectives: music theory feature labels, difficulty labels, and training module labels. For example, as shown in Figure 8, in the 3D model, the X-axis represents music theory feature labels, which can be obtained by identifying music theory-related features of the score. These features may include, but are not limited to, range, time signature, tonality, intervals, rhythmic patterns, musical patterns, hand positions, fingering, playing techniques, and musical notation. Each music theory feature is matched with a corresponding difficulty level and training module. The Z-axis represents training module labels, which are variable labels that can define training modules for appropriate application scenarios. These may include, but are not limited to, skill modules, sight-reading training modules, aural skills training modules, rhythm training modules, music theory modules, composition modules, music appreciation modules, and sight-singing modules, allowing for customized and targeted training. The Y-axis represents difficulty labels, which can be used to classify the difficulty level of different training modules and different music theory feature labels. Thus, the X, Y, and Z axes together define countless points in the 3D model, and these points constitute a complete digital score label. For example, the surface formed by countless points in Figure 8 corresponds to the label of one digital musical score, which covers multiple feature dimensions, multiple application scenarios, and multiple difficulty levels.
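One way to picture the three-axis label is as a set of (music theory feature, difficulty, training module) points per score. The sketch below is only an illustration under that assumption; the names `LabelPoint`, `make_label`, and `find_scores`, and the sample scores, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class LabelPoint:
    theory_feature: str  # X axis: music theory feature label
    difficulty: int      # Y axis: difficulty label
    module: str          # Z axis: training module label


def make_label(points: Iterable[Tuple[str, int, str]]) -> FrozenSet[LabelPoint]:
    """A score label is modeled as the set of points it occupies in the 3D model."""
    return frozenset(LabelPoint(*p) for p in points)


def find_scores(
    labels: Dict[str, FrozenSet[LabelPoint]], module: str, max_difficulty: int
) -> List[str]:
    """Ids of scores with at least one point in `module` at or below `max_difficulty`."""
    return sorted(
        score_id
        for score_id, label in labels.items()
        if any(p.module == module and p.difficulty <= max_difficulty for p in label)
    )


labels = {
    "score_a": make_label([("interval", 2, "sight_reading"), ("rhythmic_pattern", 3, "rhythm")]),
    "score_b": make_label([("interval", 5, "sight_reading")]),
}
print(find_scores(labels, "sight_reading", 3))
```

Modeling the label as a set keeps the lookup by module and difficulty simple, which matches the tag-based searching mentioned earlier for the categorized sheet music library.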
[0239] It should be noted that the above description is for simplification, and the multi-dimensional music score labels of this disclosure can be further subdivided. Specifically, each feature included in the music pattern category can be considered a single dimension; for example, the eight features shown in the attached figure can be considered eight dimensions. When input into the model constructed by this disclosure, the output is a digital music score label. Of course, a corresponding digital music score can also be output for each application scenario.
[0240] According to embodiments of this disclosure, multi-dimensional sheet music tags can be supported by constructing a multi-dimensional database, where the number and selection of dimensions can be appropriately set. As an example, the multi-dimensional database may include a sheet music information table, a music library feature table, a music library difficulty rating table, and a music library theme feature table. Specifically, the sheet music information table may include relevant features such as sheet music number, sheet music name, composer, period, genre, style, mode / form, ABRSM rating, CMPA rating, and Shanghai Conservatory of Music rating; the music library feature table may include relevant features such as section number, sheet music number, measure name, length, tonality, meter, rhythmic pattern, hand position, musical notation, harmony, playing technique, intervals, and melody line; the difficulty rating table may include relevant features such as sheet music number, training mode, level one difficulty tag, and level two difficulty tag; and the music library theme feature table may include relevant features such as section number, style, mode / form, genre, associative quality, objective descriptive quality, and character portrayal.
[0241] Example statements for creating and using a music sheet database (msdb) can be as follows:
[0242] CREATE DATABASE msdb;
[0243] USE msdb;
[0244] Create a sheet music information table
[0245]
[0246] Create a difficulty level feature table
[0247]
[0248] ...
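The table definitions themselves are not reproduced in the listing above, and the following sketch does not attempt to reconstruct them. It only illustrates, under stated assumptions, how two of the tables named in paragraph [0240] might be related. It uses SQLite through Python's standard `sqlite3` module as a stand-in for the database server; every table name, column name, column type, and sample row is hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for msdb (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical subset of the sheet music information table.
    CREATE TABLE score_info (
        score_id  INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        composer  TEXT,
        period    TEXT,
        genre     TEXT
    );
    -- Hypothetical difficulty rating table: one row per score and training mode.
    CREATE TABLE score_difficulty (
        score_id       INTEGER NOT NULL REFERENCES score_info(score_id),
        training_mode  TEXT NOT NULL,
        level1_label   TEXT NOT NULL,
        level2_label   TEXT,
        PRIMARY KEY (score_id, training_mode)
    );
    """
)

# Made-up sample rows, for illustration only.
conn.execute(
    "INSERT INTO score_info VALUES (1, 'Example Etude', 'Unknown', 'Classical', 'Etude')"
)
conn.execute(
    "INSERT INTO score_difficulty VALUES (1, 'sight_reading', 'L1-03', 'L2-03-02')"
)

# Look up scores and their level-one difficulty label for one training mode.
rows = conn.execute(
    """
    SELECT i.name, d.level1_label
    FROM score_info AS i
    JOIN score_difficulty AS d ON d.score_id = i.score_id
    WHERE d.training_mode = ?
    """,
    ("sight_reading",),
).fetchall()
print(rows)
```

Keying the difficulty table by both score and training mode reflects the idea that the same score can carry different difficulty labels in different training modules.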
[0250] The following description, with reference to the accompanying drawings, outlines an apparatus for constructing a music score training database according to embodiments of the present disclosure. The music score training database has at least one level, with a corresponding level label for each level. As shown in Figure 9, the construction apparatus 900 includes a processing circuit 902 configured to construct music score training samples according to embodiments of the present disclosure.
[0251] In the above-described structural example of the electronic device, the processing circuit 902 can be in the form of a general-purpose processor or a dedicated processor, such as an ASIC. For example, the processing circuit 902 can be constructed from circuitry (hardware) or a central processing device (such as a central processing unit (CPU)). Furthermore, the processing circuit 902 can carry a program (software) for making the circuitry (hardware) or the central processing device work. This program can be stored in a memory (such as a memory arranged in the device) or in an externally connected storage medium, and can be downloaded via a network (such as the Internet).
[0252] According to embodiments of this disclosure, the processing circuit 902 may include various units for implementing the above-described functions. For example, the processing circuit may include: a first label acquisition unit 904, configured to acquire a first label (L1) generated by machine learning based on a pre-set music feature set for a music score training database to be constructed; a second label acquisition unit 906, configured to acquire a second label (L1H) generated based on the pre-set music feature set for the music score training database to be constructed, and at least one second sub-label (L2H) included in the second label (L1H), wherein each second label and sub-label specifies a combination of one or more music features in the music feature set and their specific value ranges; a generation unit 908, configured to generate candidate music samples based on the combination of music features and the value ranges of the music features specified by the second sub-label (L2H); a verification unit 910, configured to verify at least one of the first label (L1) and the second label (L1H) using the generated candidate music samples, wherein the specification of at least one of the first label (L1) and the second label (L1H) can be adjusted based on the verification results; and a creation unit 912, configured to construct the music score training database based on the verified candidate music samples and their corresponding level labels.
[0253] In some embodiments of this disclosure, the first label acquisition unit 904 may be further configured to: extract, through machine learning, music features from one or more music reference samples with corresponding levels in the music reference sample library, based on the music features corresponding to the corresponding levels in the pre-set music feature set, so as to determine the music feature combination corresponding to the level and its music feature value range and generate a first label (L1).
[0254] In some embodiments of this disclosure, the second label acquisition unit 906 may be further configured such that at least one of the second label (L1H) and the at least one second sub-label (L2H) is generated by selecting a specific combination of music features and a corresponding range of feature values based on the music features corresponding to the corresponding level in the pre-set music feature set.
[0255] In some embodiments of this disclosure, the generation unit 908 may be further configured as follows:
[0256] Obtain the music fragments corresponding to each music feature that meets the second sub-label (L2H) definition, and combine the obtained music fragments to generate candidate music samples, or
[0257] Music samples that meet the music feature combinations and value ranges specified by the second sub-label (L2H) are extracted from the music score training database as candidate music samples.
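The two ways of obtaining candidate samples can be sketched as follows. This is a simplified illustration, not the disclosed implementation: a sub-label is modeled as a mapping from feature name to a (low, high) range, fragments are assumed to be pre-filtered so that each pool entry already satisfies its feature's range, and the names `combine_fragments` and `extract_from_database` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Range = Tuple[float, float]
SubLabel = Dict[str, Range]  # feature name -> allowed (low, high) range


def in_ranges(features: Dict[str, float], spec: SubLabel) -> bool:
    """True if every feature named in the sub-label is present and in range."""
    return all(
        name in features and low <= features[name] <= high
        for name, (low, high) in spec.items()
    )


def combine_fragments(
    pool: Dict[str, List[str]], spec: SubLabel, rng: random.Random
) -> List[str]:
    """Option 1: pick one conforming fragment per feature and combine them.

    `pool[name]` is assumed to hold only fragments that already satisfy the
    range that the sub-label gives for feature `name`.
    """
    return [rng.choice(pool[name]) for name in spec]


def extract_from_database(
    samples: Dict[str, Dict[str, float]], spec: SubLabel
) -> List[str]:
    """Option 2: select existing samples whose features satisfy the sub-label."""
    return sorted(sid for sid, feats in samples.items() if in_ranges(feats, spec))


spec: SubLabel = {"interval": (1, 5), "range": (8, 12)}
pool = {"interval": ["frag_i1", "frag_i2"], "range": ["frag_r1"]}
db = {
    "s1": {"interval": 3, "range": 10},
    "s2": {"interval": 7, "range": 10},
    "s3": {"interval": 2},
}
print(combine_fragments(pool, spec, random.Random(0)))
print(extract_from_database(db, spec))
```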
[0258] In some embodiments of this disclosure, the verification unit 910 may be further configured to: verify whether the candidate music sample generated according to the second sub-label (L2H) conforms to the music feature combination and feature value range specified by the first label (L1).
[0259] In some embodiments of this disclosure, the verification unit 910 may be further configured as follows:
[0260] Extracting musical features from candidate music samples using machine learning;
[0261] The extracted music features are compared with the music feature combinations and feature value ranges specified in the first-level label to verify whether the candidate music samples meet the requirements of the first-level label.
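The comparison step can be sketched as a range check that reports which features violate the label. The machine-learning feature extraction is outside the scope of this sketch, so the features are assumed to have been extracted already; the function name `verify_against_label` and the dictionary-based label format are assumptions.

```python
from typing import Dict, List, Tuple

Label = Dict[str, Tuple[float, float]]  # feature name -> allowed (low, high) range


def verify_against_label(features: Dict[str, float], label: Label) -> List[str]:
    """Return the names of features that violate the label.

    A feature violates the label if it is missing from the extracted features
    or lies outside the range the label specifies. An empty list means the
    candidate sample conforms.
    """
    problems: List[str] = []
    for name, (low, high) in label.items():
        value = features.get(name)
        if value is None or not (low <= value <= high):
            problems.append(name)
    return problems


first_label: Label = {"interval": (1, 5), "range": (8, 12)}
print(verify_against_label({"interval": 3, "range": 10}, first_label))
print(verify_against_label({"interval": 7}, first_label))
```

Returning the offending feature names, rather than a bare true or false, gives a later adjustment step something concrete to act on.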
[0262] In some embodiments of this disclosure, the verification unit 910 may further include an adjustment unit configured to adjust the specifications of at least one of the first label (L1) and the second label (L1H) based on the verification result. In some embodiments, the adjustment unit may be configured to further present the candidate music sample to the user for verification if the candidate music sample generated according to the second sub-label (L2H) does not conform to the specifications of the first label (L1), and to adjust the specifications of at least one of the first label (L1) and the second label (L1H) according to the user's verification result.
[0263] In some embodiments, if the user verification result indicates that the candidate music sample conforms to the requirements of the first label (L1), the candidate music sample is used as a training sample to optimize the generation of the first label (L1) by machine learning; if the user verification result indicates that the candidate music sample does not conform to the requirements of the first label (L1), the second label (L1H) is adjusted.
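The decision flow described in the two paragraphs above can be summarized in a small sketch. The enum `Action` and the function `next_action` are hypothetical names introduced only for illustration.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ACCEPT = "accept candidate"
    RETRAIN_FIRST_LABEL = "use candidate as ML training sample for L1"
    ADJUST_SECOND_LABEL = "adjust L1H"


def next_action(conforms_to_l1: bool, user_says_conforms: Optional[bool]) -> Action:
    """Decide what to do with a candidate generated from the second sub-label.

    conforms_to_l1: result of the automatic check against the first label (L1).
    user_says_conforms: the user's verdict, needed only when the automatic
    check fails (None means the user has not been asked).
    """
    if conforms_to_l1:
        return Action.ACCEPT
    if user_says_conforms is None:
        raise ValueError("user verification is required when the automatic check fails")
    if user_says_conforms:
        # The user judges the sample to fit the level, so L1 generation is refined.
        return Action.RETRAIN_FIRST_LABEL
    # The user agrees the sample does not fit, so the second label is adjusted.
    return Action.ADJUST_SECOND_LABEL


print(next_action(False, True))
print(next_action(False, False))
```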
[0264] In some embodiments, the adjustment unit is configured to adjust the combination of musical features and / or the value range of each musical feature specified by the second label with reference to the first label, so that the candidate music samples generated based on the adjusted second label conform to the specifications of the first-level label.
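One possible way to adjust the second label with reference to the first is to narrow each range to the overlap of the two. The disclosure does not fix a particular rule, so the policy below is purely an assumption: ranges are intersected, a range with no overlap falls back to the first label's range, and features the first label does not use are dropped. The name `adjust_second_label` is hypothetical.

```python
from typing import Dict, Tuple

Label = Dict[str, Tuple[float, float]]  # feature name -> allowed (low, high) range


def adjust_second_label(second: Label, first: Label) -> Label:
    """Return a copy of `second` adjusted so its ranges lie within `first`."""
    adjusted: Label = {}
    for name, (low2, high2) in second.items():
        if name not in first:
            continue  # assumed policy: drop features absent from the first label
        low1, high1 = first[name]
        low, high = max(low1, low2), min(high1, high2)
        # If the ranges do not overlap, fall back to the first label's range.
        adjusted[name] = (low, high) if low <= high else (low1, high1)
    return adjusted


first = {"interval": (1, 5), "range": (8, 12)}
second = {"interval": (3, 8), "range": (13, 15), "tie": (0, 1)}
print(adjust_second_label(second, first))
```

Any candidate whose features fall inside the adjusted ranges also falls inside the first label's ranges for those features, which is the property the paragraph above asks for.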
[0265] In some embodiments of this disclosure, the creation unit 912 may be further configured as follows:
[0266] Receive sort settings input by the user, and
[0267] Candidate music samples that meet the same level label are sorted based on the received sorting settings to build a music score training database.
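The sorting step can be sketched as grouping samples by level label and applying the user's sort settings within each group. The representation of sort settings as (field, descending) pairs in priority order, the sample fields, and the name `build_database` are assumptions for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Sample = Dict[str, Any]
SortSettings = List[Tuple[str, bool]]  # (field name, descending?) in priority order


def build_database(samples: List[Sample], sort_settings: SortSettings) -> Dict[int, List[Sample]]:
    """Group samples by level label and sort each group by the user's settings."""
    by_level: Dict[int, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_level[sample["level"]].append(sample)

    database: Dict[int, List[Sample]] = {}
    for level in sorted(by_level):
        ordered = by_level[level]
        # Python's sort is stable, so sorting by the lowest-priority key first
        # and the highest-priority key last yields a multi-key ordering.
        for field, descending in reversed(sort_settings):
            ordered = sorted(ordered, key=lambda s, f=field: s[f], reverse=descending)
        database[level] = ordered
    return database


samples = [
    {"id": "a", "level": 1, "note_count": 8, "tempo": 60},
    {"id": "b", "level": 1, "note_count": 4, "tempo": 90},
    {"id": "c", "level": 2, "note_count": 6, "tempo": 70},
    {"id": "d", "level": 1, "note_count": 4, "tempo": 60},
]
settings: SortSettings = [("note_count", False), ("tempo", True)]
db = build_database(samples, settings)
print({level: [s["id"] for s in group] for level, group in db.items()})
```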
[0268] According to embodiments of this disclosure, the processing circuit 902 may further include a sub-label generation unit configured to acquire at least one first sub-label (L2) generated by machine learning based on a pre-set music feature set for a music score training database to be constructed, wherein the first sub-label (L2) is included in the first label (L1).
[0269] Furthermore, the verification unit may be further configured to: after verifying that the candidate music sample generated according to the second sub-label (L2H) conforms to the provisions of the first label (L1), further verify whether the candidate music sample conforms to the provisions of the first sub-label (L2), wherein the provisions of at least one of the first sub-label (L2) and the second sub-label (L2H) are adjusted based on the verification result.
[0270] According to embodiments of this disclosure, the construction apparatus further includes a presentation unit configured to present the constructed music score training database to a user via a presentation device; a receiving unit configured to receive the user's level learning results based on the constructed music score training database; and an optimization unit configured to optimize the ranking of music samples in the music score training database based on the level learning results. It should be noted that the presentation unit and the receiving unit can be combined into a transceiver unit to enable interaction between the database and the user. The transceiver unit can be implemented based on various suitable data communication, data presentation, and data reception methods. For example, the presentation unit can be a device that presents music data visually or aurally, such as a display or speaker; the receiving unit can be any suitable type of signal receiving device, which will not be described in detail here.
[0271] The operation of each unit can be performed as described above, and will not be described in detail here. A unit drawn with dashed lines is not necessarily included in the processing circuitry. As an example, the transceiver unit and the optimization unit can be located within the database construction apparatus but outside the processing circuitry, or even outside the construction apparatus itself. It should be noted that although the individual units are shown as discrete units in Figure 9, one or more of these units can be combined into one unit or split into multiple units.
[0272] It should be noted that the above-described units are merely logical modules divided according to their specific functions, and are not intended to limit the specific implementation method. For example, they can be implemented in software, hardware, or a combination of both. In actual implementation, the above-described units can be implemented as independent physical entities, or they can be implemented by a single entity (e.g., a processor (CPU or DSP, etc.), integrated circuit, etc.). Furthermore, the units shown in the accompanying drawings with dashed lines indicate that these units may not actually exist, and the operations / functions they perform can be implemented by the processing circuitry itself.
[0273] It should be understood that Figure 9 is merely a schematic structural configuration of the music score training database construction apparatus according to embodiments of the present disclosure. Optionally, the construction apparatus may also include other components not shown, such as memory, radio frequency links, baseband processing units, network interfaces, controllers, etc. Processing circuitry may be associated with memory and / or an antenna. For example, processing circuitry may be directly or indirectly (e.g., with other components possibly connected in between) connected to memory for data access. Also, for example, processing circuitry may be directly or indirectly connected to an antenna for transmitting and receiving radio signals via a communication unit.
[0274] The memory can store various information generated by the processing circuit 902 (e.g., data service-related information, configuration resource information, etc.), programs and data used for the operation of the terminal-side electronic device, and data to be sent by the terminal-side electronic device. The memory can also be located within the terminal-side electronic device but outside the processing circuit, or even outside the terminal-side electronic device itself. The memory can be volatile and / or non-volatile. For example, the memory can include, but is not limited to, random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), read-only memory (ROM), and flash memory.
[0275] It should be noted that the above description is merely exemplary. The embodiments of this disclosure can also be performed in any other suitable manner, still achieving the advantageous effects obtained by the embodiments of this disclosure. Moreover, the embodiments of this disclosure can also be applied to other similar application instances, still achieving the advantageous effects obtained by the embodiments of this disclosure. It should be understood that the machine-executable instructions in a machine-readable storage medium or program product according to embodiments of this disclosure can be configured to perform operations corresponding to the above-described device and method embodiments. When referring to the above-described device and method embodiments, the embodiments of the machine-readable storage medium or program product will be clear to those skilled in the art, and therefore will not be described again. Machine-readable storage media and program products used to carry or include the above-described machine-executable instructions also fall within the scope of this disclosure. Such storage media may include, but are not limited to, floppy disks, optical disks, magneto-optical disks, memory cards, memory sticks, etc.
[0276] Furthermore, it should be understood that the aforementioned series of processes and devices can also be implemented via software and / or firmware. In the case of software and / or firmware implementation, the corresponding program constituting the corresponding software is stored in the storage medium of the relevant device, and when said program is executed, it is capable of performing various functions. As an example, the programs constituting the software are installed from a storage medium or network onto a computer with a dedicated hardware architecture, such as the general-purpose personal computer 1000 shown in Figure 10, and when various programs are installed, the computer is able to perform various functions. Figure 10 is a block diagram illustrating an example structure of a personal computer that can be used as an information processing device in embodiments of the present disclosure. In one example, the personal computer may correspond to the exemplary transmitting device or terminal-side electronic device described above according to the present disclosure.
[0277] In Figure 10, the central processing unit (CPU) 1001 performs various processes based on the program stored in the read-only memory (ROM) 1002 or the program loaded into the random access memory (RAM) 1003 from the storage section 1008. The RAM 1003 also stores, as needed, the data required when the CPU 1001 performs various processes.
[0278] CPU 1001, ROM 1002 and RAM 1003 are connected to each other via bus 1004. Input / output interface 1005 is also connected to bus 1004.
[0279] The following components are connected to the input / output interface 1005: input section 1006, including a keyboard, mouse, etc.; output section 1007, including a display, such as a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; storage section 1008, including a hard disk, etc.; and communication section 1009, including a network interface card, such as a LAN card, modem, etc. The communication section 1009 performs communication processing via a network, such as the Internet.
[0280] As needed, drive 1010 is also connected to input / output interface 1005. Removable media 1011, such as disks, optical disks, magneto-optical disks, semiconductor memories, etc., are installed on drive 1010 as needed, so that computer programs read from them can be installed into storage section 1008 as needed.
[0281] When the above series of processes are implemented through software, the program constituting the software is installed from a network such as the Internet or a storage medium such as removable media 1011.
[0282] Those skilled in the art will understand that such storage media are not limited to the removable medium 1011 shown in Figure 10, which stores the program and is distributed separately from the device to provide the program to the user. Examples of removable media 1011 include magnetic disks (including floppy disks (registered trademark)), optical disks (including compact disc read-only memory (CD-ROM) and digital versatile disks (DVD)), magneto-optical disks (including mini-discs (MD) (registered trademark)), and semiconductor memory. Alternatively, the storage medium may be ROM 1002, a hard disk included in storage section 1008, etc., containing programs and distributed to the user along with the device containing them.
[0283] Furthermore, it should be understood that the multiple functions included in one unit in the above embodiments can be implemented by separate devices. Alternatively, the multiple functions implemented by multiple units in the above embodiments can be implemented by separate devices respectively. In addition, one of the above functions can be implemented by multiple units. Needless to say, such a configuration is included within the scope of the present disclosure.
[0284] In this specification, the steps described in the flowchart include not only processes executed sequentially in the stated order, but also processes executed in parallel or individually, rather than necessarily sequentially. Furthermore, even within the steps of sequential processing, needless to say, the order can be appropriately altered.
[0285] While this disclosure and its advantages have been described in detail, it should be understood that various changes, substitutions, and modifications can be made without departing from the spirit and scope of this disclosure as defined by the appended claims. Furthermore, the terms "comprising," "including," or any other variations thereof used in embodiments of this disclosure are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0286] While some specific embodiments of this disclosure have been described in detail, those skilled in the art should understand that the above embodiments are illustrative only and do not limit the scope of this disclosure. Those skilled in the art should understand that the above embodiments can be combined, modified, or replaced without departing from the scope and spirit of this disclosure. The scope of this disclosure is defined by the appended claims.
Claims
1. A method for constructing a music score training database, wherein the music score training database has at least one level, and each level has a corresponding level label, the method comprising the following steps: For each level Obtain the first label (L1) generated through machine learning. Obtain a second label (L1H) generated based on a pre-set music feature set for the music score training database to be constructed, and at least one second sub-label (L2H) contained in the second label (L1H), wherein each second label and sub-label specifies a combination of one or more music features in the music feature set and their specific value range. Candidate music samples are generated based on the combination of music features specified by the second sub-label (L2H) and the range of music feature values. The generated candidate music samples are used to verify at least one of the first label (L1) and the second label (L1H), wherein the specification of at least one of the first label (L1) and the second label (L1H) can be adjusted based on the verification results; as well as A music score training database is constructed based on verified candidate music samples and their corresponding labels.
2. The method according to claim 1, wherein the pre-set music feature set for the music score training database to be constructed includes the music features associated with the music score training database to be constructed and the possible value range of each music feature.
3. The method according to claim 1, wherein the first label (L1) is generated by extracting, through machine learning, music features from one or more music reference samples of the corresponding level in a music reference sample library, so as to determine the combination of music features and the music feature value ranges corresponding to that level.
4. The method according to claim 1, wherein at least one of the second label (L1H) and the at least one second sub-label (L2H) is generated by selecting, from the music features corresponding to the respective level in the pre-set music feature set, a specific combination of music features and corresponding feature value ranges.
5. The method according to claim 1, wherein generating candidate music samples based on the second sub-label (L2H) includes at least one of the following: obtaining music fragments corresponding to each music feature that meets the specification of the second sub-label (L2H), and combining the obtained music fragments to generate candidate music samples; or extracting, from the music score training database, music samples that meet the music feature combination and value ranges specified by the second sub-label (L2H) as candidate music samples.
6. The method according to claim 1, wherein verifying the first label (L1) using the generated candidate music samples includes verifying whether the generated candidate music samples meet the requirements of the music feature combination and feature value ranges of the first label (L1).
7. The method according to claim 6, wherein verifying whether the generated candidate music samples meet the requirements of the first label (L1) includes: extracting music features from the candidate music samples using machine learning; and comparing the extracted music features with the music feature combination and feature value ranges specified by the first label (L1) to verify whether the candidate music samples meet the requirements of the first label (L1).
8. The method according to claim 7, wherein, when the types and values of the extracted music features fall within the music feature combination and feature value ranges specified by the first label (L1), the candidate music sample is considered to conform to the first label (L1).
9. The method according to claim 1, wherein adjusting the specification of at least one of the first label (L1) and the second label (L1H) based on the verification results includes: if a candidate music sample generated based on the second sub-label (L2H) does not conform to the requirements of the first label (L1), further presenting the candidate music sample to a user for verification; and adjusting the requirements of at least one of the first label (L1) and the second label (L1H) based on the user's verification results.
10. The method according to claim 9, wherein, if the user's verification result indicates that the candidate music sample meets the requirements of the first label (L1), the candidate music sample is used as a training sample to optimize the generation of the first label (L1) through machine learning; and/or if the user's verification result indicates that the candidate music sample does not meet the requirements of the first label (L1), the second label (L1H) is adjusted.
11. The method according to claim 10, wherein adjusting the second label (L1H) includes adjusting, with reference to the first label (L1), the combination of music features and/or the value range of each music feature specified by the second label (L1H), so that candidate music samples generated based on the adjusted second label (L1H) conform to the specification of the first label (L1).
12. The method according to claim 1, further comprising: adjusting, based on the user's verification results, the level labels and/or the feature combinations associated with the level labels in the pre-set music feature set.
13. The method according to claim 1, further comprising: obtaining at least one first sub-label (L2) generated through machine learning training based on the pre-set music feature set for the music score training database to be constructed, wherein the first sub-label (L2) is contained in the first label (L1); and, after verifying that the candidate music samples generated according to the second sub-label (L2H) conform to the specification of the first label (L1), further verifying whether those candidate music samples conform to the specification of the first sub-label (L2), wherein the specification of at least one of the first sub-label (L2) and the second sub-label (L2H) is adjusted based on the verification results.
14. The method according to claim 1, wherein constructing the music score training database based on the verified candidate music samples and their corresponding level labels includes: sorting candidate music samples that conform to the same level label according to a specific sorting setting to build the music score training database.
15. The method according to claim 14, wherein the sorting setting includes sorting according to the priority of each music feature in the candidate music samples, or adaptive sorting based on user input.
16. The method according to claim 1, further comprising: presenting the constructed music score training database to a user through a presentation device; receiving the user's training results based on the constructed music score training database; and optimizing the configuration of music samples in the music score training database based on the training results, wherein the music sample configuration includes at least one of the following: the ordering of music samples, or the level labels and/or related feature combinations in the music score training database.
17. An apparatus for constructing a music score training database, wherein the music score training database has at least one level and each level has a corresponding level label, the apparatus comprising a processing circuit configured to, for each level: obtain a first label (L1) generated through machine learning; obtain a second label (L1H) generated based on a pre-set music feature set for the music score training database to be constructed, and at least one second sub-label (L2H) contained in the second label (L1H), wherein each of the second label and the second sub-label specifies a combination of one or more music features in the music feature set and their specific value ranges; generate candidate music samples based on the combination of music features and the music feature value ranges specified by the second sub-label (L2H); verify at least one of the first label (L1) and the second label (L1H) using the generated candidate music samples, wherein the specification of at least one of the first label (L1) and the second label (L1H) can be adjusted based on the verification results; and construct the music score training database based on the verified candidate music samples and their corresponding labels.
18. An apparatus comprising: one or more processors; and one or more storage media storing instructions that, when executed by the one or more processors, cause the method according to any one of claims 1-16 to be performed.
19. A computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform the method according to any one of claims 1-16.
20. An apparatus comprising components for performing the method according to any one of claims 1-16.
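By way of non-limiting illustration only, the conformance check recited in claims 6 to 8 and the priority-based sorting recited in claim 15 might be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation of the disclosure: the names `Label`, `conforms`, and `sort_by_priority`, the feature names `tempo_bpm` and `note_density`, and the inclusive numeric ranges are all hypothetical, and the machine-learning feature extraction step is assumed to have already produced a plain mapping from feature name to value.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumption: each feature range is an inclusive (low, high) pair of numbers.
Range = Tuple[float, float]
Features = Dict[str, float]


@dataclass
class Label:
    """A level label: a combination of music features with allowed value ranges."""
    name: str
    feature_ranges: Dict[str, Range]


def conforms(features: Features, label: Label) -> bool:
    """One reading of claim 8: every extracted feature type must be named by
    the label, and its value must lie within the label's range."""
    for feature, value in features.items():
        if feature not in label.feature_ranges:
            return False
        low, high = label.feature_ranges[feature]
        if not (low <= value <= high):
            return False
    return True


def sort_by_priority(samples: List[Features], priority: List[str]) -> List[Features]:
    """One reading of claim 15: order samples by feature value, comparing the
    highest-priority feature first. Missing features default to 0.0 (assumption)."""
    return sorted(samples, key=lambda s: tuple(s.get(f, 0.0) for f in priority))


# Hypothetical first label (L1) and candidate samples with extracted features.
l1 = Label("level-1", {"tempo_bpm": (60, 90), "note_density": (0.0, 2.0)})
sample_a = {"tempo_bpm": 72, "note_density": 1.5}
sample_b = {"tempo_bpm": 120, "note_density": 1.0}  # tempo outside the L1 range
sample_c = {"tempo_bpm": 72, "note_density": 0.5}

verified = [s for s in (sample_a, sample_b, sample_c) if conforms(s, l1)]
ordered = sort_by_priority(verified, ["tempo_bpm", "note_density"])
```

In this sketch, a sample that fails `conforms` would be the one routed to the user for verification as in claim 9; that step, and any adjustment of the labels, is left out here.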