A data format self-identification and verification method for a solid-state drive

A self-identification and verification method for solid-state drive (SSD) data formats analyzes data structures, matches format templates, and generates segmented check codes. It addresses uneven data reliability in mixed-load scenarios and makes data format identification and verification more stable.

CN122240385A · Pending · Publication Date: 2026-06-19 · HUIJU ELECTRONICS (DONGGUAN) IND CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: HUIJU ELECTRONICS (DONGGUAN) IND CO LTD
Filing Date: 2026-05-21
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing solid-state drives (SSDs) face challenges in data format identification and verification, including complex parameter dependencies, inability to dynamically adjust verification strategies, inaccurate location of abnormal data, and uneven data reliability. They are particularly difficult to operate stably under mixed load scenarios.

Method used

The method performs structural feature analysis on the data to be written, matches a format recognition template, establishes a mapping relationship between fields and physical locations, and generates segmented check codes. In the reading phase it reconstructs the logical structure and performs format consistency verification; abnormal physical pages are then located through the mapping relationship and repaired in combination with wear leveling.

Benefits of technology

It improves the data reliability of solid-state drives, reduces the difficulty of debugging format self-identification and verification parameters, and ensures the clarity of field location, segment verification, and error recovery.



Abstract

This application discloses a data format self-identification and verification method for solid-state drives (SSDs) to improve data reliability. The method includes: analyzing the file structure of the data to be written to determine its structural characteristics; matching the structural characteristics with a target format identification template from a preset format identification template library to obtain a target format identification template; extracting preset field information from the target format identification template and establishing a mapping relationship between the preset field information and the corresponding storage location in the SSD; performing segmented checksum generation processing on each data segment of the data to be written; reading stored data from the physical pages of the SSD and reconstructing the logical data structure, performing format consistency verification on the reconstructed logical data structure, and obtaining the verification result; determining the corresponding abnormal physical page based on the mapping relationship, and performing abnormal data repair processing on the abnormal physical page.

Description

Technical Field

[0001] This application relates to the field of solid-state drive technology, and in particular to a data format self-identification and verification method for solid-state drives.

Background Technology

[0002] In existing solid-state drive (SSD) data writing and verification processes, data formats often rely on fixed structures or preset rules for identification, and the format verification logic is usually deeply coupled with the data organization method. When dealing with data from different sources and with different structures, there are many configuration items for the identification path, field parsing methods, and verification process, requiring frequent parameter adjustments in stages such as format identification, field location, and logical structure recovery. Especially in scenarios with mixed workloads and multiple data types coexisting, the parameter dependencies are complex, and the debugging process is prone to problems such as redundant configurations, rule conflicts, or inconsistent verification behaviors, making it difficult for the format self-identification and verification process to run stably.

[0003] Meanwhile, during long-term use, factors such as differences in physical block wear, error frequency, and fluctuations in operating status can cause uneven data reliability in solid-state drives (SSDs). Existing technologies typically employ uniform verification methods or fixed-level protection strategies, failing to dynamically adjust verification strength based on storage environment variations. This can easily lead to unpredictable verification failures or data recovery difficulties in high-risk areas. Furthermore, the location and processing of abnormal data are generally coarse-grained and lack specificity, further increasing the difficulty of debugging verification strategies and impacting overall data reliability.

Summary of the Invention

[0004] This application provides a data format self-identification and verification method for solid-state drives (SSDs), which can improve the data reliability of SSDs.

[0005] The first aspect of this application provides a data format self-identification and verification method for solid-state drives, including:

[0006] Analyze the file structure of the data to be written to determine the structural features of the data to be written;
Based on the structural features, obtain a target format recognition template by matching in a preset format recognition template library;
Extract preset field information from the target format recognition template and establish a mapping relationship between the preset field information and the corresponding storage location in the solid-state drive;
Perform segment check code generation processing on each data segment in the data to be written to obtain a segment check code that corresponds one-to-one with each data segment, and write the segment check code and the corresponding data segment together into the solid-state drive;
Based on the target format recognition template and the mapping relationship, read the stored data from the physical pages of the solid-state drive and reconstruct the logical data structure, then perform a format consistency check on the reconstructed logical data structure to obtain a check result;
When the check result indicates an anomaly, determine the corresponding abnormal physical page according to the mapping relationship, and perform abnormal data repair processing on the abnormal physical page.

[0007] Optionally, performing segment check code generation processing on each data segment in the data to be written to obtain a segment check code that corresponds one-to-one with each data segment includes: determining the risk level of each data segment in the data to be written; and, based on the risk level, selecting the corresponding segment check code algorithm and redundancy length for each data segment, and performing segment check code generation processing on each data segment according to that algorithm and redundancy length to obtain a segment check code that corresponds one-to-one with each data segment.

[0008] Optionally, determining the risk level of each data segment in the data to be written includes: determining the target physical block corresponding to each data segment before writing, and combining the number of erase / write operations of the target physical block, the error correction statistics within the most recent preset time window, and the operating parameters of the solid-state drive into a risk assessment feature vector; and inputting the risk assessment feature vector into a pre-trained risk assessment model to obtain the risk level corresponding to each data segment.

[0009] Optionally, selecting the corresponding segment check code algorithm and redundancy length for each data segment based on the risk level includes: when the risk level is lower than a preset first threshold, selecting a first type of segment check code algorithm for the corresponding data segment and adopting a first redundancy length; when the risk level is between the first threshold and a preset second threshold, selecting a second type of segment check code algorithm for the corresponding data segment and adopting a second redundancy length greater than the first redundancy length; and when the risk level is higher than the second threshold, selecting a third type of segment check code algorithm for the corresponding data segment and adopting a third redundancy length greater than the second redundancy length.
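The three-tier selection described in [0009] can be illustrated with a short Python sketch. Everything concrete here is an assumption made for illustration: the application does not name the algorithms, the redundancy lengths, or the threshold values, so CRC32, BLAKE2b, and SHA-256 (with 4, 8, and 32 bytes of redundancy) and thresholds of 0.3 and 0.7 on a 0 to 1 risk score are hypothetical stand-ins. A real controller would more likely use error-correcting codes, which these detection-only digests are not.

```python
import hashlib
import zlib

# Hypothetical thresholds on a 0..1 risk score; the source gives no values.
FIRST_THRESHOLD = 0.3
SECOND_THRESHOLD = 0.7


def select_scheme(risk):
    """Return (algorithm name, redundancy length in bytes) for a risk score."""
    if risk < FIRST_THRESHOLD:
        return ("crc32", 4)       # first type, shortest redundancy
    if risk <= SECOND_THRESHOLD:
        return ("blake2b", 8)     # second type, longer redundancy
    return ("sha256", 32)         # third type, longest redundancy


def make_check_code(segment, risk):
    """Generate the check code for one data segment according to its risk."""
    algorithm, length = select_scheme(risk)
    if algorithm == "crc32":
        return zlib.crc32(segment).to_bytes(length, "big")
    if algorithm == "blake2b":
        return hashlib.blake2b(segment, digest_size=length).digest()
    return hashlib.sha256(segment).digest()
```

In this sketch a score exactly equal to either threshold falls into the middle tier; the source text leaves the boundary cases open.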

[0010] Optionally, writing the segment check code together with the corresponding data segment into the solid-state drive includes: for a target data segment whose risk level is higher than a preset third threshold and whose corresponding field is marked as a key field in the target format recognition template, writing the target data segment and its corresponding segment check code into at least two physical blocks of the solid-state drive, and recording the multi-page mapping information corresponding to the target data segment in the mapping relationship.

Optionally, the method further includes: maintaining a format anomaly count for each physical block of the solid-state drive; when a format consistency check anomaly is detected in a physical page located within a physical block, incrementing the format anomaly count of the corresponding physical block; and, based on the format anomaly count and the number of erase / write operations of the physical block, dividing each physical block into at least three reliability levels. The abnormal data repair processing performed on the abnormal physical page includes: when the reliability level of the physical block containing the abnormal physical page is lower than a preset reliability level threshold, migrating the rewritten data to a physical block with a higher reliability level during the data rewrite operation, and updating the corresponding physical page information in the mapping relationship.

[0011] Optionally, dividing each physical block into at least three reliability levels based on the format anomaly count and the number of erase / write operations of the physical block includes: for each physical block, calculating a first reliability score based on the number of erase / write cycles and a second reliability score based on the format anomaly count, and weighting and combining the first and second reliability scores to obtain a comprehensive reliability score; based on the comprehensive reliability score, classifying each physical block as a high reliability block, a normal reliability block, or a low reliability block; and, when performing garbage collection and background data relocation on the solid-state drive, preferring high reliability blocks for data containing structure control fields and key business fields, and preferring normal reliability blocks or low reliability blocks for data containing only ordinary data fields.
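The weighted scoring in [0011] can be sketched in Python as below. The source specifies only that two scores are weighted and combined and that blocks fall into three classes; the linear normalization, the equal weights, the caps (`max_erase`, `max_anomalies`), and the 0.7 / 0.4 cut-offs are all hypothetical values chosen for illustration.

```python
def reliability_score(erase_count, max_erase, anomaly_count, max_anomalies,
                      w_wear=0.5, w_format=0.5):
    """Combine a wear-based and a format-anomaly-based score into one value in 0..1."""
    # First score: 1.0 for an unused block, 0.0 at the rated erase limit.
    wear_score = 1.0 - min(erase_count / max_erase, 1.0)
    # Second score: 1.0 with no format anomalies, 0.0 at the anomaly cap.
    format_score = 1.0 - min(anomaly_count / max_anomalies, 1.0)
    return w_wear * wear_score + w_format * format_score


def reliability_level(score, high=0.7, low=0.4):
    """Map a comprehensive score to one of three classes."""
    if score >= high:
        return "high"
    if score >= low:
        return "normal"
    return "low"
```

A fresh block with no anomalies scores 1.0 and is classed as high reliability, while a block at both caps scores 0.0 and is classed as low reliability.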

[0012] A second aspect of this application provides a data format self-identification and verification device for a solid-state drive, used to perform the method of the first aspect and any possible implementation thereof, the device comprising:
an analysis unit, used to analyze the file structure of the data to be written in order to determine the structural features of the data to be written;
a matching unit, used to obtain a target format recognition template by matching in a preset format recognition template library based on the structural features;
an extraction unit, used to extract preset field information from the target format recognition template and establish a mapping relationship between the preset field information and the corresponding storage location in the solid-state drive;
a generation unit, used to perform segment check code generation processing on each data segment in the data to be written, to obtain a segment check code that corresponds one-to-one with each data segment, and to write the segment check code and the corresponding data segment together into the solid-state drive;
a verification unit, used to read the stored data from the physical pages of the solid-state drive and reconstruct the logical data structure according to the target format recognition template and the mapping relationship, and to perform format consistency verification on the reconstructed logical data structure to obtain a verification result;
a determining unit, used to determine the corresponding abnormal physical page according to the mapping relationship when the verification result indicates an anomaly, and to perform abnormal data repair processing on the abnormal physical page.

[0013] A third aspect of this application provides an electronic device, comprising: a processor, a memory, an input / output unit, and a bus; the processor is connected to the memory, the input / output unit, and the bus; the memory stores a program, and the processor calls the program to execute the method of the first aspect and any possible implementation of the first aspect.

[0014] A fourth aspect of this application provides a computer-readable storage medium storing a program that, when executed on a computer, causes the computer to perform the method of the first aspect and any possible implementation thereof.

[0015] As can be seen from the above technical solutions, this application has the following advantages: structural feature analysis and format recognition template matching are completed before writing, a mapping relationship between fields and physical locations is established, and an individually verifiable segment check code is generated for each data segment. During the reading stage, the logical structure is reconstructed based on the template and format consistency verification is performed. When an anomaly is detected, the mapping relationship is used to accurately locate the physical page and perform data rewriting and wear leveling. The overall process therefore keeps format recognition, data organization, verification, and anomaly handling within a unified logic, making field location, segment verification, and error recovery clearer. This reduces the difficulty of debugging the parameters of data format self-identification and verification and improves the data reliability of solid-state drives.

Attached Figure Description

[0016] Figure 1 is a flowchart illustrating one embodiment of the data format self-identification and verification method for solid-state drives in this application;
Figure 2 is a schematic diagram of the process for generating segment check codes in this application;
Figure 3 is a flowchart illustrating the process of selecting the corresponding segment check code algorithm and redundancy length for each data segment in this application;
Figure 4 is a flowchart illustrating the process of performing abnormal data repair on abnormal physical pages in this application;
Figure 5 is a flowchart illustrating the process of dividing each physical block into at least three reliability levels in this application;
Figure 6 is a schematic diagram of one embodiment of the data format self-identification and verification device for solid-state drives in this application;
Figure 7 is a schematic diagram of the structure of one embodiment of the electronic device in this application.

Detailed Implementation

[0017] This application provides a data format self-identification and verification method for solid-state drives (SSDs) to improve the data reliability of SSDs.

[0018] The embodiments of this application will now be described with reference to the accompanying drawings.

[0019] Referring to Figure 1, an embodiment of the data format self-identification and verification method for solid-state drives provided in this application includes:

101. Analyze the file structure of the data to be written to determine the structural features of the data to be written.

When performing file structure analysis on the data to be written, the data layout inside the file is first read, including the file header format, field order, field length, field type, and logical relationships between different areas. Then, based on this information, the organization method and boundary characteristics of the data are identified, thereby determining structural features such as the location of key fields, the way logical segments are divided, and the dependencies between fields. Together these form the set of structural features required for subsequent processing.

[0020] 102. Based on the structural features, obtain the target format recognition template by matching in the preset format recognition template library.

Matching relies on a format recognition template library constructed in advance, in which different file types are classified and stored according to their data organization methods. Each template contains structural description information for the corresponding format, including feature bytes or signature identifiers, field offsets, field length rules, and logical segment division methods. For example, for XML format templates, the starting tag identifier and hierarchical nesting rules are recorded; for JSON format templates, the key-value pair structure and delimiter rules are recorded; for database page templates, the header identifier, field offsets, and record layout are recorded; and for video frame templates, the frame header signature, frame length, and key field positions are recorded. This allows the template library to cover multiple data structure types and provide comparable structural description information.

[0021] Specifically, during template matching, structural features are compared item by item with templates in the template library. First, feature extraction is performed on the structural features to generate corresponding structural fingerprint information. Then, hash or fingerprint comparison is performed based on the structural fingerprint and the feature bytes and field rules in the template to obtain preliminary matching results. Subsequently, structural similarity is calculated based on field layout, number of fields, field length, and logical segment division method. The similarity is compared with a preset threshold to filter out a set of candidate templates that meet the threshold conditions. These templates are then sorted according to template priority rules, with the template with the highest matching degree and higher priority being selected as the target format recognition template, thus completing the process of matching structural features to templates.

[0022] For example, matching can be performed according to the following process:
Input: structural features F, template library T.
Output: target format recognition template T_target.

[0023] 1. Perform feature extraction on F to obtain the structural fingerprint Fp;
2. Initialize the candidate template set C = {};
3. Traverse each template Ti in the template library T:
3.1 Extract the template fingerprint Tip of Ti;
3.2 Calculate the fingerprint matching degree Si = hash_match(Fp, Tip);
3.3 If Si meets the initial screening conditions, calculate the structural similarity Ri;
3.4 If Ri ≥ the preset threshold, add Ti to the candidate set C;
4. Sort C according to template priority and similarity;
5. Select the first-ranked template as T_target.
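The matching process above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation defined in the application: the `Template` fields, the prefix comparison standing in for `hash_match`, the length-agreement similarity measure, the 0.8 threshold, and the priority-then-similarity ordering are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str
    signature: bytes       # feature bytes expected at the start of the data
    field_lengths: tuple   # expected field lengths, in order
    priority: int          # larger value is preferred (assumed convention)


def structural_similarity(lengths, template):
    """Fraction of field positions whose length agrees with the template."""
    longest = max(len(lengths), len(template.field_lengths))
    if longest == 0:
        return 1.0
    same = sum(1 for a, b in zip(lengths, template.field_lengths) if a == b)
    return same / longest


def match_template(head, lengths, library, threshold=0.8):
    """Return the best-matching template, or None if no candidate qualifies."""
    candidates = []
    for template in library:
        # Steps 3.1-3.3: cheap signature screen before the similarity test.
        if not head.startswith(template.signature):
            continue
        similarity = structural_similarity(lengths, template)
        # Step 3.4: keep only templates that reach the threshold.
        if similarity >= threshold:
            candidates.append((template.priority, similarity, template))
    if not candidates:
        return None
    # Step 4: order by priority, then by similarity, highest first.
    candidates.sort(key=lambda c: (c[0], c[1]), reverse=True)
    # Step 5: the first-ranked template is the target.
    return candidates[0][2]


library = [
    Template("db_page", b"DBPG", (4, 2, 2, 8), priority=2),
    Template("generic", b"", (4, 2, 2, 8), priority=1),
]
best = match_template(b"DBPG\x01\x00", (4, 2, 2, 8), library)
```

Here both templates pass the screen and reach full similarity, so the higher-priority `db_page` template is selected.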

[0024] 103. Extract preset field information from the target format recognition template and establish a mapping relationship between the preset field information and the corresponding storage location in the solid-state drive.

When extracting preset field information from the target format recognition template, the field information is extracted according to the field offset position, field length rules, and arrangement order, and the fields are classified according to the predefined field classification rules in the template into structure control fields, key business fields, and ordinary data fields. At the same time, the correspondence between fields and data segments is established in combination with the logical segment division method. According to the writing order of the data segments, each field is mapped to the corresponding storage location in the solid-state drive, forming a mapping relationship between field information and physical pages.

[0025] In practical applications, the target format recognition template assigns clear category identifiers to different fields. For example, length, offset, segment identifier, and version identifier fields are defined as structure control fields to describe data structure and parsing boundaries; primary key, index, timestamp, or status identifier fields are defined as key business fields to support core data access and status identification; and other data carrying business content are defined as ordinary data fields. During field extraction, the starting position of each field is located according to the field offset recorded in the template, field boundaries are parsed according to field length rules, and field parsing is completed in combination with the field arrangement order. Then, according to the logical segment division method in the template, the fields are assigned to the corresponding data segments. Subsequently, according to the writing order of the data segments in the physical page, each field is mapped to a specific physical page address, so that structure control fields, key business fields, and ordinary data fields have a clear positional correspondence in the physical storage space, providing a foundation for subsequent data reading, structure reconstruction, and format verification.
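A field-to-page mapping of this kind can be sketched in Python as below. The sketch assumes, purely for illustration, that a record is written to consecutive physical pages starting at `first_page` and that each page holds `page_size` bytes; `FieldSpec`, `build_mapping`, and `pages_for_fields` are hypothetical names, and a real flash translation layer would not generally allocate pages contiguously.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str
    offset: int   # byte offset of the field within the logical record
    length: int   # field length in bytes
    kind: str     # "control", "key", or "ordinary"


def build_mapping(specs, first_page, page_size):
    """Map each field name to the list of physical pages it occupies."""
    mapping = {}
    for spec in specs:
        start = spec.offset // page_size
        end = (spec.offset + spec.length - 1) // page_size
        mapping[spec.name] = [first_page + p for p in range(start, end + 1)]
    return mapping


def pages_for_fields(mapping, field_names):
    """Return the sorted set of physical pages holding the given fields."""
    return sorted({page for name in field_names for page in mapping[name]})


specs = [
    FieldSpec("header", 0, 16, "control"),
    FieldSpec("primary_key", 16, 8, "key"),
    FieldSpec("body", 24, 5000, "ordinary"),
]
mapping = build_mapping(specs, first_page=100, page_size=4096)
```

With a 4096-byte page, the `body` field spans pages 100 and 101, so a later check failure on that field can be traced back to exactly those two pages.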

[0026] 104. Perform segment check code generation processing on each data segment in the data to be written to obtain a segment check code that corresponds one-to-one with each data segment, and write the segment check code and the corresponding data segment together into the solid-state drive.

When processing the data to be written, the data is divided into multiple data segments according to the segment division method defined in the mapping relationship. A check code generation operation is then performed on each data segment: the data content of each segment is input into the check processing algorithm to generate a segment check code. Subsequently, each segment check code is combined with the corresponding data segment and written to the solid-state drive in the order of physical pages identified by the mapping relationship, so that each data segment and its corresponding check code are stored adjacent to each other.
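Step 104 can be illustrated with a minimal Python sketch. It assumes fixed-size segments and uses CRC32 from the standard library as a stand-in check code; the application leaves both the segment division rule and the check algorithm to the template and mapping relationship, so `split_segments`, `encode_segments`, and `check_record` are hypothetical helpers.

```python
import zlib


def split_segments(data, segment_size):
    """Cut the payload into fixed-size segments (the last one may be shorter)."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def encode_segments(data, segment_size):
    """Return one record per segment: segment bytes followed by a 4-byte CRC32."""
    records = []
    for segment in split_segments(data, segment_size):
        check_code = zlib.crc32(segment).to_bytes(4, "big")
        records.append(segment + check_code)  # stored adjacent, as in step 104
    return records


def check_record(record):
    """Verify a single record independently of all other segments."""
    segment, stored = record[:-4], record[-4:]
    return zlib.crc32(segment).to_bytes(4, "big") == stored


records = encode_segments(b"abcdefghij", 4)
```

Because each record carries its own check code, a mismatch identifies the affected segment directly instead of invalidating the whole payload.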

[0027] 105. Based on the target format recognition template and the mapping relationship, read the stored data from the physical pages of the solid-state drive and reconstruct the logical data structure, then perform a format consistency check on the reconstructed logical data structure to obtain the check result.

After the data to be written has been written, or when the corresponding stored data is read, the data segments and their corresponding segment check codes are extracted from the physical pages of the solid-state drive according to the mapping relationship. Based on the data segment order and physical page address order recorded in the mapping relationship, the data segments in multiple physical pages are concatenated according to the logical segment division method. Then, according to the field order, field offset position, and field length rules defined in the target format recognition template, the concatenated data stream is parsed field by field to restore the starting position and boundary information of each field. Finally, the fields are reorganized according to the hierarchical structure and segment division relationship in the template to reconstruct the complete logical data structure.

[0028] After the logical data structure reconstruction is completed, the fields are divided into structure control fields, key business fields, and ordinary data fields according to the field classification rules defined in the template. A format consistency check is then performed, which includes field integrity checks and structure consistency checks. Field integrity checks include verifying whether the field length matches the length rules defined in the template, whether the magic byte or signature of key fields matches, and whether fields marked as required in the template exist. Structure consistency checks include verifying whether the field offsets conform to the offset relationships defined in the template, whether the parent-child hierarchy relationships between fields satisfy the structural constraints in the template, and whether the segment checksum corresponding to each data segment matches the data segment content. During the verification process, if any structure control field or key business field mismatches, or if the segment checksum verification fails, the verification result is judged as abnormal; only when all field integrity and structure consistency meet the template requirements is the result judged as normal.

[0029] Specifically, the following is one possible verification process in this application:
Input: reconstructed data structure D, template T.
Output: verification result R.

[0030] 1. Traverse each field definition Fi in template T:
1.1 Check whether field Fi exists in D;
1.2 Verify whether the field length conforms to the length rules;
1.3 If Fi is a structure control field, validate its offset and its structural relationships;
1.4 If Fi is a key business field, validate its value or format;
1.5 If any check fails, return R = abnormal.

[0031] 2. Traverse each data segment Di:
2.1 Verify whether the segment check code matches;
2.2 If it does not match, return R = abnormal.

[0032] 3. If all validations pass, return R = normal.
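The verification process above can be sketched in Python as follows. The data shapes are assumptions made for illustration: the reconstructed structure is a dictionary from field name to `(offset, value)`, each data segment is a `(data, crc32)` pair, and `FieldDef` and `verify` are hypothetical names. CRC32 again stands in for whatever segment check code is actually used.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldDef:
    name: str
    offset: int                    # offset required by the template
    length: int                    # length required by the template
    kind: str                      # "control", "key", or "ordinary"
    required: bool = True
    magic: Optional[bytes] = None  # expected value for signature-style key fields


def verify(fields, template, segments):
    """Return "normal" or "abnormal" for a reconstructed structure."""
    for definition in template:
        if definition.name not in fields:          # step 1.1
            if definition.required:
                return "abnormal"
            continue
        offset, value = fields[definition.name]
        if len(value) != definition.length:        # step 1.2
            return "abnormal"
        if definition.kind == "control" and offset != definition.offset:
            return "abnormal"                      # step 1.3
        if (definition.kind == "key" and definition.magic is not None
                and value != definition.magic):
            return "abnormal"                      # step 1.4
    for data, check_code in segments:              # step 2
        if zlib.crc32(data) != check_code:
            return "abnormal"
    return "normal"                                # step 3


template = [
    FieldDef("magic", 0, 4, "key", magic=b"DBPG"),
    FieldDef("length", 4, 2, "control"),
]
fields = {"magic": (0, b"DBPG"), "length": (4, b"\x00\x10")}
segments = [(b"payload", zlib.crc32(b"payload"))]
result = verify(fields, template, segments)
```

The function returns at the first failure, mirroring steps 1.5 and 2.2; a variant that collected every failing field would make it easier to map all affected fields to physical pages in one pass.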

[0033] In practical applications, if the data to be written is in a structured file format, then after the logical structure is extracted and reconstructed from the physical pages, the result is judged abnormal if the actual length of a key field is inconsistent with the length defined in the template, if the offset of a field falls outside the range specified in the template, or if the segment check code of a data segment does not match its content. When the field length, offset relationship, structural hierarchy, and check code all meet the template requirements, the format consistency check is judged to have passed.

[0034] Because the format consistency verification combines field integrity checks and structural consistency checks, covering field length, field offset, structural hierarchy, and segment check codes, the verification granularity is extended from the bit level to the structural level. This makes it possible to identify issues such as misaligned field boundaries and abnormal structural relationships. Furthermore, locating the corresponding physical pages through the mapping relationship makes anomaly detection more proactive and accurate, so data rewriting operations can be triggered earlier when anomalies occur, improving the data reliability of solid-state drives.

[0035] 106. When the verification result indicates an anomaly, determine the corresponding abnormal physical page according to the mapping relationship, and perform abnormal data repair processing on the abnormal physical page.

[0036] When the format consistency check result shows an anomaly, the mapping relationship is used to quickly locate the specific physical page that caused the anomaly. Then, a data rewrite operation is performed on the physical page, rewriting the original data segment and segment check code in the correct order. During the writing process, wear leveling is performed on the physical blocks in combination with the wear status of the solid-state drive, so that the write position is redistributed to a healthier area to maintain the stability and lifespan of the physical page.
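The rewrite-and-relocate behaviour described here can be sketched in Python as below. The sketch assumes that an intact copy of the segment and its check code is available for rewriting (the source does not say where it comes from), models storage as a dictionary from block id to a list of page records, and uses the erase count alone as the wear signal; `pick_healthier_block` and `repair_segment` are hypothetical names.

```python
def pick_healthier_block(erase_counts, exclude):
    """Return the block id with the fewest erase cycles, skipping excluded blocks."""
    candidates = [block for block in erase_counts if block not in exclude]
    if not candidates:
        raise ValueError("no alternative block available")
    return min(candidates, key=lambda block: erase_counts[block])


def repair_segment(mapping, storage, erase_counts, segment_id, good_record):
    """Rewrite a segment to a less-worn block and update the mapping relationship."""
    old_block, _old_page = mapping[segment_id]
    new_block = pick_healthier_block(erase_counts, exclude={old_block})
    pages = storage.setdefault(new_block, [])
    pages.append(good_record)                       # segment plus check code
    mapping[segment_id] = (new_block, len(pages) - 1)
    return mapping[segment_id]


erase_counts = {0: 2900, 1: 120, 2: 800}
mapping = {"seg0": (0, 7)}
storage = {}
new_location = repair_segment(mapping, storage, erase_counts, "seg0", b"data+crc")
```

In this example the segment moves from heavily worn block 0 to block 1, which has the lowest erase count, and the mapping entry is updated so later reads resolve to the new page.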

[0037] In this embodiment, structural feature analysis and format recognition template matching are performed before writing, a mapping relationship between fields and physical locations is established, and an individually verifiable segment check code is generated for each data segment. During the reading phase, the logical structure is reconstructed based on the template and format consistency verification is performed. When an anomaly is detected, the mapping relationship is used to accurately locate the physical page and perform data rewriting and wear leveling. The overall process therefore maintains a unified logic for format recognition, data organization, verification, and anomaly handling, making field location, segment verification, and error recovery clearer. This reduces the difficulty of debugging the parameters of data format self-identification and verification while improving the data reliability of the solid-state drive.

[0038] Referring to Figure 2, in some embodiments of this application, step 104 in the above embodiments, which performs segment check code generation processing on each data segment in the data to be written to obtain a segment check code corresponding to each data segment, may include the following steps:

201. Determine the target physical block corresponding to each data segment before writing, and combine the number of erase / write operations of the target physical block, the error correction statistics within the most recent preset time window, and the operating parameters of the solid-state drive into a risk assessment feature vector.

When processing data segments, the target physical block to be written for each data segment is first determined based on the mapping relationship between fields and physical locations. Then, the current cumulative number of erase / write cycles for that physical block is read and normalized into a numerical feature usable for evaluation, for example as a proportion of the maximum erase / write lifespan. At the same time, error correction statistics are extracted within the most recent preset time window. The time window can be set to a recent span of seconds or minutes, or to a fixed number of recent read / write cycles. Error correction-related indicators are computed within this time window, including the number of error correction triggers, the proportion of error-corrected data, and the average error correction intensity. Furthermore, the current operating parameters of the solid-state drive are obtained, including operating status indicators such as temperature, voltage fluctuations, read / write pressure, and queue depth.

[0039] After obtaining the above parameters, the erase / write count characteristics, error correction statistics, and operational parameter characteristics are combined in a fixed order to form a multi-dimensional risk assessment feature vector. This feature vector can simultaneously reflect the wear status, historical error history, and current operating environment of the target physical block. For example, the multi-dimensional risk assessment feature vector can be composed of: erase / write count, error correction trigger count, error correction rate, temperature, voltage fluctuation amplitude, read / write pressure, and queue depth. It should be noted that the composition of the multi-dimensional risk assessment feature vector is merely an example and is not limited to this one composition. In practical applications, it can be adjusted according to actual needs.
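The seven-dimensional example vector can be assembled as in the Python sketch below. The field names, units, and the choice to normalize only the erase / write count (as a share of rated lifespan) are assumptions for illustration; the source fixes the feature order only by example and leaves scaling open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockStats:
    erase_count: int          # cumulative erase / write cycles of the block
    max_erase_cycles: int     # rated erase / write lifespan
    ecc_triggers: int         # error correction triggers in the time window
    corrected_bits: int       # bits corrected in the time window
    bits_read: int            # bits read in the time window
    temperature_c: float
    voltage_ripple_mv: float
    io_pressure: float        # 0..1 load indicator (hypothetical scale)
    queue_depth: int


def risk_feature_vector(stats):
    """Combine wear, error history, and operating state in a fixed order."""
    return [
        stats.erase_count / stats.max_erase_cycles,      # normalized wear
        float(stats.ecc_triggers),                       # trigger count
        stats.corrected_bits / max(stats.bits_read, 1),  # error correction rate
        stats.temperature_c,
        stats.voltage_ripple_mv,
        stats.io_pressure,
        float(stats.queue_depth),
    ]


example = BlockStats(
    erase_count=1500, max_erase_cycles=3000, ecc_triggers=12,
    corrected_bits=30, bits_read=10000, temperature_c=45.0,
    voltage_ripple_mv=20.0, io_pressure=0.8, queue_depth=16,
)
vector = risk_feature_vector(example)
```

Keeping the order fixed matters because a trained model interprets each position as a specific feature; adding or reordering dimensions requires retraining.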

[0040] 202. Input the risk assessment feature vector into the pre-trained risk assessment model to obtain the risk level corresponding to each data segment.

After obtaining the risk assessment feature vector for each data segment, the feature vector is input into a pre-trained risk assessment model. The risk assessment model can be constructed using a decision tree model, a random forest model, or a neural network model, with the number of input layer nodes matching the dimension of the feature vector. For example, when the feature vector contains 7 dimensions, the number of input layer nodes is 7. The model output can be a discrete risk level identifier, such as risk level 1, risk level 2, and risk level 3, corresponding to low risk, medium risk, and high risk, respectively, or it can be a continuous risk score that is mapped to the corresponding level.

[0041] During the model training phase, a training dataset is constructed based on historical operating data. The data sources include the number of erase / write cycles, error correction statistics, and operating parameters recorded by the solid-state drive during actual operation. The samples are labeled in conjunction with actual data error situations or data recovery results. The model is trained offline and a stable parameter configuration is obtained. The trained model is then deployed for online risk assessment.

[0042] In actual evaluation, for example, if a physical block has 5,000 write cycles, an error correction rate of 0.3% within a preset time window, an operating temperature of 45°C, and a high level of read / write pressure, then inputting the corresponding feature vector into the model will result in a higher risk score and classify it as a high-risk level. On the other hand, physical blocks with fewer write cycles, lower error correction rates, and stable operating parameters correspond to a lower risk level, thus providing different data segments with risk level information that matches their writing environment.
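The evaluation flow above can be illustrated with a small stand-in. The application calls for a pre-trained decision tree, random forest, or neural network; the sketch below substitutes a hand-written logistic score so that it stays self-contained. The weights, the bias, the cut-off values, and the assumed maximum life of 10,000 cycles are hypothetical and are not taken from the application.

```python
import math

# Hypothetical coefficients standing in for a trained model (not from the source).
# Order: wear, ECC triggers, ECC rate, temperature, voltage ripple, pressure, queue depth.
WEIGHTS = [3.0, 0.02, 200.0, 0.03, 0.01, 1.5, 0.01]
BIAS = -4.0


def risk_score(features):
    """Return a continuous risk score in (0, 1) for a 7-dimensional vector."""
    if len(features) != len(WEIGHTS):
        raise ValueError("feature vector dimension must match the model inputs")
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def risk_level(score, low_cut=0.33, high_cut=0.66):
    """Map the continuous score to level 1 (low), 2 (medium), or 3 (high)."""
    if score < low_cut:
        return 1
    if score <= high_cut:
        return 2
    return 3


# Block from the example: 5,000 cycles (assumed life 10,000), 0.3% ECC rate,
# 45 degrees Celsius, high read/write pressure.
worn_block = [0.5, 12.0, 0.003, 45.0, 20.0, 0.9, 32.0]
# A lightly used block with stable operating parameters.
fresh_block = [0.05, 0.0, 0.0001, 35.0, 5.0, 0.1, 4.0]

worn_level = risk_level(risk_score(worn_block))
fresh_level = risk_level(risk_score(fresh_block))
```

With these assumed coefficients the worn block lands in level 3 and the fresh block in level 1, which mirrors the qualitative behaviour described in the paragraph above.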

[0043] 203. Select the corresponding segmented check code algorithm and redundancy length for each data segment according to the risk level, and perform segmented check code generation processing on each data segment according to the segmented check code algorithm and redundancy length to obtain a segmented check code that corresponds one-to-one with each data segment.

[0044] After obtaining the risk level, based on the preset verification strategy for different levels, a corresponding segmented checksum algorithm type and redundancy length are selected for each data segment. For example, an algorithm with stronger error correction capabilities and longer redundancy is selected for high-risk levels, while a lightweight algorithm with shorter redundancy is selected for low-risk levels. Subsequently, segmented checksum generation processing is performed on each data segment. The data segment content is input into the selected verification method, and segmented checksums are generated according to the corresponding redundancy length, so that each data segment has a verification redundancy capability adapted to its risk level.
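A minimal sketch of per-segment check code generation is given below. The application does not name concrete algorithms, so the choices here are assumptions: CRC-32 for level 1 and BLAKE2b digests of 8 and 16 bytes for levels 2 and 3. These are detection-only codes used as stand-ins; a real drive would more likely use error-correcting codes with different parity lengths. The function names are hypothetical.

```python
import hashlib
import zlib


def make_check_code(segment: bytes, level: int) -> bytes:
    """Generate a check code whose length grows with the risk level."""
    if level == 1:
        # Lightweight: 4-byte CRC-32.
        return zlib.crc32(segment).to_bytes(4, "big")
    if level == 2:
        # Medium redundancy: 8-byte BLAKE2b digest (stand-in algorithm).
        return hashlib.blake2b(segment, digest_size=8).digest()
    if level == 3:
        # Longest redundancy: 16-byte BLAKE2b digest (stand-in algorithm).
        return hashlib.blake2b(segment, digest_size=16).digest()
    raise ValueError(f"unknown risk level: {level}")


def protect_segments(segments, levels):
    """Pair every data segment with exactly one check code."""
    if len(segments) != len(levels):
        raise ValueError("each data segment needs exactly one risk level")
    return [(seg, make_check_code(seg, lvl)) for seg, lvl in zip(segments, levels)]


pairs = protect_segments([b"header", b"payload"], [1, 3])
```

Each returned pair would then be written to the drive together, which preserves the one-to-one correspondence between data segments and segmented check codes.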

[0045] In this embodiment, before the segmented check codes are generated, the characteristics of the target physical block, the error correction statistics, and the operating parameters are combined into a risk assessment feature vector, and the risk assessment model determines the risk level of each data segment. An appropriate segmented check code algorithm and redundancy length are then selected based on the risk level, so that the redundancy strength of each data segment matches its physical environment at the writing stage. In this way, the actual protection needs of different data segments are met, the difficulty of debugging segmented check parameters is reduced, and the data reliability of solid-state drives under complex operating conditions is improved.

[0046] Please refer to Figure 3. In some embodiments of this application, step 203 in the above embodiments, which selects the corresponding segmented check code algorithm and redundancy length for each data segment according to the risk level, may include the following steps:

301. When the risk level is lower than the preset first threshold, select the first type of segmented check code algorithm for the corresponding data segment and adopt the first redundancy length.

When the risk level is below the first threshold, it indicates that the number of erase / write cycles of the target physical block is low, the error correction statistics are in a stable range, and the operating parameters are not significantly abnormal, so less redundancy is required. In this case, the first type of segmented check code algorithm is selected for the data segment. This algorithm is characterized by low computational overhead and basic error correction capability. At the same time, a first redundancy length matching this algorithm is used, so that the generated segmented check code occupies only a small amount of additional storage space while meeting the verification requirements of low-risk write scenarios.

[0047] 302. When the risk level is between the first threshold and the preset second threshold, select the second type of segmented check code algorithm for the corresponding data segment and use a second redundancy length greater than the first redundancy length.

When the risk level falls between the first and second thresholds, it indicates an increase in the number of erase / write cycles of the target physical block, fluctuations in error correction statistics, or slight instability in operating parameters. The data segment requires stronger redundancy to improve post-write reliability. Therefore, a second type of segmented check code algorithm is selected for the data segment, balancing error correction capability against computational complexity. A second redundancy length greater than the first redundancy length is used to expand the check redundancy range and address data errors that may occur under medium-risk conditions.

[0048] 303. When the risk level is higher than the second threshold, select the third type of segmented check code algorithm for the corresponding data segment and use a third redundancy length that is greater than the second redundancy length.

[0049] When the risk level exceeds the second threshold, it indicates that the target physical block is in a high-risk state, such as high write / erase cycles, dense error correction records, or significant fluctuations in operating parameters, which increases the probability of errors occurring after data is written. In this case, a third type of segmented checksum algorithm is selected for the data segment. This algorithm has stronger error correction capabilities and a wider checksum coverage. At the same time, a third redundancy length greater than the second redundancy length is used to give the segmented checksum the highest level of redundancy capability to resist multiple types of errors that may occur in high-risk physical environments.
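The three-tier selection in steps 301 to 303 can be sketched as a small lookup. The threshold values, the byte lengths, the type labels, and the treatment of scores that fall exactly on a threshold are assumptions for illustration; the application only requires that the second redundancy length exceed the first and the third exceed the second.

```python
from typing import NamedTuple


class CheckPolicy(NamedTuple):
    algorithm: str         # label for the segmented check code algorithm type
    redundancy_bytes: int  # redundancy length attached to the data segment


# Hypothetical thresholds on a 0..1 risk scale (not from the source).
FIRST_THRESHOLD = 0.33
SECOND_THRESHOLD = 0.66


def select_policy(risk: float) -> CheckPolicy:
    """Pick the algorithm type and redundancy length for one data segment."""
    if risk < FIRST_THRESHOLD:
        return CheckPolicy("type-1 lightweight", 4)
    if risk <= SECOND_THRESHOLD:
        # Boundary values are assigned to the middle tier by assumption.
        return CheckPolicy("type-2 balanced", 8)
    return CheckPolicy("type-3 strongest", 16)


low, mid, high = select_policy(0.10), select_policy(0.50), select_policy(0.90)
```

Because the policy is a plain table keyed by two thresholds, tuning it amounts to adjusting two numbers and three lengths, which is consistent with the stated aim of narrowing the parameter adjustment range.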

[0050] In this embodiment, segmented check code algorithm types and matching redundancy lengths are selected for different risk levels. Low-risk data segments are configured with lightweight redundancy, medium-risk data segments with medium redundancy, and high-risk data segments with high-strength redundancy. This ensures that the check capability corresponds to the risk status of the physical block, the check redundancy allocation is more in line with the actual writing environment, the segmented check strategy is easier to determine, and the parameter adjustment range is clearer. This reduces the difficulty of debugging parameters related to segmented check and improves the data reliability of the solid-state drive.

[0051] Please refer to Figure 4. In some embodiments of this application, the method may further include the following steps:

401. Maintain a format anomaly count for each physical block of the solid-state drive. When a format consistency check anomaly is triggered by a physical page located within a physical block, increment the format anomaly count for the corresponding physical block.

An independent format anomaly count is established for each physical block of the solid-state drive (SSD) to record the number of format consistency check anomalies that occur during the block's use. When an anomaly is detected in a physical page during a format consistency check, the physical block to which the page belongs is immediately located, and the format anomaly count for that physical block is incremented, so that subsequent evaluations have a quantifiable anomaly history for the physical block.
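A minimal sketch of the per-block counter in step 401 follows. The class name `AnomalyTracker`, the flat physical page numbering, and the geometry of 256 pages per block are assumptions made for illustration.

```python
from collections import Counter

PAGES_PER_BLOCK = 256  # hypothetical geometry, not from the source


class AnomalyTracker:
    """Keeps one format anomaly count per physical block."""

    def __init__(self, pages_per_block: int = PAGES_PER_BLOCK):
        self.pages_per_block = pages_per_block
        self.counts = Counter()  # block number -> anomaly count

    def block_of(self, physical_page: int) -> int:
        """Locate the physical block that contains a physical page."""
        return physical_page // self.pages_per_block

    def record_failure(self, physical_page: int) -> int:
        """Increment the count of the block owning the failing page."""
        block = self.block_of(physical_page)
        self.counts[block] += 1
        return block


tracker = AnomalyTracker()
for failing_page in (10, 200, 300):
    tracker.record_failure(failing_page)
```

A `Counter` returns zero for blocks that have never failed a check, so no per-block initialization is needed in this sketch.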

[0052] 402. Based on the format anomaly count and the number of erase / write cycles of the physical block, divide each physical block into at least three reliability levels.

The usage status of each physical block is evaluated from its accumulated format anomaly count together with its current number of erase / write cycles. When the format anomaly count is low and the number of erase / write cycles is within a healthy range, the physical block is classified as a high reliability level; when the anomaly count or the number of erase / write cycles has increased, the physical block is classified as a medium reliability level; when anomalies occur frequently and the number of erase / write cycles approaches the threshold, the physical block is classified as a low reliability level. This forms a system of at least three distinguishable reliability levels.

[0053] In this embodiment, step 106, which performs abnormal data repair processing on the abnormal physical page, may include:

403. When the reliability level of the physical block containing the abnormal physical page is lower than the preset reliability level threshold, migrate the rewritten data to a physical block with a higher reliability level during the data rewrite operation, and update the corresponding physical page information in the mapping relationship.

[0054] When performing a data rewrite operation on an abnormal physical page, the system first determines whether the reliability level of the physical block containing the physical page is lower than a preset reliability level threshold. If the reliability is insufficient, the physical block is no longer used. Instead, the data is migrated to a physical block with a higher reliability level during the data rewrite, ensuring that the write location is in a more stable storage area. The new physical page information is then written into the mapping relationship, so that subsequent reads and verifications are performed according to the updated physical location.
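The rewrite decision described above might look like the sketch below. The data shapes (a field-to-location dictionary, a block-to-level dictionary, and per-block free page lists), the level names, and the choice to leave the location unchanged when the block is reliable enough are assumptions; the application does not specify these details.

```python
LEVEL_RANK = {"low": 0, "normal": 1, "high": 2}


def rewrite_target(field, mapping, block_levels, free_pages, threshold="normal"):
    """Choose where a repaired field is rewritten and update the mapping.

    mapping:      field name -> (block, page)
    block_levels: block -> "low" | "normal" | "high"
    free_pages:   block -> list of free page numbers
    """
    block, _page = mapping[field]
    current = LEVEL_RANK[block_levels[block]]
    if current >= LEVEL_RANK[threshold]:
        # Block is reliable enough: location is kept (assumed behaviour).
        return mapping[field]
    # Otherwise look for a more reliable block that still has a free page.
    candidates = [
        b for b, pages in free_pages.items()
        if pages and LEVEL_RANK[block_levels[b]] > current
    ]
    if not candidates:
        raise RuntimeError("no more reliable block with free pages")
    best = max(candidates, key=lambda b: LEVEL_RANK[block_levels[b]])
    mapping[field] = (best, free_pages[best].pop(0))
    return mapping[field]


mapping = {"header": (7, 3), "body": (8, 0)}
block_levels = {7: "low", 8: "normal", 9: "high"}
free_pages = {8: [1, 2], 9: [5]}

new_header = rewrite_target("header", mapping, block_levels, free_pages)
same_body = rewrite_target("body", mapping, block_levels, free_pages)
```

Updating `mapping` in the same call that picks the destination keeps later reads and verifications pointed at the new physical page.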

[0055] In this embodiment, a format anomaly count is maintained for each physical block, reliability levels are divided based on the anomaly count and the number of erase / write cycles, and data migration is performed according to the reliability level during the data rewriting stage. This makes the anomaly history and wear status of physical blocks quantifiable, places rewritten physical pages in more stable areas, and makes the mapping relationship update, the data writing path, and the rewriting strategy clearer and more targeted. Ultimately, this reduces the difficulty of parameter debugging during data anomaly handling and improves the data reliability of the solid-state drive.

[0056] Please refer to Figure 5. In some embodiments of this application, step 402 in the above embodiments, which divides each physical block into at least three reliability levels based on the format anomaly count and the number of erase / write cycles of the physical block, may include the following steps:

501. Calculate a first reliability score based on the number of erase / write cycles and a second reliability score based on the format anomaly count for each physical block, and then weight and combine the first reliability score and the second reliability score to obtain a comprehensive reliability score.

When assessing the reliability of each physical block within a solid-state drive (SSD), a first reliability score is calculated based on the number of erase / write cycles. The number of erase / write cycles characterizes the wear on the physical block; a higher number of cycles corresponds to lower reliability. A second reliability score is then calculated based on the format anomaly count. The format anomaly count characterizes the frequency of structural anomalies occurring during the physical block's operation; a higher number of anomalies corresponds to lower reliability. After obtaining the first and second reliability scores, the two scores are weighted and combined into a comprehensive reliability score that reflects both the long-term wear and the short-term anomaly conditions of the physical block.

[0057] For example, the first reliability score, the second reliability score, and the comprehensive reliability score are calculated by Formula 1, Formula 2, and Formula 3, respectively (the formulas themselves are not reproduced in this text). The quantities used in these formulas are: the first reliability score; the second reliability score; the comprehensive reliability score; the current number of erase / write cycles of the physical block; the maximum erase / write life of the physical block; the number of format anomalies within a preset statistical period; the upper limit of the format anomaly count; and the weighting coefficients for the first reliability score and the second reliability score, respectively. The weighting coefficients can be adjusted according to the actual situation.

[0058] 502. Based on the comprehensive reliability score, divide each physical block into one of the following: high reliability block, normal reliability block, and low reliability block.

After obtaining the comprehensive reliability score of the physical block, reliability classification is performed according to preset score range thresholds. When the comprehensive reliability score is in the high range, the physical block is classified as a high reliability block; when the score falls in the middle range, the physical block is classified as a normal reliability block; when the score is in the low range, the physical block is classified as a low reliability block, so that the physical block has an identifiable reliability classification label in the subsequent allocation and scheduling process.

[0059] For example, the reliability level classification rules can be set as follows, where R is the comprehensive reliability score and the range values can be adjusted according to the actual situation: a block with R > 80 is a high reliability block; a block with 40 ≤ R ≤ 80 is a normal reliability block; a block with R < 40 is a low reliability block.
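The scoring and classification can be illustrated with one possible form of the calculation. Formulas 1 to 3 are not legible in this text, so the linear, 0-to-100 forms below are an assumption chosen only to agree with the surrounding description (more wear or more anomalies means a lower score, and the thresholds 40 and 80 apply to the result). The function names and the weights 0.6 and 0.4 are hypothetical.

```python
def first_score(erase_count: int, max_erase_life: int) -> float:
    """Assumed form: remaining wear budget on a 0..100 scale."""
    return 100.0 * max(0.0, 1.0 - erase_count / max_erase_life)


def second_score(anomalies: int, anomaly_limit: int) -> float:
    """Assumed form: remaining anomaly budget on a 0..100 scale."""
    return 100.0 * max(0.0, 1.0 - anomalies / anomaly_limit)


def combined_score(s1: float, s2: float, w1: float = 0.6, w2: float = 0.4) -> float:
    """Weighted combination; the weights are hypothetical and adjustable."""
    return w1 * s1 + w2 * s2


def classify(r: float) -> str:
    """Apply the example rule: R > 80 high, 40 <= R <= 80 normal, R < 40 low."""
    if r > 80:
        return "high"
    if r >= 40:
        return "normal"
    return "low"


healthy = classify(combined_score(first_score(1000, 10000), second_score(1, 50)))
middling = classify(combined_score(first_score(5000, 10000), second_score(10, 50)))
worn = classify(combined_score(first_score(9500, 10000), second_score(45, 50)))
```

Under these assumptions a block at 10% of its rated life with one anomaly is rated high, a block at half life with ten anomalies is rated normal, and a block near end of life with frequent anomalies is rated low.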

[0060] 503. When performing garbage collection and background data relocation on solid-state drives, for data containing structural control fields and critical business fields, high reliability blocks should be selected first; for data containing only ordinary data fields, normal reliability blocks or low reliability blocks should be selected first.

[0061] When performing garbage collection and background data relocation, physical blocks of the corresponding reliability level are selected based on the type of data fields. When data contains structural control fields or critical business fields, high reliability blocks are prioritized as the target location, ensuring that important data is written to physical areas with higher stability. When data contains only ordinary data fields, normal or low reliability blocks are prioritized as the write location. In this way, physical blocks of different reliability levels are used rationally, and core data and ordinary data are allocated hierarchically according to their different risk levels.
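The placement preference during garbage collection and background relocation could be sketched as follows. The field-kind labels, the free-block structure, and the fallback order used when the preferred level has no free block are assumptions; the application states only which level is preferred first.

```python
IMPORTANT_KINDS = {"structure_control", "critical_business"}


def pick_destination(field_kinds, free_blocks):
    """Pick a destination block for data being relocated.

    field_kinds: set of field kinds present in the data
    free_blocks: level ("high" | "normal" | "low") -> list of free block ids
    """
    if set(field_kinds) & IMPORTANT_KINDS:
        # Important data prefers high reliability blocks.
        order = ["high", "normal", "low"]
    else:
        # Ordinary data prefers normal, then low; high is a last resort (assumed).
        order = ["normal", "low", "high"]
    for level in order:
        if free_blocks.get(level):
            return free_blocks[level].pop(0)
    raise RuntimeError("no free block available")


free = {"high": [1], "normal": [2], "low": [3]}
dest_control = pick_destination({"structure_control", "ordinary"}, free)
dest_ordinary = pick_destination({"ordinary"}, free)
dest_business = pick_destination({"critical_business"}, free)
```

In this run the third request falls back to the low reliability block because the high and normal pools are already exhausted; a real controller might instead delay the relocation or reclaim a better block first.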

[0062] In this embodiment, a comprehensive reliability score, including write / erase count and format anomaly score, is calculated for physical blocks. Physical blocks are divided into multiple reliability levels, and physical blocks of the corresponding level are selected based on the importance of data fields during garbage collection and background data relocation. This allows the scoring system to characterize the actual state of physical blocks, making the reliability classification clearer. The distribution of data among physical blocks of different levels is more consistent with field attributes, making it easier to determine write and migration locations. The overall scheduling strategy is more targeted, ultimately reducing the difficulty of parameter tuning and improving the data reliability of solid-state drives.

[0063] Please refer to Figure 6. This application provides an embodiment of a data format self-identification and verification device for solid-state drives, which is used to perform the steps in the embodiments illustrated in Figures 1 to 5 and includes: an analysis unit 601, used to analyze the file structure of the data to be written in order to determine the structural characteristics of the data to be written; a matching unit 602, used to match, based on the structural characteristics, a target format recognition template in a preset format recognition template library; an extraction unit 603, used to extract preset field information from the target format recognition template and establish a mapping relationship between the preset field information and the corresponding storage location in the solid-state drive; a generation unit 604, used to perform segmented check code generation processing on each data segment in the data to be written, to obtain a segmented check code that corresponds one-to-one with each data segment, and to write the segmented check code and the corresponding data segment together into the solid-state drive; a verification unit 605, used to read, according to the target format recognition template and the mapping relationship, the stored data from the physical pages of the solid-state drive and reconstruct the logical data structure, and to perform format consistency verification on the reconstructed logical data structure to obtain the verification result; and a determination unit 606, used to determine the corresponding abnormal physical page according to the mapping relationship when the verification result indicates an anomaly, and to perform abnormal data repair processing on the abnormal physical page.

[0064] In this embodiment, the functions of each unit correspond to the steps in the embodiments illustrated in Figures 1 to 5 and will not be repeated here.

[0065] Please refer to Figure 7. One embodiment of the electronic device provided in this application includes: a processor 701, a memory 702, an input / output unit 703, and a bus 704. The processor 701 is connected to the memory 702, the input / output unit 703, and the bus 704. The memory 702 stores a program, which the processor 701 calls to execute the steps in the embodiments illustrated in Figures 1 to 5.

[0066] In this embodiment, the function of the processor 701 corresponds to the steps in the embodiments illustrated in Figures 1 to 5 and will not be repeated here.

[0067] This application also provides a computer-readable storage medium on which a program is stored. When the program is executed on a computer, it causes the computer to perform the method in any possible implementation illustrated in Figures 1 to 5.

[0068] Those skilled in the art will understand that, for convenience and brevity, for the specific working processes of the systems, devices, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which will not be repeated here.

[0069] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For instance, the division of units is only a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

[0070] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0071] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0072] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

Claims

1. A data format self-identification and verification method for solid-state drives, characterized in that it comprises: analyzing the file structure of the data to be written to determine the structural characteristics of the data to be written; matching, based on the structural characteristics, in a preset format recognition template library to obtain a target format recognition template; extracting preset field information from the target format recognition template and establishing a mapping relationship between the preset field information and the corresponding storage location in the solid-state drive; performing segmented check code generation processing on each data segment in the data to be written to obtain a segmented check code that corresponds one-to-one with each data segment, and writing the segmented check code and the corresponding data segment together into the solid-state drive; reading, based on the target format recognition template and the mapping relationship, the stored data from the physical pages of the solid-state drive and reconstructing the logical data structure, and performing format consistency verification on the reconstructed logical data structure to obtain a verification result; and, when the verification result indicates an anomaly, determining the corresponding abnormal physical page according to the mapping relationship and performing abnormal data repair processing on the abnormal physical page.

2. The method according to claim 1, characterized in that, The step of performing segmented checksum generation processing on each data segment in the data to be written to obtain a segmented checksum corresponding to each data segment includes: Determine the risk level for each data segment in the data to be written; Based on the risk level, select the corresponding segmented check code algorithm and redundancy length for each data segment, and perform segmented check code generation processing on each data segment according to the segmented check code algorithm and the redundancy length to obtain a segmented check code that corresponds one-to-one with each data segment.

3. The method according to claim 2, characterized in that, Determining the risk level of each data segment in the data to be written includes: Determine the target physical block corresponding to each data segment before writing, and combine the number of erase / write operations of the target physical block, the error correction statistics within the most recent preset time window, and the operating parameters of the solid-state drive into a risk assessment feature vector; The risk assessment feature vector is input into the pre-trained risk assessment model to obtain the risk level corresponding to each data segment.

4. The method according to claim 2 or 3, characterized in that, The step of selecting the corresponding segmented checksum algorithm and redundancy length for each data segment based on the risk level includes: When the risk level is lower than a preset first threshold, a first type of segmented check code algorithm is selected for the corresponding data segment, and a first redundancy length is adopted; When the risk level is between the first threshold and the preset second threshold, a second type of segmented check code algorithm is selected for the corresponding data segment, and a second redundancy length greater than the first redundancy length is adopted. When the risk level is higher than the second threshold, a third type of segmented check code algorithm is selected for the corresponding data segment, and a third redundancy length greater than the second redundancy length is adopted.

5. The method according to claim 1, characterized in that, The step of writing the segmented checksum along with the corresponding data segment into the solid-state drive includes: For target data segments whose risk level is higher than the preset third threshold and whose corresponding fields are marked as key fields in the target format recognition template, the target data segment and its corresponding segment check code are written into at least two physical blocks of the solid-state drive, and the multi-page mapping information corresponding to the target data segment is recorded in the mapping relationship.

6. The method according to claim 1, characterized in that, The method further includes: Maintain a format anomaly count for each physical block of the solid-state drive. When a format consistency check anomaly is detected in a physical page located within a physical block, increment the format anomaly count for the corresponding physical block. Based on the format anomaly count and the number of times the physical block has been erased and rewritten, each physical block is divided into at least three reliability levels; The abnormal data repair process performed on the abnormal physical page includes: When the reliability level of the physical block containing the abnormal physical page is lower than the preset reliability level threshold, during the data rewrite operation, the rewritten data is migrated to a physical block with a higher reliability level, and the corresponding physical page information is updated in the mapping relationship.

7. The method according to claim 6, characterized in that the step of dividing each physical block into at least three reliability levels based on the format anomaly count and the number of erase / write cycles of the physical block includes: for each physical block, calculating a first reliability score based on the number of erase / write cycles and a second reliability score based on the format anomaly count, and weighting and combining the first reliability score and the second reliability score to obtain a comprehensive reliability score; dividing, based on the comprehensive reliability score, each physical block into one of a high reliability block, a normal reliability block, and a low reliability block; and, when performing garbage collection and background data relocation on the solid-state drive, preferentially selecting high reliability blocks for data containing structure control fields and critical business fields, and preferentially selecting normal reliability blocks or low reliability blocks for data containing only ordinary data fields.

8. A data format self-identification and verification device for solid-state drives, characterized in that the device is used to perform the method as described in any one of claims 1 to 7 and comprises: an analysis unit, used to analyze the file structure of the data to be written in order to determine the structural characteristics of the data to be written; a matching unit, used to match, according to the structural characteristics, a target format recognition template in a preset format recognition template library; an extraction unit, used to extract preset field information from the target format recognition template and establish a mapping relationship between the preset field information and the corresponding storage location in the solid-state drive; a generation unit, used to perform segmented check code generation processing on each data segment in the data to be written, to obtain a segmented check code that corresponds one-to-one with each data segment, and to write the segmented check code and the corresponding data segment together into the solid-state drive; a verification unit, used to read, according to the target format recognition template and the mapping relationship, the stored data from the physical pages of the solid-state drive and reconstruct the logical data structure, and to perform format consistency verification on the reconstructed logical data structure to obtain the verification result; and a determining unit, used to determine the corresponding abnormal physical page according to the mapping relationship when the verification result indicates an anomaly, and to perform abnormal data repair processing on the abnormal physical page.

9. An electronic device, characterized in that it comprises: a processor, a memory, an input / output unit, and a bus; the processor is connected to the memory, the input / output unit, and the bus; the memory stores a program, and the processor calls the program to execute the method as described in any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program that, when executed on a computer, causes the computer to perform the method as described in any one of claims 1 to 7.