Memory bank protection
By storing parity data within the memory die, a memory bank protection scheme solves the performance degradation problem during memory bank failures, achieves efficient error correction and detection, and improves the overall performance of the memory system.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- MICRON TECHNOLOGY INC
- Filing Date
- 2022-05-18
- Publication Date
- 2026-06-16
AI Technical Summary
In existing memory systems, when a single memory cell fails, the entire memory die needs to be read, resulting in performance degradation. Furthermore, existing RAID schemes experience increased latency in the event of small-granularity failures.
A memory bank protection scheme is adopted, which stores error correction data in a parity stripe within a single die. Error correction and detection are performed on a single die, avoiding the need to read the entire die when a single memory bank fails.
It improves the overall performance of the memory system, reduces operational overhead, simplifies the read and recovery process in case of failure, and prevents the memory from becoming a single point of failure in the system.
Figure CN119988088B_ABST
Abstract
Description
[0001] Information related to divisional application
[0002] This case is a divisional application. The parent application of this divisional application is the invention patent application filed on May 18, 2022, with application number 202280033754.X and invention title "Storage Protection".
Technical Field
[0003] This disclosure generally relates to semiconductor memories and methods, and more particularly to apparatus, systems and methods for protecting memory.
Background Technology
[0004] Memory devices are typically provided as internal semiconductor integrated circuits in computers or other electronic systems. Many different types of memory exist, including volatile and non-volatile memory. Volatile memory requires power to maintain its data (e.g., host data, error data, etc.) and includes Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Synchronous Dynamic Random Access Memory (SDRAM), and Thyristor Random Access Memory (TRAM), etc. Non-volatile memory provides persistent data by retaining the stored data when no power is supplied and includes NAND flash memory, NOR flash memory, and resistive variable memory, such as Phase Change Random Access Memory (PCRAM), Resistive Random Access Memory (RRAM), and Magnetoresistive Random Access Memory (MRAM), such as Spin Torque Transfer Random Access Memory (STT RAM), etc.
[0005] A memory device may be coupled to a host computer (e.g., a host computing device) to store data, commands, and/or instructions for use by the host computer or electronic system during operation. For example, data, commands, and/or instructions may be transferred between the host computer and the memory device during the operation of a computing or other electronic system.
Summary of the Invention
[0006] One aspect of this application relates to an apparatus comprising: a first group of memory banks of a memory device, the first group of memory banks comprising: a first portion configured to store host data; and a second portion configured to store error detection data for indicating a number of errors exceeding a threshold number in a corresponding portion of the first group of memory banks; and a second group of memory banks of the memory device, the second group of memory banks configured to store error correction data to correct the number of errors exceeding the threshold number in the corresponding portion of the first group of memory banks.
[0007] Another aspect of this application relates to an apparatus comprising: a group of memory banks configured to store data corresponding to a first stripe, wherein the first stripe further comprises: a first portion of the group of memory banks configured to store first host data; and a second portion of the group of memory banks configured to store first error correction data to correct a number of errors within the first host data; the group of memory banks being further configured to store data corresponding to a second stripe, wherein the second stripe further comprises: a third portion of the group of memory banks configured to store second host data; and a fourth portion of the group of memory banks configured to store second error correction data.
[0008] Another aspect of this application relates to a method comprising: performing an error detection operation, using first error detection data, on first host data retrieved from a first memory bank in a group of memory banks; and, in response to the error detection operation indicating a number of errors in the first host data that exceeds a threshold number, performing an error correction operation to correct the number of errors exceeding the threshold number using: first error correction data; and second host data retrieved from one or more memory banks in the group that are different from the first memory bank.
Attached Figure Description
[0009] Figure 1 is a block diagram of an apparatus in the form of a computing system including a host and a memory device, according to several embodiments of the present disclosure.
[0010] Figure 2 illustrates an example memory die comprising memory banks configured to store error correction/detection data, according to several embodiments of the present disclosure.
[0011] Figure 3 illustrates an example of how error correction/detection data can be distributed across memory banks, according to several embodiments of the present disclosure.
[0012] Figure 4 illustrates an example of how error correction/detection data can be distributed across memory banks, according to several embodiments of the present disclosure.
[0013] Figure 5 illustrates an example memory protection scheme, according to several embodiments of the present disclosure, in which error correction/detection data spans multiple memory dies.
[0014] Figure 6 is a flowchart illustrating an example method for memory protection, according to several embodiments of the present disclosure.
Detailed Implementation
[0015] Systems, devices, and methods related to memory protection are described. Data protection and recovery schemes are typically an important aspect of RAS (Reliability, Availability, and Serviceability) associated with memory systems. Such schemes can provide "chip kill" capability, in which the memory system can continue to function even if a constituent chip (e.g., a memory die) is damaged, thereby preventing a single point of failure (SPOF) in the memory system. Chip kill capability is often provided through "Redundant Array of Independent Disks" (RAID) schemes, which allow data to be recovered from a damaged chip by reading all constituent chips of the memory system.
[0016] This RAID scheme provides chip kill capability; however, it can introduce unnecessary latency when implemented in memory systems where failures frequently occur at specific memory locations with granularity smaller than the die level. For example, a memory die (e.g., a DRAM die) comprising multiple banks of memory cells typically experiences a failure within a single constituent bank. Therefore, chip kill capability that prevents a single memory die from becoming a SPOF treats a failure of a single bank as a failure of the die, triggering reads of multiple dies each time a single bank fails.
[0017] In contrast, the embodiments described herein relate to providing a memory bank protection scheme that prevents each constituent memory bank of a memory die from becoming a single point of failure (SPOF). Therefore, the memory bank protection scheme provided by the embodiments of this disclosure avoids instances in which recovering the data of a failed memory bank requires reading all memory dies in the memory system, unless a particular die is completely damaged (e.g., nonfunctional), which improves the overall performance of the memory system. Compared to previous RAS (Reliability, Availability, and Serviceability) schemes, the various embodiments of this disclosure can provide benefits such as reduced overhead, because a single memory bank, rather than an entire die as in some previous chip kill approaches, can be used for parity data (e.g., RAID parity). Moreover, in some embodiments, the parity stripe used to protect the memory banks is within a single die; therefore, operations related to the parity scheme (e.g., read, write, and recovery in failure conditions) involve a single die on a single channel, which provides simpler management than RAID recovery schemes that operate across multiple dies and/or channels. Furthermore, because the various embodiments involve a single die, memory bank recovery mechanisms (e.g., CRC+RAID) can be implemented on the die rather than via an off-die controller.
[0018] In the following detailed description of this disclosure, reference is made to the accompanying drawings, which form a part of this disclosure and illustrate by way of description how one or more embodiments of this disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice embodiments of this disclosure, and it should be understood that other embodiments may be utilized and process, electrical, and structural changes may be made without departing from the scope of this disclosure.
[0019] As used herein, particularly with respect to reference numerals in the drawings, designators such as “N”, “M”, etc., indicate that a number of the particular feature so designated may be included. It should also be understood that the terminology used herein is for describing particular embodiments only and is not intended to be limiting. As used herein, the singular forms “a”, “an”, and “the” may include both singular and plural referents, unless the context clearly indicates otherwise. Additionally, “a number of,” “at least one,” and “one or more” (e.g., a number of memory cells) may refer to one or more memory cells, while “a plurality of” is intended to refer to more than one of such things.
[0020] Furthermore, the word "may" is used throughout this application in a permissive sense (e.g., having the potential to, being able to) rather than a mandatory sense (e.g., must). The term "comprising" and its derivatives mean "including (but not limited to)". Depending on the context, the terms "coupled" and "coupling" mean physically connected, directly or indirectly, or connected for access to and movement (transmission) of commands and/or data.
[0021] The figures in this document follow a numbering convention in which the first digit or digits correspond to the figure number and the remaining digits identify an element or component in the figure. Similar elements or components between different figures can be identified by the use of similar digits. For example, 221 may reference element "21" in Figure 2, and a similar element may be referenced as 321 in Figure 3. A group or number of similar elements or components may be collectively referred to herein by a single reference numeral. For example, multiple reference elements 221-1 to 221-M may be collectively referred to as 221. It should be understood that the elements shown in the various embodiments herein may be added, interchanged, and/or eliminated to provide several additional embodiments of this disclosure. In addition, the scale and/or relative dimensions of the elements provided in the figures are intended to illustrate certain embodiments of this disclosure and should not be taken in a limiting sense.
[0022] Figure 1 is a block diagram of an apparatus in the form of a computing system 100 including a system controller 110 and a memory device 120, according to several embodiments of the present disclosure. As used herein, the memory device 120, the control circuitry 140, the memory banks 121-1, 121-2, ..., 121-N, and/or the memory array 130 may also be individually considered an “apparatus”.
[0023] System controller 110 is coupled (e.g., connected) to memory device 120. System controller 110 may be an external controller, such as a memory controller for a memory subsystem (e.g., a dual in-line memory module (DIMM) or a solid-state drive (SSD)). In embodiments where system controller 110 is a memory controller for a memory subsystem, system controller 110 may be coupled to one or more processors (e.g., CPUs).
[0024] System controller 110 may include logic circuitry (e.g., logic 160) for generating ECC data based on data received from the host. Logic circuitry 160 can operate based on various types of error correction/detection codes, such as Hamming codes, Reed-Solomon (RS) codes, Bose-Chaudhuri-Hocquenghem (BCH) codes, Cyclic Redundancy Check (CRC) codes, Golay codes, Reed-Muller codes, Goppa codes, and Denniston codes, among others. Error correction/detection data generated using error correction/detection component 105 can be written to multiple dies (e.g., memory die 120), as further described in connection with Figure 5.
[0025] In various embodiments, the system controller 110 may be further coupled to a host system (not shown in Figure 1), such as a personal laptop, desktop computer, digital camera, smartphone, memory card reader, and/or IoT-enabled device, among various other types of host systems. The host system may include a system motherboard and/or backplane and may include a number of processing resources (e.g., one or more processors, microprocessors, or other types of control circuitry). System 100 may include separate integrated circuits, or the host system, system controller 110, and memory device 120 may be on the same integrated circuit. System 100 may be, for example, a server system and/or a high-performance computing (HPC) system and/or a portion thereof.
[0026] Memory device 120 (e.g., a memory die) may include a plurality of memory banks 121-1, 121-2, ..., 121-N (collectively referred to as memory banks 121), which may include memory array 130 (including multiple rows and columns of memory cells) and sensing circuitry 150. Although Figure 1 shows a single memory device 120, the system controller 110 can be coupled to multiple memory devices (e.g., dies) 120 via multiple channels. Furthermore, although not shown in Figure 1, each of the memory banks 121 may include control circuitry (e.g., a memory bank processor) to control and/or orchestrate the execution of memory operations in response to instructions received from the control circuitry 140. In some embodiments, each of the memory banks 121 may be individually addressed, for example, by the control circuitry 140.
[0027] For clarity, system 100 has been simplified to focus on features particularly relevant to this disclosure. Memory array 130 may be a DRAM array, SRAM array, STT RAM array, PCRAM array, TRAM array, RRAM array, NAND flash array, and/or NOR flash array, among other types of arrays. Array 130 may comprise memory cells arranged in rows coupled by access lines (which may be referred to herein as word lines or select lines) and columns coupled by sense lines (which may be referred to herein as data lines or digit lines).
[0028] As shown in Figure 1, memory device 120 may include address circuitry 142 to latch address signals provided through I/O circuitry 144 over a combined data/address bus 156 (e.g., an external I/O bus connected to system controller 110), and I/O circuitry 144 may include an internal I/O bus. For example, the internal I/O bus may transfer data between the memory banks and I/O pins (e.g., DRAM DQs).
[0029] Memory device 120 may include address circuitry 142 to latch address signals for data provided over an input/output ("I/O") bus 156 (e.g., a data bus and/or address bus) through I/O circuitry 144 (e.g., provided to external ALU circuitry and to DRAM DQs via local and global I/O lines). The address signals are received by the address circuitry 142 and decoded by row decoder 146 and column decoder 152 to access memory array 130. Data can be read from memory array 130 by sensing voltage and/or current changes on the sense lines (digit lines) using sensing circuitry 150. Sensing circuitry 150 can read and latch a page (e.g., a row) of data from memory array 130. I/O circuitry 144 can be used for bidirectional data communication with system controller 110 via data bus 156 (e.g., a 64-bit wide data bus). Write circuitry 148 can be used to write data to memory array 130.
[0030] Control circuitry system 140 (e.g., memory control logic and sequence generator) can decode signals (e.g., commands) provided by control bus 154 from system controller 110. These signals may include chip enable signals, write enable signals, and / or address latch signals, which can be used to control operations performed on memory array 130, including data sensing, data storage, data movement (e.g., copying, transferring, and / or transmitting data values), data writing and / or data erasure operations, and other operations. In various embodiments, control circuitry system 140 may be responsible for executing instructions from system controller 110 and accessing memory array 130. Control circuitry system 140 may be a state machine, sequence generator, or some other type of controller.
[0031] The control circuitry 140 may further include an error correction/detection component 105 and may use the error correction/detection component 105 to generate ECC data based on data received from the host and/or system controller 110. The error correction/detection component 105 can operate based on various types of error correction/detection codes, such as Hamming codes, Reed-Solomon (RS) codes, Bose-Chaudhuri-Hocquenghem (BCH) codes, Cyclic Redundancy Check (CRC) codes, Golay codes, Reed-Muller codes, Goppa codes, and Denniston codes, among others. The error correction/detection data generated using the error correction/detection component 105 can be written to the memory banks 121 in various ways, as further described in connection with Figures 2 to 6.
[0032] The error correction / detection component 105 of the control circuitry system 140 can be configured to perform error correction / detection operations using error correction / detection data stored in the memory array 130. The error correction / detection operations performed using the error correction / detection component 105 can provide multi-level error correction / detection capabilities for errors within the memory array 130. For example, a first level of error correction / detection capability (in multiple levels) can be provided using error correction data stored in a memory bank (e.g., memory bank 121) to correct the number of errors equal to or less than a threshold number and using error detection data stored in the same memory bank to indicate whether there are still residual errors in the same memory bank, even after a previously performed error correction operation (e.g., a first error correction operation). If it is indicated that there are still errors even after a previously performed error correction operation, a second level of error correction / detection capability (in multiple levels) can be provided. The second level of error correction / detection capability can be provided by performing another error correction operation (e.g., a second error correction operation) using error correction data stored in a dedicated memory bank and / or portions of the memory bank.
[0033] In some embodiments, a first error correction operation (e.g., performed to provide a first level of error correction capability) and a second error correction operation (e.g., performed to provide a second level of error correction capability) may be performed at different processing resources. For example, the first error correction operation may be performed at a corresponding memory bank processor, while the second error correction operation may be performed at a system processor/controller, such as control circuitry 140.
[0034] Figure 2 illustrates an example memory die 220, according to several embodiments of the present disclosure, including memory banks 221-1, ..., 221-M configured to store error correction/detection data. Memory banks 221 are analogous to memory banks 121 described in connection with Figure 1. Memory banks 221 may be banks coupled to the same channel and contained within a single memory die (e.g., memory die 220). Memory die 220 may be a DRAM die, and memory banks 221 may contain DRAM cells.
[0035] Memory banks 221-1, ..., 221-(M-1) can be configured to store host data (e.g., data received from the system controller 110 described in connection with Figure 1) in corresponding portions 221-1-1, ..., 221-(M-1)-1. In some embodiments, the corresponding portions 221-1-1, ..., 221-(M-1)-1 of memory banks 221-1, ..., 221-(M-1) may further store error correction data for correcting a number of errors, equal to or less than a threshold number, in the corresponding memory bank. For example, a number of errors in memory bank 221-1 that is equal to or less than the threshold number can be corrected using the error correction data stored in portion 221-1-1.
[0036] Memory banks 221 may be configured to store error detection data in their respective portions 221-1-2, ..., 221-M-2. While embodiments are not limited thereto, the error detection data may be CRC data. The error detection data (e.g., CRC) can be used to indicate a number of errors exceeding the threshold number within a corresponding page (e.g., a row of memory cells) of a memory bank 221. For example, an error detection operation performed using the error detection data stored in portion 221-1-2 may indicate whether an error still exists in a page of memory bank 221-1 even after an error correction operation has been performed using the error correction data stored in portion 221-1-1. An indication that an error still exists in memory bank 221-1 further indicates that memory bank 221-1 initially contained a number of errors exceeding the threshold number.
[0037] Memory bank 221-M can be configured to store, in its portion 221-M-1, error correction data for correcting a number of errors that exceeds the threshold number (e.g., within the corresponding portions of memory banks 221-1, ..., 221-(M-1)) and therefore cannot be corrected using the error correction data stored in the corresponding portions 221-1-1, ..., 221-(M-1)-1. In some embodiments, the error correction data used to correct the number of errors exceeding the threshold number may be parity data (e.g., RAID parity).
[0038] Performing an error correction operation using the error correction data (e.g., parity data) stored in memory bank 221-M involves reading the error correction data from memory bank 221-M as well as the other host data stored in memory banks 221-1, ..., 221-(M-1) (e.g., the host data other than that stored in the particular memory bank indicated as having the number of errors). For example, if it is determined that host data read from memory bank 221-1 (e.g., data stored in a page of that memory bank) still contains errors that the error correction operation performed using the error correction data in portion 221-1-1 of memory bank 221-1 cannot correct, then the data stored in memory bank 221-1 can be recovered by reading the error correction data stored in memory bank 221-M as well as the other corresponding host data from memory banks 221-2, ..., 221-(M-1). For example, the XOR of the data read from the "good" memory banks 221-2, ..., 221-(M-1) with the error correction data read from memory bank 221-M can be used to correct (e.g., recover) the erroneous data read from memory bank 221-1.
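The XOR-based recovery described above can be sketched in a few lines. This is a minimal illustration, assuming the error correction data in the parity bank is a plain bytewise XOR of the host-data strips; the function names (`xor_bytes`, `make_parity`, `recover_strip`) and the three-bank toy stripe are hypothetical and are not taken from the disclosure.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("strips must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(host_strips: list[bytes]) -> bytes:
    """Parity strip = XOR of all host-data strips in the stripe."""
    return reduce(xor_bytes, host_strips)


def recover_strip(good_strips: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one failed strip from the surviving strips and parity."""
    return reduce(xor_bytes, good_strips, parity)


# Toy stripe: three host banks plus one parity bank (assumed sizes).
strips = [b"\x11\x22", b"\x0f\xf0", b"\xaa\x55"]
parity = make_parity(strips)

# Pretend the first bank failed; rebuild it from the other two and parity.
rebuilt = recover_strip(strips[1:], parity)
```

Because XOR is its own inverse, the same routine rebuilds any single failed strip in the stripe, which is why only one parity bank per stripe is needed for single-bank failures.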
[0039] Figure 2 illustrates a stripe 222 (e.g., a parity stripe) spanning memory banks 221-1, ..., 221-M. For example, stripe 222 may contain or correspond to data stored in one or more rows of memory cells in each of memory banks 221-1, ..., 221-M. The one or more rows within each memory bank that correspond to the same stripe (e.g., stripe 222) may be referred to as a "strip". The error correction operations described herein (e.g., performed using parity data) may be performed on a per-stripe basis (e.g., on stripe 222). For example, the error correction data stored in memory bank 221-M and corresponding to a particular stripe (e.g., stripe 222) is error correction data previously generated based on the host data corresponding to the same stripe. Therefore, an error correction operation for correcting a number of errors exceeding the threshold number involves reading the error correction data of the stripe and the host data (good data) of the stripe. In some embodiments, a stripe may correspond to data in a single row (e.g., a single DRAM page) of memory cells in each corresponding memory bank.
[0040] In some embodiments, memory banks 221 may be DRAM memory banks that include DRAM cells. In this example, in contrast to NAND, where erase operations are performed block by block (while write operations are performed page by page), read and write operations can be performed on memory banks 221 independently of erase operations. For example, in a NAND memory device, a block typically stores pages of data corresponding to multiple stripes. Therefore, updating even the data that corresponds to one of the stripes and is stored in a single page of the block requires erasing all pages of the block, which in turn requires rewriting the data corresponding to the other stripes in the block. In contrast, updating host data (corresponding to one of the stripes) in a DRAM memory bank (e.g., memory bank 221) does not require rewriting host data corresponding to other strips of the stripe and/or to other stripes.
[0041] Assuming that memory die 220 contains 64 memory banks (e.g., memory banks 221), the example described in connection with Figure 2 may have a parity overhead of approximately 1.58% (1/63). For example, the ratio of the number of memory banks (e.g., memory bank 221-M) configured to store error correction data for correcting errors exceeding a threshold number to the number of memory banks (e.g., 221-1, ..., 221-(M-1)) configured to store host data is 1/63.
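The overhead figure follows directly from the bank counts. As a small worked check, assuming one parity bank out of 64 banks in total (the function name `parity_overhead` is hypothetical):

```python
def parity_overhead(total_banks: int, parity_banks: int = 1) -> float:
    """Ratio of parity banks to host-data banks on one die."""
    host_banks = total_banks - parity_banks
    if host_banks <= 0:
        raise ValueError("at least one bank must hold host data")
    return parity_banks / host_banks


overhead = parity_overhead(64)   # 1 parity bank, 63 host-data banks
print(f"{overhead:.4%}")         # prints 1.5873%
```

The exact ratio 1/63 is about 1.587%, which the text gives as approximately 1.58%.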
[0042] In some embodiments, error detection data may be stored in a single row (e.g., a single DRAM page) of memory cells (corresponding to portions 221-1-2, ..., 221-M-2). In this example, the error detection data stored in the single row of memory cells (which may be the unit of a single read operation of a DRAM array) can be used to perform an error detection operation on data corresponding to any of multiple stripes (corresponding to portions 221-1-1, ..., 221-M-1). Therefore, compared to approaches in which the error correction/detection data of multiple stripes are stored in separate locations (which requires multiple read operations to read the error correction/detection data from each location), the embodiments described in connection with Figure 2 (and Figures 3 to 6) involve reading the error detection data from the same memory bank no more than once for data corresponding to multiple stripes (e.g., a single read operation can retrieve the error detection data used across the multiple read operations performed to read data corresponding to the multiple stripes).
[0043] While embodiments are not limited thereto, the threshold number described herein may correspond to a single error. For example, in the embodiment described in connection with Figure 2, the error correction data stored in portions 221-1-1, ..., 221-M-1 can correct a single error, while the error correction data stored in memory bank 221-M can correct multiple errors (e.g., more than a single error) in one or more of memory banks 221-1, ..., 221-(M-1).
[0044] The operational roles of memory banks (e.g., memory banks 221) can be swapped occasionally or periodically to balance the number of accesses across the memory banks and avoid "hot spots", in which one memory bank is accessed more frequently than others. For example, as previously described, memory banks 221-1, ..., 221-(M-1) are configured to store host data, while memory bank 221-M is configured to store error correction data (e.g., parity data). To prevent memory banks 221-1, ..., 221-(M-1) from being accessed more frequently than memory bank 221-M (because host data is likely to be accessed more frequently than error correction data), at some point one of memory banks 221-1, ..., 221-(M-1) can be reconfigured to store error correction data, while memory bank 221-M can be reconfigured to store host data.
[0045] In a non-limiting example, an example apparatus (e.g., the computing system 100 or memory device 120 described in connection with Figure 1) may include a first group of memory banks (e.g., memory banks 221-1, ..., 221-(M-1)) of a memory die (e.g., memory die 220). The first group of memory banks may include a first portion (e.g., portions 221-1-1, ..., 221-(M-1)-1) configured to store host data and a second portion (e.g., portions 221-1-2, ..., 221-M-2) configured to store error detection data for indicating a number of errors exceeding a threshold number within a corresponding memory bank of the first group. The apparatus may further include a second group of memory banks (e.g., memory bank 221-M) configured to store error correction data to correct the number of errors exceeding the threshold number within the corresponding memory bank of the first group. In some embodiments, the error detection data includes cyclic redundancy check (CRC) data. Furthermore, the first group and the second group of memory banks may include dynamic random access memory (DRAM) cells.
[0046] In some embodiments, the memory die may comprise a plurality of stripes (e.g., stripe 222). Each of the plurality of stripes may comprise a respective group of rows of memory cells from each memory bank of the first and second groups of memory banks. In this example, the apparatus may further include control circuitry (e.g., the control circuitry 140 described in connection with Figure 1) coupled to the first group of memory banks and the second group of memory banks of the memory die. The control circuitry may be configured to read host data from the first portion of one memory bank of the first group, corresponding to one of the plurality of stripes, and to perform an error detection operation on the read host data using the error detection data stored in the second portion of that memory bank. The control circuitry may be further configured to perform, in response to the error detection operation indicating that the number of errors in that memory bank exceeds the threshold number, an error correction operation using at least the error correction data stored in the second group of memory banks and corresponding to the one of the plurality of stripes.
[0047] Continuing with the above example, the control circuitry may be further configured to, before performing the error detection operation on the host data using the error detection data, perform an error correction operation on the one memory bank of the first group using the error correction data in the first portion of that memory bank, to correct a number of errors in that memory bank equal to or less than the threshold number.
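The order of operations in this example (in-bank correction first, then CRC detection, then parity-based correction only when residual errors are indicated) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: CRC-32 from Python's `zlib` stands in for the unspecified CRC, the parity is assumed to be a bytewise XOR, the first-level corrector is a hypothetical `ecc_correct` callback that defaults to a no-op, and `read_bank` is a hypothetical name.

```python
import zlib
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


def read_bank(strips, crcs, parity, index, ecc_correct=lambda data: data):
    """Return the host data of bank `index` for one stripe.

    strips      : host-data strips as read from the banks (may hold errors)
    crcs        : stored CRC value for each strip
    parity      : parity strip read from the dedicated parity bank
    ecc_correct : hypothetical first-level corrector (identity by default)
    """
    # Level 1: in-bank correction of errors up to the threshold number.
    data = ecc_correct(strips[index])
    # Detection: a CRC match means no residual errors were found.
    if zlib.crc32(data) == crcs[index]:
        return data
    # Level 2: rebuild from the other banks' strips plus the parity strip.
    others = [ecc_correct(s) for i, s in enumerate(strips) if i != index]
    rebuilt = reduce(xor_bytes, others, parity)
    if zlib.crc32(rebuilt) != crcs[index]:
        raise ValueError("stripe is uncorrectable with a single parity strip")
    return rebuilt


good = [b"bank-one", b"bank-two", b"bank-3!!"]
crcs = [zlib.crc32(s) for s in good]
parity = reduce(xor_bytes, good)

damaged = [b"\x00" * 8] + good[1:]   # bank 0 returns corrupted data
recovered = read_bank(damaged, crcs, parity, 0)
```

In the common case the CRC matches and only the addressed bank is read; the other banks and the parity bank are touched only on the fallback path.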
[0048] Figure 3 illustrates an example of how error correction/detection data can be distributed across memory banks 321-1, ..., 321-8, according to several embodiments of the present disclosure. Memory banks 321 shown in Figure 3 are analogous to memory banks 121 and 221 described in connection with Figures 1 and 2, respectively. For example, memory banks 321 may be banks coupled to the same channel and may contain DRAM cells. Although not shown in Figure 3, each memory bank 321 may further store error detection data (e.g., CRC) to indicate a number of errors exceeding a threshold number, such as a single error. Although the example of Figure 3 shows eight memory banks, embodiments are not limited to a particular number of memory banks that a single memory die can contain.
[0049] like Figure 3 The description states that memory bank 321 can be further divided into regions. As used herein, the term "region" refers to a group of rows of memory cells spanning multiple memory banks. For example, as... Figure 3The description states that the row group of memory cells 325-1-1, ..., 325-8-1 from memory banks 321-1, ..., 321-8 can be called zone 323-1; the row group of memory cells 325-1-2, ..., 325-8-2 from memory banks 321-1, ..., 321-8 can be called zone 323-2; the row group of memory cells 325-1-3, ..., 325-8-3 from memory banks 321-1, ..., 321-8 can be called zone 323-3; and the row group of memory cells 325-1-4, ..., 325-8-4 from memory banks 321-1, ..., 321-8 can be called zone 323-4. 4; The row group of memory cells 325-1-5, ..., 325-8-5 from memory banks 321-1, ..., 321-8 can be called area 323-5; The row group of memory cells 325-1-6, ..., 325-8-6 from memory banks 321-1, ..., 321-8 can be called area 323-6; The row group of memory cells 325-1-7, ..., 325-8-7 from memory banks 321-1, ..., 321-8 can be called area 323-7; And the row group of memory cells 325-1-8, ..., 325-8-8 from memory banks 321-1, ..., 321-8 can be called area 323-8.
[0050] Error correction data for correcting a number of errors exceeding the threshold number is uniformly distributed across the memory banks 321, such that each region 323 stores its error correction data in only one of the memory banks 321. For example, as illustrated in Figure 3, region 323-1 stores error correction data in memory bank 321-8 (e.g., row group of memory cells 325-8-1); region 323-2 stores error correction data in memory bank 321-7 (e.g., row group of memory cells 325-7-2); region 323-3 stores error correction data in memory bank 321-6 (e.g., row group of memory cells 325-6-3); region 323-4 stores error correction data in memory bank 321-5 (e.g., row group of memory cells 325-5-4); region 323-5 stores error correction data in memory bank 321-4 (e.g., row group of memory cells 325-4-5); region 323-6 stores error correction data in memory bank 321-3 (e.g., row group of memory cells 325-3-6); region 323-7 stores error correction data in memory bank 321-2 (e.g., row group of memory cells 325-2-7); and region 323-8 stores error correction data in memory bank 321-1 (e.g., row group of memory cells 325-1-8). Compared with the example of Figure 2, in which error correction data is stored in only one of the memory banks 221 (e.g., memory bank 221-M), uniformly distributing the error correction data across the memory banks 321 balances the number of accesses across the memory banks 321 and avoids "hot spots" by preventing one memory bank from being accessed more frequently than the others.
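The placement in the Figure 3 example follows a reverse-diagonal pattern: with eight memory banks, region 323-r keeps its error correction data in memory bank 321-(9 - r). A minimal Python sketch of that mapping follows, assuming 1-based indices as in Figure 3. The function name `parity_bank` and the generalization to an arbitrary number of banks are illustrative assumptions, not part of the disclosure.

```python
def parity_bank(region: int, num_banks: int = 8) -> int:
    """Return the 1-based bank index holding the error correction data for a region.

    Mirrors the Figure 3 example: region 1 -> bank 8, ..., region 8 -> bank 1.
    Assumes (hypothetically) that the number of regions equals the number of banks.
    """
    if not 1 <= region <= num_banks:
        raise ValueError("region must be in 1..num_banks")
    return num_banks + 1 - region


# Each bank holds error correction data for exactly one region, so parity
# accesses are spread across all banks instead of landing on a single bank.
placement = {region: parity_bank(region) for region in range(1, 9)}
```

Because every bank appears exactly once in `placement`, no single bank becomes the parity "hot spot" that the Figure 2 arrangement would create.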
[0051] An error correction operation for correcting a number of errors exceeding the threshold number (e.g., a single error) can be performed stripe by stripe using the error correction data (e.g., parity data) stored in one of the memory banks 321 in each region 323, as described in connection with Figure 3. For example, suppose that host data read from a row of memory cells in row group 325-1-1 of memory bank 321-1, corresponding to a specific stripe, is determined to contain a number of errors exceeding the threshold number (that is, errors that could not be corrected by a previously performed error correction operation that corrects a number of errors equal to or less than the threshold number). The host data can then be recovered by reading the error correction data from the row of row group 325-8-1 of memory bank 321-8 that corresponds to the same stripe, together with the other host data from the rows of row groups 325-2-1, ..., 325-7-1 of memory banks 321-2, ..., 321-7 that correspond to the same stripe. Although embodiments are not limited thereto, the threshold number described herein may correspond to a single error.
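The stripe-level recovery described above can be sketched as follows, assuming the parity data is a bytewise XOR of the host data rows in the stripe. The disclosure says only "parity data"; XOR is the conventional choice and is an assumption here. The helper names `xor_rows` and `recover_row` are hypothetical.

```python
from functools import reduce


def xor_rows(rows: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized rows."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*rows))


def recover_row(surviving_host_rows: list[bytes], parity_row: bytes) -> bytes:
    """Rebuild the one unreadable row of a stripe from the other rows plus parity."""
    return xor_rows(surviving_host_rows + [parity_row])


# Stripe in region 323-1: banks 321-1..321-7 hold host data, bank 321-8 holds parity.
host_rows = [bytes([bank] * 4) for bank in range(1, 8)]
parity = xor_rows(host_rows)

# Suppose the row from bank 321-1 is uncorrectable; rebuild it from the rest.
rebuilt = recover_row(host_rows[1:], parity)
```

Only the rows belonging to the affected stripe are read, which is the point of protecting at bank granularity rather than rereading an entire die.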
[0052] Also, as described in connection with Figure 2, the operational roles of the memory banks (e.g., memory banks 321) for a region (e.g., region 323) can be swapped occasionally or periodically to balance the number of accesses across the memory banks and to avoid "hot spots" by preventing one memory location (e.g., region) from being accessed more frequently than other memory locations (e.g., regions). For example, a row group of memory cells in a memory bank that is configured to store error correction data for a specific region can be reconfigured to store host data, while a row group of memory cells in a different memory bank that is configured to store host data can be reconfigured to store the error correction data for that region (to correct a number of errors exceeding the threshold number).
[0053] Figure 4 illustrates an example of how error correction/detection data can be distributed across the memory banks 421 of a memory die 420, with sub-regions 427 holding the error correction/detection data, according to several embodiments of the present disclosure. The memory banks 421 and regions 423 illustrated in Figure 4 are analogous to the memory banks 321 and regions 323 described in connection with Figure 3, respectively. For example, the memory banks 421 may be memory banks that are coupled to the same channel and contain DRAM cells. Although not illustrated in Figure 4, each memory bank 421 may further store error detection data (e.g., CRC) to indicate a number of errors exceeding a threshold number (e.g., a single error) within the corresponding memory bank 421. Although Figure 4 illustrates eight memory banks and eight regions, embodiments are not limited to a specific number of memory banks or regions that a single memory die can contain.
[0054] As illustrated in Figure 4, each region 423 can be further divided into sub-regions. For example, as illustrated in Figure 4, one or more rows of memory cells from each of the memory banks 421 that are located in region 423-1 can constitute a corresponding one of sub-regions 427-1-1, ..., 427-1-8; one or more rows of memory cells from each of the memory banks 421 that are located in region 423-2 can constitute a corresponding one of sub-regions 427-2-1, ..., 427-2-8; one or more rows of memory cells from each of the memory banks 421 that are located in region 423-3 can constitute a corresponding one of sub-regions 427-3-1, ..., 427-3-8; one or more rows of memory cells from each of the memory banks 421 that are located in region 423-4 can constitute a corresponding one of sub-regions 427-4-1, ..., 427-4-8; one or more rows of memory cells from each of the memory banks 421 that are located in region 423-5 can constitute a corresponding one of sub-regions 427-5-1, ..., 427-5-8; one or more rows of memory cells from each of the memory banks 421 that are located in region 423-6 can constitute a corresponding one of sub-regions 427-6-1, ..., 427-6-8; one or more rows of memory cells from each of the memory banks 421 that are located in region 423-7 can constitute a corresponding one of sub-regions 427-7-1, ..., 427-7-8; and one or more rows of memory cells from each of the memory banks 421 that are located in region 423-8 can constitute a corresponding one of sub-regions 427-8-1, ..., 427-8-8.
[0055] Error correction data (e.g., parity data) for correcting a number of errors exceeding the threshold number can be evenly distributed across the memory banks 421 and regions 423, such that each sub-region 427 holds its error correction data in only one of the memory banks 421. For example, as illustrated in Figure 4, sub-region 427-1-1 stores error correction data in memory bank 421-8; sub-region 427-1-2 stores error correction data in memory bank 421-7; sub-region 427-1-3 stores error correction data in memory bank 421-6; sub-region 427-1-4 stores error correction data in memory bank 421-5; sub-region 427-1-5 stores error correction data in memory bank 421-4; sub-region 427-1-6 stores error correction data in memory bank 421-3; sub-region 427-1-7 stores error correction data in memory bank 421-2; and sub-region 427-1-8 stores error correction data in memory bank 421-1.
[0056] An error correction operation for correcting a number of errors exceeding the threshold number (e.g., a single error) can be performed stripe by stripe using the error correction data (e.g., parity data) stored in one of the memory banks 421 in each sub-region 427, as described in connection with Figure 2. For example, suppose that host data read from a row of memory cells in sub-region 427-1-1 of memory bank 421-1, corresponding to a specific stripe, is determined to contain a number of errors exceeding the threshold number (that is, errors that could not be corrected by a previously performed error correction operation that corrects a number of errors equal to or less than the threshold number). The data in that row of sub-region 427-1-1 can then be recovered by reading the error correction data from a row of memory cells in sub-region 427-1-8 of memory bank 421-8, together with the other host data of the same stripe from the corresponding rows of memory cells in sub-regions 427-1-2, ..., 427-1-7 of memory banks 421-2, ..., 421-7. Although embodiments are not limited thereto, the threshold number described herein may correspond to a single error.
[0057] Also, as described in connection with Figures 2 and 3, the operational roles of the memory banks (e.g., memory banks 421) for a sub-region (e.g., sub-region 427) can be swapped occasionally or periodically to balance the number of accesses across the memory banks and to avoid "hot spots" by preventing one memory location (e.g., sub-region) from being accessed more frequently than other memory locations. For example, a row group of memory cells in a memory bank that is configured to store error correction data for a specific sub-region can be reconfigured to store host data, while a row group of memory cells in a different memory bank that is configured to store host data can be reconfigured to store the error correction data for that sub-region.
[0058] In a non-limiting example, an example system (e.g., the computing system 100 or memory device 120 described in connection with Figure 1) may include a group of memory banks (e.g., memory banks 321/421 illustrated in Figures 3 and 4) of a memory die (e.g., memory die 320/420 illustrated in Figures 3 and 4). Each memory bank of the memory die may include a first portion configured to store error correction data for correcting a number of errors exceeding a threshold number within the corresponding memory bank, and a second portion configured to store error detection data for indicating a number of errors exceeding the threshold number within the corresponding memory bank of the group. The group of memory banks may be coupled to the same channel.
[0059] In some embodiments, the memory die is a DRAM die. In this example, the second portion corresponds to a row of DRAM memory cells of the corresponding memory bank of the group. In some embodiments, each memory bank of the group of memory banks of the memory die may further include a third portion configured to store the host data from which the error correction data stored in the first portion of the group of memory banks is generated.
[0060] In some embodiments, the error correction data may comprise multiple portions that are uniformly distributed across the group of memory banks, such that each portion of the error correction data is stored in a row group of memory cells (e.g., row groups 325-8-1, 325-7-2, 325-6-3, 325-5-4, 325-4-5, 325-3-6, 325-2-7, and 325-1-8) of a different memory bank than the other portions of the error correction data. In some embodiments, the error correction data includes parity data.
[0061] In some embodiments, the memory die comprises a plurality of stripes (e.g., stripes 222/522 illustrated in Figures 2 and 5, respectively). Each of the plurality of stripes may contain a corresponding group of rows of memory cells from each of the first and second groups of memory banks. Furthermore, the plurality of stripes may store their corresponding portions of the error correction data in different memory locations of the first and second groups of memory banks.
[0062] In some embodiments, the system may further include control circuitry (e.g., the control circuitry 140 described in connection with Figure 1) coupled to the group of memory banks. The control circuitry may be configured to perform an error detection operation on one memory bank of the group, using the error detection data stored in that memory bank, to indicate a number of errors exceeding the threshold number within the group of memory banks. The control circuitry may be further configured to perform, in response to the error detection operation indicating that the number of errors in the one memory bank exceeds the threshold number, a read operation on the group of memory banks to retrieve the error correction data stored in the first portion of each memory bank of the group. The control circuitry may be further configured to perform an error correction operation using the error correction data retrieved from the first portion of each memory bank of the group to correct the number of errors exceeding the threshold number within the group of memory banks.
[0063] Figure 5 illustrates an example memory bank protection scheme according to several embodiments of the present disclosure, in which error correction/detection data spans multiple memory dies 520-1, ..., 520-P. The memory dies 520 are analogous to the memory dies 220, 320, and/or 420 described in connection with Figures 2, 3, and 4, respectively. For example, each memory die 520 may be a DRAM die. The memory dies 520 may be coupled to different channels. Furthermore, the memory banks 521 are analogous to the memory banks 221, 321, and/or 421 described in connection with Figures 2, 3, and 4, respectively, and may contain DRAM cells. The error correction/detection operations described in connection with Figure 5 can be performed by a system controller, for example, the system controller 110 described in connection with Figure 1.
[0064] The memory banks 521-1, ..., 521-Q of memory dies 520-1, ..., 520-(P-1) and the memory banks 521-1, ..., 521-(Q-1) of memory die 520-P can store host data (e.g., data received from the host 110 described in connection with Figure 1) in their corresponding portions, for example, in portions 521-1-1, ..., 521-Q-1 of the memory banks 521-1, ..., 521-Q of memory dies 520-1, ..., 520-(P-1) and/or in portions 521-1-1, ..., 521-(Q-1)-1 of the memory banks 521-1, ..., 521-(Q-1) of memory die 520-P. In some embodiments, the portions 521-1-1, ..., 521-Q-1 of the memory banks 521-1, ..., 521-Q of memory dies 520-1, ..., 520-(P-1) and/or the portions 521-1-1, ..., 521-(Q-1)-1 of the memory banks 521-1, ..., 521-(Q-1) of memory die 520-P may further include error correction data for correcting a number of errors in the corresponding memory bank 521 that is equal to or less than a threshold number.
[0065] Each memory bank 521 may store error detection data in its corresponding portion 521-1-2, ..., 521-Q-2. Although embodiments are not limited thereto, the error detection data may include CRC data. The error detection data (e.g., CRC) can be used to indicate that the number of errors within the corresponding one of the memory banks 521 exceeds the threshold number. For example, if an error persists in memory bank 521-1 of memory die 520-1 even after an error correction operation has been performed using the error correction data stored in portion 521-1-1 of that memory bank, an error detection operation performed using the error detection data stored in portion 521-1-2 of memory bank 521-1 of memory die 520-1 may indicate that an error still exists within that memory bank. In some embodiments, each portion 521-1-2, ..., 521-Q-2 of each memory die 520-1, ..., 520-P may correspond to a single row of memory cells.
[0066] The memory bank 521-Q of memory die 520-P may include, in its portion 521-Q-1, error correction data for correcting a number of errors exceeding the threshold number within the memory banks 521-1, ..., 521-Q of memory dies 520-1, ..., 520-(P-1) and/or the memory banks 521-1, ..., 521-(Q-1) of memory die 520-P, that is, errors that cannot be corrected using the error correction data stored in the corresponding portions 521-1-1, ..., 521-Q-1 of memory dies 520-1, ..., 520-(P-1) and/or the corresponding portions 521-1-1, ..., 521-(Q-1)-1 of memory die 520-P. In some embodiments, the error correction data for correcting the number of errors exceeding the threshold number may be parity data. Although embodiments are not limited thereto, the threshold number described herein may correspond to a single error.
[0067] A stripe may contain data stored in one or more rows of memory cells (e.g., strips) of the memory banks 521-1, ..., 521-Q of memory dies 520-1, ..., 520-P. For example, as illustrated in Figure 5, stripe 522 may contain host data stored in one or more rows of memory cells of the memory banks 521-1, ..., 521-Q of memory dies 520-1, ..., 520-(P-1) and of the memory banks 521-1, ..., 521-(Q-1) of memory die 520-P, together with error correction data stored in one or more rows of memory cells of memory bank 521-Q of memory die 520-P. Therefore, an error correction operation (to correct a number of errors exceeding the threshold number) can be performed on the host data stored in one strip of stripe 522 by reading the error correction data of stripe 522 (from memory bank 521-Q of memory die 520-P) and the host data of the other strips of stripe 522.
[0068] The operational roles of memory banks and/or dies (e.g., memory banks 521 and/or memory dies 520) can be swapped occasionally or periodically to balance the number of accesses across memory banks and to avoid "hot spots" by preventing one memory bank from being accessed more frequently than others. For example, as previously described, memory bank 521-Q of memory die 520-P is configured to store error correction data (e.g., parity data), while the other memory banks 521 are configured to store host data. At a later point, one of the memory banks 521 of memory dies 520-1, ..., 520-(P-1), or one of the memory banks 521-1, ..., 521-(Q-1) of memory die 520-P, can be reconfigured to store the error correction data, while memory bank 521-Q of memory die 520-P can be reconfigured to store host data.
[0069] Assuming that each of the memory dies 520 contains 64 memory banks (e.g., memory banks 521), the example illustrated in Figure 5 may have an overhead of approximately 0.048%. That is, the ratio of the number of memory banks configured to store error correction data for correcting a number of errors exceeding the threshold number (e.g., memory bank 521-Q of memory die 520-P) to the number of memory banks configured to store host data (e.g., memory banks 521-1, ..., 521-Q of memory dies 520-1, ..., 520-(P-1) and memory banks 521-1, ..., 521-(Q-1) of memory die 520-P) is 1/2047 (approximately 0.048%).
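The overhead figure can be checked with a few lines of arithmetic. The sketch below assumes 32 memory dies, which is the die count implied by the 1/2047 ratio at 64 memory banks per die (32 × 64 = 2048 banks, one of which holds error correction data). The disclosure itself states only the per-die bank count and the ratio, so the die count is an inference.

```python
from fractions import Fraction

BANKS_PER_DIE = 64   # stated in the example
NUM_DIES = 32        # assumption: implied by the 1/2047 ratio, not stated explicitly

total_banks = BANKS_PER_DIE * NUM_DIES    # 2048
parity_banks = 1                          # e.g., memory bank 521-Q of memory die 520-P
host_banks = total_banks - parity_banks   # 2047

overhead = Fraction(parity_banks, host_banks)
overhead_percent = float(overhead) * 100  # just under 0.049
```

The exact value lies between 0.048% and 0.049%, consistent with the approximate figure quoted above.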
[0070] In a non-limiting example, an example system (e.g., the computing system 100 or memory device 120 described in connection with Figure 1) may include a first number of memory dies (e.g., memory dies 520-1, ..., 520-(P-1)) of a group of memory dies (e.g., memory dies 520). Each of the first number of memory dies may include multiple memory banks (e.g., memory banks 521-1, ..., 521-Q), and each of the multiple memory banks may include a first portion (e.g., portions 521-1-1, ..., 521-Q-1) configured to store host data and a second portion (e.g., portions 521-1-2, ..., 521-Q-2) configured to store error detection data indicating that the number of errors in the corresponding one of the multiple memory banks exceeds a threshold number. The system may further include a second memory die (e.g., memory die 520-P) of the group of memory dies. The second memory die may include multiple memory banks (e.g., memory banks 521-1, ..., 521-Q). One of the multiple memory banks of the second memory die (e.g., memory bank 521-Q) can be configured to store error correction data to correct a number of errors exceeding the threshold number in the multiple memory banks of the first number of memory dies and of the second memory die.
[0071] In some embodiments, other banks of the plurality of banks of the second memory die (e.g., banks 521-1, ..., 521-(Q-1)) may be configured to store host data. In some embodiments, the memory dies in the first plurality of memory dies and the second memory die may be coupled to different channels. In some embodiments, each of the plurality of banks of the second memory die may be configured to store error detection data to indicate that the number of errors in the corresponding bank of the second memory die exceeds a threshold number.
[0072] Figure 6 is a flowchart illustrating an example method 631 for memory protection according to several embodiments of the present disclosure. Method 631 can be executed by processing logic, which may include hardware (e.g., processing device, circuit system, dedicated logic, programmable logic, microcode, device hardware, integrated circuit, etc.), software (e.g., instructions running or executed on the processing device), or a combination thereof. In some embodiments, method 631 is performed by the control circuitry 140 and/or the system controller 110 described in connection with Figure 1. Although shown in a specific sequence or order, the order of the processes may be modified unless otherwise specified. Therefore, the illustrated embodiments should be understood as examples only; the illustrated processes may be executed in a different order, and some processes may be executed in parallel. In addition, one or more processes may be omitted in various embodiments, so not all processes are required in every embodiment. Other process flows are possible.
[0073] At block 632, method 631 may include performing a read operation on a memory bank of a group of memory banks (e.g., memory banks 221, 321, 421, and/or 521 described in connection with Figures 2 to 5) of a memory die (e.g., memory dies 220, 320, 420, and/or 520 described in connection with Figures 2 to 5) to retrieve first host data from a page of memory cells of the memory bank and error detection data from a different page of memory cells of the memory bank.
[0074] At block 634, method 631 may include performing an error detection operation, using the error detection data, on the first host data retrieved from the page of memory cells of the memory bank. In some embodiments, method 631 may include, before performing the error detection operation, performing an error correction operation on the retrieved host data using error correction data stored in the memory bank to correct a number of errors equal to or less than a threshold number.
[0075] At block 636, method 631 may include, in response to the error detection operation indicating that the number of errors in the first host data exceeds the threshold number, reading second host data from the corresponding pages of memory cells of the other memory banks of the group, together with error correction data stored in at least one memory bank of the group.
[0076] At block 638, method 631 may include performing an error correction operation, using the second host data retrieved from the corresponding pages of memory cells of the other memory banks and the error correction data retrieved from the at least one memory bank of the group, to correct the number of errors exceeding the threshold number.
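The read, detect, and recover flow of method 631 can be sketched end to end as follows. This is an illustrative model only: it assumes CRC-32 (via Python's `zlib.crc32`) as the error detection code and bytewise XOR as the parity, neither of which is specified by the disclosure, and it omits the optional in-bank correction of errors at or below the threshold number. The names `Bank`, `xor_pages`, and `read_with_protection` are hypothetical.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Bank:
    page: bytes  # host data page (or the parity page, for the parity bank)
    crc: int     # error detection data, kept in a different page of the bank


def xor_pages(pages: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized pages."""
    out = bytearray(len(pages[0]))
    for page in pages:
        for i, byte in enumerate(page):
            out[i] ^= byte
    return bytes(out)


def read_with_protection(banks: list[Bank], target: int) -> bytes:
    # Read the host data page and its error detection data.
    first = banks[target]
    # Error detection on the retrieved host data.
    if zlib.crc32(first.page) == first.crc:
        return first.page
    # Detection failed: read the corresponding pages of every other bank,
    # which includes the bank holding the parity page.
    others = [bank.page for i, bank in enumerate(banks) if i != target]
    # Rebuild the failed page from the other host data and the parity.
    return xor_pages(others)


# Build a stripe of seven host banks plus one parity bank, then corrupt bank 0.
host = [bytes([n]) * 8 for n in range(1, 8)]
banks = [Bank(page, zlib.crc32(page)) for page in host]
parity_page = xor_pages(host)
banks.append(Bank(parity_page, zlib.crc32(parity_page)))
banks[0].page = b"\xff\xff" + host[0][2:]  # corrupt two bytes; stored CRC no longer matches
recovered = read_with_protection(banks, target=0)
```

In the common case the CRC matches and only the target bank is read; the other banks of the stripe are touched only when detection fails.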
[0077] In some embodiments, method 631 may include receiving a write request to write different host data to the page of memory cells of the memory bank of the group. In this example, method 631 may further include writing the different host data to the page of memory cells without erasing the other pages of memory cells of the group of memory banks.
[0078] In some embodiments, method 631 may include, in response to receiving the write request, generating error correction data based on the different host data and the second host data. In this example, method 631 may further include writing the generated error correction data to the at least one memory bank of the group.
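The write path can be sketched as regenerating the stripe's error correction data from the newly written host data and the host data already held by the other memory banks (the "second host data"). As a labeled assumption, the parity is modeled as a bytewise XOR, which the disclosure does not specify, and `write_page` is a hypothetical helper. Only the target page and the parity page are written.

```python
def xor_pages(pages: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized pages."""
    out = bytearray(len(pages[0]))
    for page in pages:
        for i, byte in enumerate(page):
            out[i] ^= byte
    return bytes(out)


def write_page(host_pages: list[bytes], target: int, new_data: bytes) -> bytes:
    """Write new_data to one bank's page and return the regenerated parity page.

    The pages of the other banks are read to compute parity but are not modified.
    """
    host_pages[target] = new_data
    return xor_pages(host_pages)


stripe = [bytes([n]) * 4 for n in range(1, 8)]  # host pages of seven banks
new_parity = write_page(stripe, target=2, new_data=b"\xaa" * 4)
```

After the write, XOR-ing every host page of the stripe with `new_parity` yields all zeros, which is the invariant that later recovery relies on.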
[0079] Although specific embodiments have been illustrated and described herein, those skilled in the art will understand that arrangements calculated to achieve the same results may be substituted for the specific embodiments shown. This disclosure is intended to cover adaptations or variations of one or more embodiments of this disclosure. It should be understood that the foregoing description is illustrative rather than restrictive. Those skilled in the art will understand, upon review of the foregoing description, combinations of the foregoing embodiments and other embodiments not explicitly described herein. The scope of one or more embodiments of this disclosure includes other applications in which the foregoing structures and processes are used. Therefore, the scope of one or more embodiments of this disclosure should be determined by reference to the appended claims and the full scope of their authorized equivalents.
[0080] In the “Detailed Description”, some features are grouped together in a single embodiment for the sake of simplicity. This method of disclosure should not be interpreted as reflecting an intention that the disclosed embodiments must use more features than are expressly recited in each claim. Rather, as the appended claims reflect, the inventive subject matter lies in fewer than all features of a single disclosed embodiment. Therefore, the appended claims are hereby incorporated into the “Detailed Description”, with each claim standing on its own as a separate embodiment.
Claims
1. An apparatus comprising: a first group of memory banks of a memory device, each memory bank of the first group comprising: a first portion configured to store host data; and a second portion configured to store error detection data indicating a number of errors exceeding a threshold number within the corresponding memory bank of the first group; and a second group of memory banks of the memory device, configured to store error correction data corresponding to parity data to correct the number of errors exceeding the threshold number in the corresponding memory banks of the first group.
2. The apparatus of claim 1, wherein the memory device comprises a plurality of stripes, each of the plurality of stripes comprising a corresponding group of memory cells from each memory bank of the first group and the second group of memory banks.
3. The apparatus of claim 2, wherein: a memory bank of the second group is configured to store error correction data corresponding to a first stripe; and a different memory bank of the second group is configured to store error correction data corresponding to a second stripe.
4. The apparatus of claim 2, wherein: a first group of memory cells is distributed across the memory banks of the first group and the second group, the first group of memory cells being configured to store data corresponding to a first stripe; and a second group of memory cells is distributed across the memory banks of the first group and the second group, the second group of memory cells being configured to store data corresponding to a second stripe.
5. The apparatus of claim 1, wherein the memory device corresponds to a memory die.
6. An apparatus comprising: a group of memory banks configured to store data corresponding to a first stripe, wherein the first stripe includes: a first portion of the group of memory banks, configured to store first host data; and a second portion of the group of memory banks, configured to store first error correction data to correct a number of errors within the first host data; wherein the group of memory banks is further configured to store data corresponding to a second stripe, wherein the second stripe includes: a third portion of the group of memory banks, configured to store second host data; and a fourth portion of the group of memory banks, configured to store second error correction data.
7. The apparatus of claim 6, wherein: the second portion of the group of memory banks corresponds to a first memory bank of the group; and the fourth portion of the group of memory banks corresponds to a second memory bank of the group.
8. The apparatus of claim 6, wherein the second portion of the group of memory banks further includes a plurality of portions distributed across the memory banks of the group, the plurality of portions being configured to store the first error correction data.
9. The apparatus of claim 8, wherein the plurality of portions respectively correspond to different memory banks of the group.
10. The apparatus of claim 8, wherein the plurality of portions respectively correspond to different groups of memory cells of the group of memory banks.
11. The apparatus of claim 10, wherein each of the plurality of portions corresponds to a row of DRAM memory cells.
12. The apparatus of claim 6, further comprising control circuitry coupled to the group of memory banks, the control circuitry being configured to: perform an error detection operation on the first host data using first error detection data; and in response to the error detection operation indicating a number of errors exceeding a threshold number within the first host data, perform an error correction operation using: the first error correction data, to correct the number of errors exceeding the threshold number; and third host data corresponding to the first stripe.
13. The apparatus of claim 12, wherein the control circuitry is configured to perform a read operation on the first portion of the group of memory banks to retrieve the first error detection data.
14. A method comprising: performing an error detection operation on first host data retrieved from a first memory bank of a group of memory banks using first error detection data; and in response to the error detection operation indicating a number of errors exceeding a threshold number within the first host data, performing an error correction operation using: first error correction data, to correct the number of errors exceeding the threshold number; and second host data retrieved from one or more memory banks of the group that are different from the first memory bank.
15. The method of claim 14, further comprising, prior to performing the error correction operation, performing a read operation on one or more memory banks of the group to: retrieve the second host data from a second memory bank of the group; and retrieve the first error correction data from a third memory bank of the group.
16. The method of claim 14, further comprising: performing an error detection operation on third host data retrieved from the first memory bank of the group of memory banks using second error detection data; and in response to the error detection operation indicating a number of errors exceeding a threshold number within the third host data, performing an error correction operation using: second error correction data, to correct the number of errors exceeding the threshold number; and fourth host data retrieved from one or more memory banks of the group that are different from the first memory bank.
17. The method of claim 16, further comprising: prior to performing the error detection operation on the first host data, performing a read operation on a first group of memory cells of the first memory bank to retrieve the first host data; and prior to performing the error detection operation on the first host data, performing a read operation on a second group of memory cells of the first memory bank to retrieve the first host data.
18. The method of claim 17, further comprising, prior to performing the error detection operation on the first host data, performing a read operation on a first group of memory cells of the one or more memory banks to retrieve the second host data or the first error correction data.
19. The method of claim 17, further comprising, prior to performing the error detection operation on the third host data, performing a read operation on a second group of memory cells of the one or more memory banks to retrieve the fourth host data or the second error correction data.