Semiconductor device

JPWO2024013604A5 · Pending · Publication Date: 2026-06-12

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Filing Date
2023-06-30
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Current semiconductor devices face challenges in reducing power consumption and improving arithmetic performance due to high energy consumption when switching between tasks and accessing external memory for data storage, particularly in CPU and neural network calculations.

Method used

A semiconductor device configuration that includes a combination of silicon and oxide semiconductor transistors in a layered structure, with a register and memory circuit design that allows for efficient data holding and switching using flip-flops and data retention circuits, reducing the need for external memory access and minimizing power consumption.

Benefits of technology

This configuration enables reduced power consumption and enhanced arithmetic performance by allowing efficient task switching and data processing without relying on external memory access, while also increasing memory density and processing capabilities.

✦ Generated by Eureka AI based on patent content.
Patent Text Reader

Abstract

Provided is a semiconductor device with a novel configuration. This semiconductor device has: a first computing device that has registers, and a second computing device that has memory circuits, layer selecting circuits, and a computing circuit. The first computing device and the second computing device are provided to an element layer that is formed by stacking a plurality of second element layers on a first element layer. The registers each have a flip-flop and a data holding circuit. The flip-flops and the computing circuit are provided to the first element layer. The data holding circuits are provided to each layer of the plurality of second element layers on the first element layer provided with the flip-flops. The memory circuits and the layer selecting circuits are provided to each layer of the plurality of second layers on the first element layer provided with the computing circuit.

Description

Semiconductor Devices
【0001】 One embodiment of the present invention relates to a semiconductor device or the like.
【0002】 Note that one embodiment of the present invention is not limited to the above technical field. The technical field of the invention disclosed in this specification and the like relates to an object, a method, or a manufacturing method. Alternatively, one embodiment of the present invention relates to a process, a machine, manufacture, or a composition of matter. More specifically, examples of the technical field of one embodiment of the present invention disclosed in this specification include a semiconductor device, a display device, a light-emitting device, a power storage device, a memory device, a driving method thereof, and a manufacturing method thereof.
【0003】 Semiconductor devices that can hold charge corresponding to data by combining a transistor using an oxide semiconductor for a channel formation region (hereinafter referred to as an OS transistor) and a transistor using silicon for a channel formation region (hereinafter referred to as a Si transistor) are under development.
【0004】 Such a semiconductor device is configured to save (also referred to as evacuate, store, or back up) or load (also referred to as restore or recover) a program or data held in a flip-flop or the like, thereby enabling low power consumption through power gating or the like. Application of such a device to semiconductor devices having a CPU (Central Processing Unit) or the like is therefore progressing (see, for example, Patent Document 1).
【0005】 The CPU executes a series of processes (tasks) by sequentially executing processes according to programs or data.
【0006】 Data required for processing in the CPU or data obtained by the processing is transmitted and received between the CPU and peripheral circuits. Various peripheral circuits are used according to user needs.
Examples of peripheral circuits include a dynamic random access memory (DRAM) interface, a peripheral component interface (PCI), a direct memory access (DMA), a network interface, and an audio interface.
【0007】 When multiple tasks are executed, each task is divided into small processing units, and the processing units of each task are executed sequentially, making it appear as if multiple tasks are being executed simultaneously. To execute this processing, multiple register banks (sets of general-purpose registers) are prepared, and the register banks are switched according to the task to be executed.
【0008】 Also, when a program transitions from a main routine to a subroutine, the register bank is switched before the processing of the subroutine is executed, and after the processing of the subroutine is completed, the register bank is switched back to the original register bank before the processing of the main routine is executed.
【0009】 Patent Document 1: JP 2013-9297 A
【0010】 In a computing device such as a CPU, when register banks are insufficient to handle complex processing, data in the registers corresponding to a task must be temporarily written to an external memory device, and when the task is executed again, the data must be written back to the registers from the external memory device. In this case, energy is consumed in writing data to the external memory device and writing it back to the registers. While providing a large number of register banks can reduce the energy consumed by transfers between the external memory device and the registers, it also increases the circuit layout area.
【0011】 Furthermore, in a computing device that performs computations that mimic a neural network, computations are performed using a data set of weight data. If the weight data is stored in an external memory device, the frequency of accessing the external memory device increases when switching between different weight data sets to perform computations.
This results in energy consumption for writing and writing back data between the external memory device and the computing circuit. Furthermore, when an external memory device must be accessed, it becomes difficult to switch weight data in a short time.
【0012】 An object of one embodiment of the present invention is to provide a novel semiconductor device or the like. An object of one embodiment of the present invention is to provide a semiconductor device or the like with a novel structure and excellent low power consumption. An object of one embodiment of the present invention is to provide a semiconductor device or the like with a novel structure and excellent computing performance.
【0013】 The objects of one embodiment of the present invention are not limited to those listed above. The objects listed above do not preclude the existence of other objects. The other objects are ones not mentioned in this section, which are described below. Objects not mentioned in this section can be derived by a person skilled in the art from the description in the specification or drawings, and can be extracted from these descriptions as appropriate. One embodiment of the present invention achieves at least one of the objects listed above and/or the other objects.
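The register-bank problem described in paragraph 【0010】 can be illustrated with a small counting model. This is only a sketch under assumptions that are not in the source: each bank holds the register set of one task, tasks are identified by integers, and the least recently used bank is evicted when all banks are occupied. The function name `count_external_transfers` is hypothetical.

```python
from collections import OrderedDict


def count_external_transfers(schedule, num_banks):
    """Count register-set transfers to and from external memory.

    Assumptions (illustrative, not from the source): one bank holds one
    task's register set, and the least recently used bank is evicted when
    every bank is occupied.
    """
    banks = OrderedDict()  # task id -> True, ordered least to most recently used
    spilled = set()        # tasks whose register set sits in external memory
    writes = 0             # register sets written out to external memory
    write_backs = 0        # register sets written back into a bank

    for task in schedule:
        if task in banks:
            # The task already owns a bank: a bank switch, no external access.
            banks.move_to_end(task)
            continue
        if len(banks) >= num_banks:
            # No free bank: spill the least recently used register set.
            victim, _ = banks.popitem(last=False)
            spilled.add(victim)
            writes += 1
        if task in spilled:
            # The task ran before, so its registers must be written back.
            spilled.remove(task)
            write_backs += 1
        banks[task] = True
    return writes, write_backs


# Four tasks executed round-robin for three rounds.
round_robin = [0, 1, 2, 3] * 3
few_banks = count_external_transfers(round_robin, num_banks=2)
enough_banks = count_external_transfers(round_robin, num_banks=4)
```

With as many banks as tasks, the model performs no external transfers at all, which is the trade-off against layout area that the paragraph describes.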
【0014】One embodiment of the present invention is a semiconductor device including a first arithmetic unit having a register and a second arithmetic unit having a memory circuit, a layer selection circuit, and an arithmetic circuit, the first arithmetic unit and the second arithmetic unit being provided in an element layer in which a plurality of second element layers are stacked over a first element layer, the first element layer including a first transistor having silicon in a semiconductor layer having a channel formation region, the second element layer including a second transistor having an oxide semiconductor in the semiconductor layer having the channel formation region, the register including a flip-flop and a data retention circuit, the flip-flop and the arithmetic circuit being provided in the first element layer, the data retention circuit being provided in each layer of a plurality of second element layers over the first element layer in which the flip-flop is provided, and the memory circuit and the layer selection circuit being provided in each layer of a plurality of second element layers over the first element layer in which the arithmetic circuit is provided. 【0015】 In one embodiment of the present invention, a semiconductor device is preferred in which input terminals of the flip-flop are electrically connected to respective output terminals of the data retention circuit, output terminals of the flip-flop are electrically connected to respective input terminals of the data retention circuit, and the data retention circuit has a function of retaining data corresponding to a task executed by the first calculation device by making the second transistor non-conductive. 
【0016】 In one embodiment of the present invention, a semiconductor device is preferred in which the memory circuit has memory cells electrically connected to write word lines and read word lines, and the layer selection circuit has a function of outputting signals to be supplied to the write word lines and read word lines.
【0017】 In one embodiment of the present invention, a semiconductor device is preferred in which the memory circuits provided in different second element layers each have weight data used for arithmetic processing based on a neural network, and the weight data input to the arithmetic circuit is switched by a layer selection circuit.
【0018】 In one embodiment of the present invention, the data retention circuit preferably has a region overlapping with the flip-flop in a plan view.
【0019】 In one embodiment of the present invention, the semiconductor device preferably has a region in which the memory circuit overlaps with the arithmetic circuit in a plan view.
【0020】 In one embodiment of the present invention, the oxide semiconductor preferably contains In, Ga, and Zn.
【0021】 In one embodiment of the present invention, a semiconductor device is preferred in which the arithmetic circuit has a function of performing a product-sum operation.
【0022】 Other aspects of the present invention will be described in the following embodiments and in the drawings.
【0023】 One embodiment of the present invention can provide a novel semiconductor device or the like. Alternatively, one embodiment of the present invention can provide a semiconductor device or the like with a novel structure and excellent low power consumption. Alternatively, one embodiment of the present invention can provide a semiconductor device or the like with a novel structure and excellent computing performance.
【0024】 Note that the description of these effects does not preclude the existence of other effects.
Note that one embodiment of the present invention does not necessarily have all of these effects. Note that effects other than these will become apparent from the description in the specification, drawings, claims, etc., and it is possible to extract other effects from the description in the specification, drawings, claims, etc.
【0025】 FIGS. 1A to 1C, 2A and 2B, 3A and 3B, 4A to 4E, 5, 6A and 6B, 7A and 7B, 8A to 8C, 9, 10A to 10C, 11, 12, 13A to 13C, and 14 are diagrams each illustrating a configuration example of a semiconductor device. FIGS. 15, 16A, 17, and 18A are diagrams each illustrating a configuration example of a memory device. FIGS. 16B and 18B are diagrams each illustrating an equivalent circuit of a memory device. FIGS. 19A and 19B are diagrams illustrating an example of an electronic component. FIGS. 20A and 20B are diagrams illustrating an example of electronic equipment, and FIGS. 20C to 20E are diagrams illustrating an example of a mainframe computer. FIG. 21 is a diagram illustrating an example of space equipment. FIG. 22 is a diagram illustrating an example of a storage system applicable to a data center. FIGS. 23 to 32, 33A and 33B, 34, 35, 36A and 36B, 37A to 37C, 38A to 38C, 39, and 40 are diagrams each illustrating the configuration of an embodiment.
【0026】 Hereinafter, embodiments will be described with reference to the drawings.
However, it will be readily understood by those skilled in the art that the embodiments can be implemented in many different forms and that various changes in form and details can be made without departing from the spirit and scope of the present invention. Therefore, the present invention should not be interpreted as being limited to the following description of the embodiments.
【0027】 In addition, in the drawings, the size, layer thickness, or area may be exaggerated for clarity, and therefore the drawings are not necessarily to scale. Note that the drawings are schematic illustrations of ideal examples, and the embodiments are not limited to the shapes, values, etc. shown in the drawings.
【0028】 In this specification and the like, unless otherwise specified, the off-state current refers to the drain current when a transistor is in an off state (also referred to as a non-conducting state or a cut-off state). Unless otherwise specified, the off state refers to a state in which the gate-source voltage Vgs of an n-channel transistor is lower than the threshold voltage Vth (for a p-channel transistor, higher than Vth).
【0029】 In this specification and the like, a metal oxide refers to an oxide of a metal in a broad sense. Metal oxides are classified into oxide insulators, oxide conductors (including transparent oxide conductors), oxide semiconductors (also simply referred to as OSs), and the like. For example, when a metal oxide is used in an active layer of a transistor, the metal oxide may be referred to as an oxide semiconductor. In other words, an OS transistor can be rephrased as a transistor including a metal oxide or an oxide semiconductor.
【0030】 (Embodiment 1) In this embodiment, a configuration example of a semiconductor device will be described.
【0031】 <Configuration Example of Semiconductor Device 10> A semiconductor device described in one embodiment of the present invention functions as a system on chip (SoC) in which a plurality of arithmetic devices, memory devices, and the like are tightly coupled.
【0032】 FIG. 1A is a block diagram schematically illustrating a semiconductor device 10 for describing one embodiment of the present invention. FIG. 1B is a block diagram more schematically illustrating a top surface of the semiconductor device 10. FIG. 1C is a diagram illustrating an example of a configuration of an element layer that can have each of the configurations illustrated in FIGS. 1A and 1B.
【0033】 In this specification and the like, the X direction, the Y direction, and the Z direction may be defined to explain the arrangement of each element. For example, in the schematic diagrams shown in FIGS. 1A and 1B, the X direction, the Y direction, and the Z direction are defined to explain the arrangement of each element constituting the semiconductor device 10. The X direction, the Y direction, and the Z direction are perpendicular or approximately perpendicular to each other.
【0034】 In FIGS. 1A and 1B, the elements constituting the semiconductor device 10 are shown separated from each other to make the arrangement of the elements easier to understand. Elements provided on the same layer are preferably formed in the same process, but this is not a limitation. For example, elements formed in separate processes may be integrated using a bonding technique or the like.
【0035】 The semiconductor device 10 shown in FIGS. 1A and 1B includes an arithmetic unit (also referred to as a first arithmetic unit) 100, an arithmetic unit (also referred to as a second arithmetic unit) 200, a memory device 300, and a peripheral circuit 400.
【0036】 The semiconductor device 10 shown in FIGS. 1A and 1B has a configuration in which other element layers (element layers 30) are stacked on an element layer 20. For example, as shown in FIG. 1C, the semiconductor device 10 has a configuration in which element layers 30 (four element layers 30[1] to 30[4] are illustrated in FIG. 1C) are stacked on the element layer 20.
【0037】 In FIG. 1C, the first element layer 30 is indicated as element layer 30[1], the second element layer 30 is indicated as element layer 30[2], and the third element layer 30 is indicated as element layer 30[3]. Furthermore, the k-th element layer 30 (k is an integer of 2 or more) is indicated as element layer 30[k]. In the present embodiment and the like, when describing matters relating to the plurality of element layers 30 as a whole, or when describing matters common to each of the plurality of element layers 30, the term "element layer 30" may be used. The same applies to other structures for which a plurality of instances are distinguished by reference numerals.
【0038】 Like a CPU, the arithmetic unit 100 has the function of performing general-purpose processing such as running an operating system, controlling data, performing various calculations, and executing programs. The arithmetic unit 100 has a register 110 that has the function of storing data during calculation processing.
【0039】 The arithmetic unit 200 has a plurality of PEs (Processing Elements, units of arithmetic processing, also called arithmetic circuits) and has the function of performing dedicated processing such as image processing or product-sum operations. In addition to the arithmetic circuits (not shown), the arithmetic unit 200 also has a memory circuit 210 that has the function of storing weight data used in the arithmetic processing, and layer selection circuits 220 and 230.
【0040】 As shown in FIG. 1C, the register 110, the memory circuit 210, and the layer selection circuits 220 and 230 have a configuration in which element layers 30[1] to 30[4] each having a transistor 31 are provided on an element layer 20 having a transistor 21.
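The weight switching described in paragraphs 【0017】 and 【0039】 can be pictured with a minimal behavioral sketch. It assumes, for illustration only, that each stacked element layer holds one complete weight data set and that the layer selection circuit simply chooses which set is presented to a product-sum circuit. The class `StackedWeightMemory` and the function `product_sum` are hypothetical names, and the weight values are made up.

```python
def product_sum(inputs, weights):
    """Product-sum (multiply-accumulate) of two equal-length sequences."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


class StackedWeightMemory:
    """Behavioral stand-in for memory circuits stacked in element layers 30[1..k]."""

    def __init__(self, weights_per_layer):
        # Layers are numbered from 1 to follow the 30[1], 30[2], ... convention.
        self.layers = {
            index: list(weights)
            for index, weights in enumerate(weights_per_layer, start=1)
        }
        self.selected = 1

    def select_layer(self, index):
        """Model of the layer selection circuit: pick which layer is read."""
        if index not in self.layers:
            raise IndexError(f"no element layer 30[{index}]")
        self.selected = index

    def read(self):
        """Return the weight data set of the currently selected layer."""
        return self.layers[self.selected]


# Two illustrative weight sets held in two stacked layers.
memory = StackedWeightMemory([[1, 2, 3], [0, -1, 2]])
inputs = [4, 5, 6]

memory.select_layer(1)
result_layer1 = product_sum(inputs, memory.read())

memory.select_layer(2)  # switch weight sets without any external memory access
result_layer2 = product_sum(inputs, memory.read())
```

In this model, switching weight sets is a change of one index rather than a transfer from external memory, which is the behavior the stacked configuration is intended to provide.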
【0041】 The transistor 21 has silicon in a semiconductor layer 22 having a channel formation region. A transistor having silicon in a semiconductor layer having a channel formation region, like the transistor 21, is called a Si transistor. The transistor 31 has an oxide semiconductor in a semiconductor layer 32 having a channel formation region. A transistor having an oxide semiconductor in a semiconductor layer having a channel formation region, like the transistor 31, is called an OS transistor. 【0042】For the Si transistor, it is preferable to use silicon with high crystallinity, such as single crystal silicon or polycrystalline silicon, since high field effect mobility can be achieved and higher speed operation is possible. 【0043】 Examples of metal oxides applicable to OS transistors include indium oxide, gallium oxide, and zinc oxide. The metal oxide preferably contains two or three elements selected from the group consisting of indium, an element M, and zinc. The element M is one or more elements selected from the group consisting of gallium, aluminum, silicon, boron, yttrium, tin, copper, vanadium, beryllium, titanium, iron, nickel, germanium, zirconium, molybdenum, lanthanum, cerium, neodymium, hafnium, tantalum, tungsten, and magnesium. The element M is preferably one or more elements selected from the group consisting of aluminum, gallium, yttrium, and tin. 【0044】 In particular, it is preferable to use an oxide containing indium (In), gallium (Ga), and zinc (Zn) (also referred to as IGZO) as the metal oxide. Alternatively, it is preferable to use an oxide containing indium, tin, and zinc (also referred to as ITZO). Alternatively, it is preferable to use an oxide containing indium, gallium, tin, and zinc. Alternatively, it is preferable to use an oxide containing indium (In), aluminum (Al), and zinc (Zn) (also referred to as IAZO). 
Alternatively, it is preferable to use an oxide containing indium (In), aluminum (Al), gallium (Ga), and zinc (Zn) (also referred to as IAGZO). Alternatively, it is preferable to use an oxide containing indium (In), gallium (Ga), zinc (Zn), and tin (Sn) (also referred to as IGZTO).
【0045】 The metal oxide used in the OS transistor may have two or more metal oxide layers with different compositions. For example, a stacked structure of a first metal oxide layer having an atomic ratio of In:M:Zn=1:3:4 or a composition similar thereto and a second metal oxide layer having an atomic ratio of In:M:Zn=1:1:1 or a composition similar thereto provided over the first metal oxide layer can be preferably used.
【0046】 Alternatively, for example, a stacked structure of any one selected from indium oxide, indium gallium oxide, and IGZO and any one selected from IAZO, IAGZO, and ITZO may be used.
【0047】 Note that a metal oxide used in an OS transistor preferably has crystallinity. Examples of a crystalline oxide semiconductor include a c-axis-aligned crystalline (CAAC)-OS and a nanocrystalline (nc)-OS. When a crystalline oxide semiconductor is used, a highly reliable semiconductor device can be provided.
【0048】 The memory device 300 has a memory layer 310 that stores data input to and output from the arithmetic device 100, the arithmetic device 200, or the like.
【0049】 FIG. 1A illustrates the memory layer 310 stacked on the driving circuit and the like provided in the element layer 20, in the same manner as the element layers 30[1] to 30[4]. The memory layer 310 is a layer that has NOSRAM memory cells.
【0050】 NOSRAM (registered trademark) is an abbreviation for "Nonvolatile Oxide Semiconductor Random Access Memory (RAM)." NOSRAM refers to a memory in which memory cells are two-transistor (2T) or three-transistor (3T) gain cells and the transistors are OS transistors.
OS transistors have an extremely small leakage current, i.e., the current that flows between the source and drain in the off state. NOSRAM can be used as a nonvolatile memory by retaining a charge corresponding to data in the memory cell using this extremely small leakage current. In particular, NOSRAM can read stored data without destroying it (nondestructive read), making it suitable for arithmetic processing in which data read operations alone are repeated in large quantities. NOSRAM can increase its data capacity by stacking layers, so it can be used as a large-scale cache memory, main memory, or storage memory to improve the performance of semiconductor devices.
【0051】 Note that, in addition to NOSRAM, a DOSRAM having an OS transistor may be used as a configuration applicable to the memory layer 310. DOSRAM (registered trademark) is an abbreviation for "Dynamic Oxide Semiconductor RAM" and refers to a RAM having a 1T (transistor) 1C (capacitor) type memory cell. DOSRAM is a DRAM formed using OS transistors, and is a memory that temporarily stores information sent from the outside. DOSRAM is a memory that utilizes the low off-state current of OS transistors.
【0052】 The peripheral circuit 400 includes an interface circuit with an external circuit, such as a dynamic random access memory (DRAM) interface, a peripheral component interface (PCI), a direct memory access (DMA), a network interface, and an audio interface.
【0053】 The semiconductor device 10 functions as a so-called SoC in which arithmetic units 100 and 200, such as a CPU and a GPU, are tightly coupled with a memory device 300. This configuration shortens the wiring connecting the devices that perform data transfer, thereby suppressing increases in heat generation and power consumption.
【0054】 FIG. 2A is a circuit diagram showing an example of the configuration of the register 110 shown in FIG. 1A and other figures.
The register 110 has a scan flip-flop 120 (volatile register) and a plurality of data retention circuits 130[1] to 130[k] (k is an integer equal to or greater than 2). k can be a number corresponding to the number of layers in the element layer 30. The scan flip-flop 120 has a selector 121 and a flip-flop 122. The register 110 also has a transistor 132.
【0055】 The signals BK[1] to BK[k] are signals that control saving (also referred to as evacuating, storing, or backing up) of data held in the flip-flop 122 in the scan flip-flop 120. By saving the data, the data held in the flip-flop 122 is held in one of the data retention circuits 130[1] to 130[k]. The signals BK are also referred to as backup signals.
【0056】 The signals RE[1] to RE[k] are signals that control loading (also referred to as restoration, restore, or recovery) of data held in any one of the data retention circuits 130[1] to 130[k]. By loading the data, the data held in any one of the data retention circuits 130[1] to 130[k] is held in the flip-flop 122 in the scan flip-flop 120. The signals RE are also referred to as restore signals.
【0057】 The signal SE is a switching signal for the selector 121. The clock signal CLK is a signal for operating the flip-flop 122.
【0058】 The register 110 holds data input from terminal D or data input from terminal SD of the scan flip-flop 120 in the scan flip-flop 120 and outputs it from terminal Q in response to the clock signal CLK. The data of the scan flip-flop 120 output from terminal Q is saved in one of data retention circuits 130[1] to 130[k]. The data of one of the data retention circuits 130[1] to 130[k] is loaded from terminal SD of the scan flip-flop 120.
【0059】 The data retention circuits 130[1] to 130[k] can save or load data independently of one another, meaning that the multiple states of the scan flip-flop 120 that arise as tasks are switched can be held in separate data retention circuits 130[1] to 130[k].
【0060】 The scan flip-flop 120 can be formed using Si transistors.
The scan flip-flop 120 can be provided in the element layer 20. The data retention circuits 130[1] to 130[k] can be formed using an OS transistor and a capacitor. The data retention circuits 130[1] to 130[k] can be provided in each of the element layers 30[1] to 30[k] having an OS transistor.
【0061】 The selector 121 has a function of transmitting the signal of terminal D or terminal SD to the scan flip-flop 120 in response to signal SE. Terminal D is a terminal that supplies data input from outside the register 110. Terminal SD is a terminal that supplies data input from any one of data retention circuits 130[1] to 130[k], or data input from terminal SD_IN that supplies scan test data. The data input from terminal SD_IN is supplied via transistor 132, whose conductive or non-conductive state is controlled by signal BK[0].
【0062】 Although the flip-flop 122 is illustrated as a D flip-flop in FIG. 2A, it is not limited to this. Flip-flops available in a standard circuit library can be applied. The transistors included in the flip-flop 122 are Si transistors, and by including a circuit such as an inverter loop, the flip-flop 122 can hold one piece of data. In response to the clock signal CLK, the flip-flop 122 holds the signal at its input terminal D_F and outputs the held data from its output terminal Q_F. The signal at the output terminal Q_F is output to terminal Q.
【0063】 As described above, the data retention circuits 130[1] to 130[k] are provided in each of the element layers 30[1] to 30[k] on the element layer 20 on which the scan flip-flop 120 is provided. With this configuration, multiple data retention circuits 130 can be provided in the region where the scan flip-flop 120 is formed, so even if multiple data retention circuits 130 are incorporated into the register 110, the area overhead of the register 110 can be kept to zero, which is preferable.
【0064】 In addition, the data retention circuits 130[1] to 130[k] have an area overlapping with the scan flip-flop 120, which can shorten the distance between the scan flip-flop 120 and the data retention circuits 130[1] to 130[k] electrically connected to the scan flip-flop 120. Therefore, a configuration can be achieved in which power consumption required for charging and discharging between wirings is suppressed. 【0065】 The data retention circuits 130[1] to 130[k] each include a transistor 133, a transistor 134, and a capacitor 135. The other electrode of the capacitor 135 is connected to a wiring CL. The transistor 133 is provided between the capacitor 135 and a terminal Q. The transistor 134 is provided between the capacitor 135 and a terminal SD. In each of the multiple data retention circuits 130[1] to 130[k], one electrode of the capacitor 135 is illustrated as a node SN[1] to a node SN[k]. 【0066】The transistors 133 and 134 are OS transistors. The transistors 133 and 134 are illustrated as having back gates. The transistor characteristics can be controlled by supplying a constant voltage to the back gates of the transistors 133 and 134. The OS transistors have an extremely low off-state current, which can suppress a decrease in the voltage of the nodes SN[1] to SN[k]. Furthermore, the data retention circuits 130[1] to 130[k] each have nonvolatile characteristics. Data is rewritten by charging and discharging the capacitor 135. Therefore, the data retention circuits 130[1] to 130[k] can write and read data with low energy without any restriction on the number of rewrite operations, in principle. 【0067】 Since all the transistors in the data retention circuits 130[1] to 130[k] are OS transistors, the data retention circuit 130 can be stacked on the scan flip-flop 120 configured with a silicon CMOS circuit as shown in FIG. 2B . Note that in FIG. 2B , the transistor 132 is illustrated as being provided in the same layer as the transistors 133 and 134. 
The transistor 132 is not limited to an OS transistor; either an OS transistor or a Si transistor can be used as the transistor 132.

【0068】 Because the data retention circuits 130[1] to 130[k] have far fewer elements than the scan flip-flop 120, stacking the data retention circuits 130[1] to 130[k] does not require changes to the circuit configuration or layout of the scan flip-flop 120. In other words, the data retention circuits 130[1] to 130[k] are highly versatile circuits. Furthermore, because the data retention circuits 130[1] to 130[k] can be provided in the region where the scan flip-flop 120 is formed, the area overhead can be reduced to zero even if multiple data retention circuits 130[1] to 130[k] are incorporated. Because the data retention circuits 130[1] to 130[k] require little energy to retain data, data can be saved or loaded frequently in the arithmetic device 100.

【0069】 Note that providing the data retention circuits 130[1] to 130[k] adds a parasitic capacitance due to the transistor 133 to the node Q. However, since this capacitance is smaller than the parasitic capacitance due to the logic circuit connected to the node Q, it does not affect the operation of the scan flip-flop 120. In other words, even if a plurality of data retention circuits 130[1] to 130[k] are provided, the performance of the register 110 does not substantially decrease.

【0070】 In the data retention circuits 130[1] to 130[k], the OS transistor functions as a switch. In an OS transistor, which is an n-channel transistor, setting the signal applied to the gate to a high level (hereinafter written as ="H") brings the source and drain into a conductive state (ON), and setting the signal applied to the gate to a low level (hereinafter written as ="L") brings the source and drain into a non-conductive state (OFF).
In the selector 121, setting the signal SE to a high level (SE="H") selects the signal at the terminal SD, and setting the signal SE to a low level (SE="L") selects the signal at the terminal D.

【0071】 For example, by setting the signal BK[1]="H" in the data retention circuits 130[1] to 130[k], the data retained by the flip-flop 122 can be written to the node SN[1] of the data retention circuit 130[1]. Similarly, by setting BK[2]="H", BK[3]="H", and BK[4]="H", the data retained by the flip-flop 122 can be written to the nodes SN[2], SN[3], and SN[4] of the data retention circuits 130[2] to 130[4], respectively. Furthermore, by setting RE[1]="H" and SE="H", the data retained at the node SN[1] of the data retention circuit 130[1] can be written back to the flip-flop 122. Similarly, by setting RE[2]="H", RE[3]="H", and RE[4]="H", the data at the nodes SN[2], SN[3], and SN[4] of the data retention circuits 130[2] to 130[4] can be written back to the flip-flop 122.

【0072】 FIG. 3A illustrates a configuration with k=4, that is, with four data retention circuits 130, in order to explain the operation of the register 110 described with reference to FIG. 2A. FIG. 3A shows the nodes SN[1] to SN[4] that retain data in the data retention circuits 130[1] to 130[4], as well as the signals BK[1] to BK[4] and the signals RE[1] to RE[4] that control the data retention circuits 130[1] to 130[4].

【0073】 FIG. 3B shows an example of a timing chart illustrating the operation of the register 110 shown in FIG. 3A. In FIG. 3B, T0 to T7 represent times. FIG. 3B illustrates the clock signal CLK, the terminal D, the terminal Q, the signal BK[1], the signal BK[2], the signal RE[1], the signal RE[2], the node SN[1], the node SN[2], and the signal SE supplied to the selector 121.
The flip-flop 122 stores the data applied to its input terminal D_F in synchronization with the rising edge of the clock signal CLK (the transition from the L level to the H level) and outputs the stored data from its output terminal Q_F.

【0074】 FIGS. 4A to 4E are schematic diagrams of the register 110 for explaining the operation in the timing chart of FIG. 3B. FIG. 4A illustrates the scan flip-flop 120 and the data retention circuits 130[1] to 130[4]. FIGS. 4B, 4C, 4D, and 4E show the data input to and output from the scan flip-flop 120 and the data retention circuits 130[1] to 130[4] at times T1, T3, T5, and T7 in FIG. 3B, respectively.

【0075】 At time T0, in synchronization with the rising edge of the clock signal CLK, the scan flip-flop 120 stores the data D0 and outputs it to the output terminal Q_F. The terminal D is supplied with the data D1.

【0076】 At time T1, in synchronization with the rising edge of the clock signal CLK, the scan flip-flop 120 stores the data D1 applied to the terminal D and outputs the data D1 to the output terminal Q_F. At time T1, BK[1]="H", RE[1]="L", and SE="L", so the data D1 of the scan flip-flop 120 is held in the data retention circuit 130[1] (see FIG. 4B). The terminal D is supplied with the data D2.

【0077】 At time T2, in synchronization with the rising edge of the clock signal CLK, the scan flip-flop 120 stores the data D2 applied to the terminal D and outputs the data D2 to the output terminal Q_F. The terminal D is supplied with the data D3.

【0078】 At time T3, in synchronization with the rising edge of the clock signal CLK, the scan flip-flop 120 stores the data D3 applied to the terminal D and outputs the data D3 to the output terminal Q_F. At time T3, BK[2]="H", RE[2]="L", and SE="L", so the data D3 of the scan flip-flop 120 is held in the data retention circuit 130[2] (see FIG. 4C). The terminal D is supplied with the data D4.
【0079】 At time T4, in synchronization with the rising edge of the clock signal CLK, the scan flip-flop 120 stores the data D4 applied to the terminal D and outputs the data D4 to the output terminal Q_F. The terminal D is supplied with the data D5.

【0080】 At time T5, in synchronization with the rising edge of the clock signal CLK, the scan flip-flop 120 stores the data D5 applied to the terminal D and outputs the data D5 to the output terminal Q_F. At time T5, by setting BK[1]="L", RE[1]="H", and SE="H", the data D1 held in the data retention circuit 130[1] can be written back to the scan flip-flop 120 (see FIG. 4D). The terminal D is supplied with the data D6.

【0081】 At time T6, in synchronization with the rising edge of the clock signal CLK, the scan flip-flop 120 stores the data D6 applied to the terminal D and outputs the data D6 to the output terminal Q_F. The terminal D is supplied with the data D7.

【0082】 At time T7, in synchronization with the rising edge of the clock signal CLK, the scan flip-flop 120 stores the data D7 applied to the terminal D and outputs the data D7 to the output terminal Q_F. At time T7, by setting BK[2]="L", RE[2]="H", and SE="H", the data D3 held in the data retention circuit 130[2] can be written back to the scan flip-flop 120 (see FIG. 4E). The terminal D is supplied with the data D8.

【0083】 As described with reference to FIGS. 3B and 4B to 4E, the data of an interrupted task can be saved and the data of a task to be resumed can be loaded. In one embodiment of the present invention, the data saved in response to task switching can be stored in multiple data retention circuits. This configuration allows program processing to be executed sequentially by saving and loading data in response to switching between multiple tasks at the timing when an interrupt signal is input. This allows data processing to be performed more efficiently.

【0084】 FIG. 5 is a timing chart of the task switching operation using the register 110 shown in FIG.
3A and the operation of the register 110 described with reference to FIG. 3B.

【0085】 At time Ta, while executing task 1, the arithmetic device 100 stores the data of the scan flip-flop 120 in the data retention circuit 130[1] (Save to 130[1]) and then writes the data of the data retention circuit 130[2] back to the scan flip-flop 120 (Load from 130[2]). In this way, the state of task 1 is saved, task 2 is placed in an executable state, and execution switches to task 2.

【0086】 At time Tb, while executing task 2, the arithmetic device 100 stores the data of the scan flip-flop 120 in the data retention circuit 130[2] (Save to 130[2]) and then writes the data of the data retention circuit 130[3] back to the scan flip-flop 120 (Load from 130[3]). In this way, the state of task 2 is saved, task 3 is placed in an executable state, and execution switches to task 3.

【0087】 At time Tc, while executing task 3, the arithmetic device 100 stores the data of the scan flip-flop 120 in the data retention circuit 130[3] (Save to 130[3]) and then writes the data of the data retention circuit 130[1] back to the scan flip-flop 120 (Load from 130[1]). Here, the data written back to the scan flip-flop 120 from the data retention circuit 130[1] is the data that was stored in the data retention circuit 130[1] from the scan flip-flop 120 at time Ta. In other words, task 1, which was being executed up to time Ta, can be continued. In this way, the state of task 3 is saved, task 1 is placed in an executable state, and execution switches to task 1.

【0088】 The above configuration makes it possible to provide a semiconductor device including an arithmetic unit that can reduce power consumption while providing a large number of registers. Furthermore, since processing can be resumed at task switching from the point where the task was previously interrupted, it is possible to provide a semiconductor device including an arithmetic unit with improved arithmetic performance.
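The save and load sequence at times Ta, Tb, and Tc can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class `Register`, the helper `switch_task`, and the string values standing in for task state are hypothetical names introduced for illustration, and the model ignores clock timing, the selector 121, and all analog behavior of the nodes SN[1] to SN[k].

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Register:
    """Behavioral model of one register 110 with k retention nodes SN[1..k]."""
    k: int = 4
    q: Optional[str] = None                       # value held by the flip-flop
    sn: List[Optional[str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Index 0 is unused so that sn[i] lines up with node SN[i].
        self.sn = [None] * (self.k + 1)

    def clock(self, d: str) -> None:
        """Rising clock edge with SE="L": capture the data at terminal D."""
        self.q = d

    def save(self, i: int) -> None:
        """BK[i]="H": copy the flip-flop value to node SN[i]."""
        self.sn[i] = self.q

    def load(self, i: int) -> None:
        """RE[i]="H" and SE="H": write node SN[i] back to the flip-flop."""
        self.q = self.sn[i]


def switch_task(reg: Register, current: int, nxt: int) -> None:
    """Save the running task's state, then load the next task's state."""
    reg.save(current)
    reg.load(nxt)


reg = Register(k=4)
# Assumed initial contents: tasks 2 and 3 already have saved state.
reg.sn[2] = "task2-state"
reg.sn[3] = "task3-state"
reg.clock("task1-state")

switch_task(reg, current=1, nxt=2)   # time Ta
at_ta = reg.q
reg.clock("task2-progress")          # task 2 runs and changes the register
switch_task(reg, current=2, nxt=3)   # time Tb
at_tb = reg.q
reg.clock("task3-progress")          # task 3 runs and changes the register
switch_task(reg, current=3, nxt=1)   # time Tc
at_tc = reg.q                        # state of task 1 as saved at time Ta
```

In this model no external memory is touched at any switch; the only storage involved is the list `sn`, which stands in for the retention nodes stacked above the flip-flop.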
【0089】 The arithmetic unit having the register included in the semiconductor device of this embodiment can resume processing of the original task from the saved data even if another task interrupts the task being processed and yet another task interrupts after that. The data for resuming the interrupted task is held in a register within the arithmetic unit, so there is no need to access an external memory, such as a stack area in an SRAM or DRAM, to save or load data. As a result, even when processing switches between different tasks because of a task interrupt, the data saved or loaded at the switch can be handled efficiently without delays caused by memory access or the like.

【0090】 FIGS. 6A and 6B are schematic diagrams illustrating example configurations of the memory circuit 210 and the layer selection circuits 220 and 230 included in the arithmetic device 200 according to one embodiment of the present invention. FIGS. 7A and 7B are diagrams illustrating example configurations of memory cells included in the memory circuit 210. FIGS. 8A to 8C are diagrams illustrating example circuit configurations and operation of the layer selection circuits 220 and 230. Note that in the following description, for ease of understanding, the element layers 30[1] to 30[k] are described as four layers, that is, k=4.

【0091】 FIG. 6A illustrates a plurality of blocks as the memory circuit 210. In FIG. 6A, as an example, four stacked blocks (blocks in which the memory circuits 210[1] to 210[4] are stacked) correspond to the memory circuit 210. Note that FIG. 6A shows a state in which the four stacked blocks are arranged side by side in the X direction.

【0092】 The memory circuits 210[1] to 210[4] of the respective element layers include a plurality of memory cells MC provided in the element layers 30[1] to 30[4], respectively (see FIG. 6B).

【0093】 A memory cell having an OS transistor can be used as the memory cell MC.
For example, the NOSRAM circuit configuration shown in FIG. 7A can be used. The memory cell MC shown in FIG. 7A is an example of a NOSRAM having transistors M1 to M3 and a capacitor C.

【0094】 FIG. 7A illustrates the wiring WWL, the wiring RWL, the wiring WBL, the wiring RBL, and the wiring PL connected to elements of the memory cell MC. The wiring WWL functions as a write word line. The wiring RWL functions as a read word line. The wiring WBL functions as a write bit line. The wiring RBL functions as a read bit line. The wiring PL functions as a capacitance line. The wiring PL can also function as a wiring that transmits a potential to be applied to the back gate of the transistor M1.

【0095】 In FIG. 7A, the memory cells MC are arranged in the Y direction in the stacked element layers 30[1] to 30[4] and are electrically connected to the same wirings WBL and RBL. FIG. 7B shows a schematic diagram of stacked memory cells MC, which are NOSRAMs having OS transistors. By arranging the stacked memory cells MC side by side, the memory circuit 210 in which the memory circuits 210[1] to 210[4] shown in FIG. 6A are stacked can be formed.

【0096】 As shown in FIGS. 6B and 7B, the memory cells MC included in each of the memory circuits 210[1] to 210[4] are provided in the same layer as the layer selection circuits 220 and 230 provided in each of the element layers 30[1] to 30[4]. In FIGS. 6A, 6B, and 7B, the layer selection circuits 220 and 230 provided in the element layers 30[1] to 30[4] are illustrated as layer selection circuits 220[1] to 220[4] and 230[1] to 230[4].

【0097】 As shown in FIG. 6A, the arithmetic device 200 includes a write word line driver unit 221, a read word line driver unit 231, and an arithmetic circuit 211. FIG. 6B illustrates how the write word line driver unit 221, the read word line driver unit 231, and the arithmetic circuit 211 are provided in the element layer 20. FIG. 6B also illustrates how the layer selection circuits 220[1] to 220[4] and 230[1] to 230[4] are provided in the element layers 30[1] to 30[4].
【0098】 The layer selection circuits 220[1] to 220[4] control the signals output to the wirings WWLout[1] to WWLout[4] by controlling whether the signal that the write word line driver unit 221 outputs to the wiring WWLin is passed on. The wirings WWLout[1] to WWLout[4] correspond to the wirings WWL connected to the memory cells MC provided in the element layers 30[1] to 30[4]. The signals output to the wirings WWLout[1] to WWLout[4] control the writing of data signals from the wirings WBL, which extend in the Z direction, to the memory cells MC. The layer selection circuits 220[1] to 220[4] can be arranged to overlap in the Z direction, as shown in FIGS. 6A and 6B.

【0099】 The layer selection circuits 230[1] to 230[4] control the signals output to the wirings RWLout[1] to RWLout[4] by controlling whether the signal that the read word line driver unit 231 outputs to the wiring RWLin is passed on. The wirings RWLout[1] to RWLout[4] correspond to the wirings RWL connected to the memory cells MC provided in the element layers 30[1] to 30[4]. The signals output to the wirings RWLout[1] to RWLout[4] control the reading of data signals from the memory cells MC to the wirings RBL, which extend in the Z direction. The layer selection circuits 230[1] to 230[4] can be arranged to overlap in the Z direction, as shown in FIGS. 6A, 6B, and 7B.

【0100】 FIG. 8A is a circuit diagram illustrating an example of a circuit configuration applicable to the layer selection circuits 220 and 230. The layer selection circuits 220 and 230 each include a transistor ML1, a transistor ML2, and a transistor ML3. The transistors ML1 to ML3 are OS transistors provided in the stacked element layers 30[1] to 30[4], like the transistors included in the memory cell MC.

【0101】 The gate of the transistor ML2 is electrically connected to one of the source and drain of the transistor ML1.
One of the source and drain of the transistor ML2 is electrically connected to one of the source and drain of the transistor ML3 and to a wiring WWLout or RWLout (WWLout/RWLout in the drawing), which corresponds to the wiring WWL or RWL provided in the element layers 30[1] to 30[4]. The other of the source and drain of the transistor ML2 is electrically connected to a wiring WWLin or RWLin (WWLin/RWLin in the drawing) connected to the write word line driver unit 221 or the read word line driver unit 231. The other of the source and drain of the transistor ML1 is electrically connected to a wiring to which a potential VLD (high power supply potential) is applied. The gate of the transistor ML1 is electrically connected to a wiring to which a signal LSEL is applied. The gate of the transistor ML3 is electrically connected to a wiring to which a signal LSELB is applied. The other of the source and drain of the transistor ML3 is electrically connected to a wiring to which a potential VLS (low power supply potential) is applied. Note that the region where the gate of the transistor ML2 and the one of the source and drain of the transistor ML1 are electrically connected to each other may be referred to as a node FN1.

【0102】 FIG. 8C shows an example configuration of a plurality of memory cells MC connected to the layer selection circuits 220 and 230 via the wirings WWL and the wirings RWL. The plurality of memory cells MC shown in FIG. 8C are selected collectively by the signals output from the layer selection circuits 220 and 230. Therefore, by controlling the signals output from the layer selection circuits 220 and 230, data can be written to and read from the memory circuit 210 provided in each element layer 30 collectively.

【0103】 The configuration of the layer selection circuits 220 and 230 is not limited to the example shown in FIG. 8A.
For example, a capacitor may be provided between the gate of the transistor ML2 and either the source or the drain of the transistor ML1.

【0104】 The layer selection circuits 220 and 230 have a function of outputting, in response to the signals LSEL and LSELB, either the signal applied to the wiring WWLin or RWLin or the potential VLS to the wiring WWLout or RWLout.

【0105】 FIG. 8B is a timing chart illustrating an example of the operation of the layer selection circuits 220 and 230.

【0106】 FIG. 8B shows the potentials (H level or L level) of the signal LSEL, the signal LSELB, and the signal applied to the wiring WWLin or RWLin at each time point of operation, and also shows changes in the potentials of the node FN1 and the wiring WWLout or RWLout.

【0107】 In the following description of the operation example, it is assumed that the potential VLD is the same as the H level of the signals LSEL and LSELB, and that the potential VLS is the same as the L level of the signals LSEL and LSELB.

【0108】 Just before time TL1, the signal LSEL is set to the L level and the signal LSELB is set to the H level. At this time, the transistor ML1 is on, so the potential of the node FN1 is set to the L level. Therefore, the transistor ML2 is off, and the transistor ML3 is on. Consequently, regardless of whether the signal applied to the wiring WWLin or RWLin is at the H level or the L level, the potential of the wiring WWLout or RWLout is set to the L level (the potential VLS).

【0109】 At time TL1, the signal LSEL is set to the H level and the signal LSELB is set to the L level. At this time, the potential of the node FN1 rises to a potential obtained by subtracting the threshold voltage of the transistor ML1 from the H level (the potential VLD), and the transistor ML1 turns off. This turns the transistor ML2 on and the transistor ML3 off. Therefore, the potential of the wiring WWLout or RWLout becomes the L level, which is the level of the signal applied to the wiring WWLin or RWLin at time TL1.
【0110】 At time TL2, the signal applied to the wiring WWLin or RWLin becomes H level. Then, current flows from the wiring WWLin or RWLin to the wiring WWLout or RWLout via the transistor ML2, causing the potential of the wiring WWLout or RWLout to increase. At this time, because the transistor ML1 is off, the potential of the node FN1 also increases due to capacitive coupling at the gate capacitance of the transistor ML2. Therefore, the potential difference between the gate and source of the transistor ML2 is maintained, that is, the transistor ML2 is maintained in a conductive state. Therefore, the potential of the wiring WWLout or RWLout becomes H level (the signal applied to the wiring WWLin or RWLin at time TL2). 【0111】In this way, the layer selection circuits 220 and 230 configure a bootstrap circuit in which a gate capacitance is provided between the gate and source of the transistor ML2, so that when a signal applied to the wiring WWLin or the wiring RWLin becomes high, the transistor ML2 remains conductive, and therefore a high level signal can be output to the wiring WWLout or the wiring RWLout. The gate capacitance of the transistor ML2 is sometimes called a "bootstrap capacitance." 【0112】 The arithmetic device 200 can select one of the memory circuits 210[1] to 210[4] by controlling the signals LSEL and LSELB provided to the layer selection circuits 220[1] to 220[4] or the layer selection circuits 230[1] to 230[4], and output the signal provided to the wiring WWLin or wiring RWLin to the wiring WWLout or wiring RWLout. 【0113】 For example, by setting the signals LSEL and LSELB given to the layer selection circuit 220[1] to H level and L level, respectively, and setting the signals LSEL and LSELB given to the layer selection circuits 220[2] to 220[4] to L level and H level, respectively, the signal given from the write word line driver unit 221 to the wiring WWLin is output to the wiring WWLout[1] via the layer selection circuit 220[1]. 
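The selection behavior of the layer selection circuits can be summarized with a purely logical model. The sketch below is a minimal illustration, assuming complementary signals LSEL and LSELB and representing the H and L levels as 1 and 0; the function names `layer_select` and `drive_layers` are hypothetical, and the analog bootstrap effect at the node FN1 is deliberately not modeled.

```python
H, L = 1, 0


def layer_select(lsel: int, lselb: int, word_in: int, vls: int = L) -> int:
    """Return the level on WWLout/RWLout for one layer selection circuit.

    Behavioral only: with LSEL="H" and LSELB="L" the input is passed through
    (transistor ML2 conducts); with LSEL="L" and LSELB="H" the output is tied
    to VLS (transistor ML3 conducts).
    """
    if lsel == lselb:
        raise ValueError("LSEL and LSELB are assumed to be complementary")
    return word_in if lsel == H else vls


def drive_layers(selected: int, word_in: int, layers: int = 4) -> list:
    """Levels on WWLout[1..layers] when only layer `selected` is enabled."""
    outs = []
    for layer in range(1, layers + 1):
        lsel = H if layer == selected else L
        outs.append(layer_select(lsel, H - lsel, word_in))
    return outs


# Sequence of FIG. 8B: deselected before TL1, selected at TL1, input high at TL2.
before_tl1 = layer_select(L, H, word_in=H)   # output held at VLS
at_tl1 = layer_select(H, L, word_in=L)       # input passed through, still low
at_tl2 = layer_select(H, L, word_in=H)       # input passed through, now high

# Only layer 1 is selected while the driver output on WWLin is high.
wwlout = drive_layers(selected=1, word_in=H)
```

Because a single driver output is shared by all layers in this model, adding layers only adds entries to the list returned by `drive_layers`, which mirrors the point that the word line drivers do not have to grow with the number of element layers.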
【0114】 In the arithmetic device 200, wiring that functions as a word line must be provided in each of the element layers 20 to 30[4]. However, by providing a layer selection circuit in each element layer, the number of wirings can be reduced. Furthermore, the arithmetic device 200 can suppress an increase in the area of the write word line driver unit 221 and the read word line driver unit 231 that accompanies an increase in the number of layers of the element layers 30[1] to 30[4]. In other words, the arithmetic device 200 can increase the number of layers of the element layers 30[1] to 30[4] in which memory circuits are provided without increasing the area overhead, thereby improving the density of memory cells MC (memory density).

【0115】 Next, a configuration example of the arithmetic circuit 211 will be described. The arithmetic circuit 211 has a function of performing a product-sum operation. The arithmetic device 200 including the arithmetic circuit 211 may be called an accelerator or a graphics processing unit (GPU). Memory cells MC such as NOSRAM or DOSRAM can be stacked on the arithmetic circuit 211. That is, a layer having an OS transistor can be stacked in the vertical direction on a substrate on which an element layer 20 including a Si transistor is provided.

【0116】 The arithmetic circuit 211 can perform, for example, parallel processing of matrix operations in graphic processing, parallel processing of product-sum operations in neural networks, and parallel processing of floating-point operations in scientific and technological calculations.

【0117】 For example, memory cells having OS transistors such as NOSRAM can be applied to the memory cells MC[1] to MC[4] shown in FIG. 9. The circuit configuration of the memory cells MC[1] to MC[4] shown in FIG. 9 corresponds to a NOSRAM of a three-transistor (3T) gain cell.
NOSRAM can be used as a nonvolatile memory by retaining charge corresponding to data in the memory cell using its extremely low leakage current characteristics.

【0118】 For example, the arithmetic circuit 211 shown in FIG. 9 includes a read circuit 241 to which a signal from the wiring RBL is applied, a bit product-sum calculator 242, an accumulator 243, a latch circuit 244, and an encoding circuit 245 that outputs an output signal Q.

【0119】 Each circuit constituting the arithmetic circuit 211 includes a Si transistor and can be provided in the element layer 20. The memory cell MC includes an OS transistor and can be provided in the element layers 30[1] to 30[4]. Therefore, as shown in FIGS. 7A and 7B, in a configuration in which the element layer 20 and the element layers 30[1] to 30[4] are stacked, the regions in which the circuits are provided can be arranged to overlap. The wiring RBL connecting the arithmetic circuit 211 and the memory cell MC is provided in a direction perpendicular to the surface of the substrate on which the element layer 20 is provided (the Z direction). The wiring RWL can be provided in an opening provided in an insulating layer and can be microfabricated. Therefore, the wiring RWL can have a smaller parasitic capacitance than wiring using a through-silicon via or the like. As a result, the power required for charging and discharging the wiring can be reduced, thereby achieving power savings.

【0120】 As shown in FIG. 9, using a circuit configuration specialized for product-sum operations makes it possible to reduce the circuit area, and reducing the circuit area in turn makes it possible to reduce power consumption.

【0121】 FIGS. 10A to 10C are schematic diagrams illustrating a configuration in which different data is stored in the memory circuits 210 provided in each of the multiple element layers 30[1] to 30[4], and data is read or written by switching the layer selection circuit.
【0122】 In the memory circuit 210, the data stored for each element layer 30 is weight data used in a product-sum operation. FIG. 10A illustrates weight data NN1 stored in the memory circuit 210[1] of the first element layer 30[1], weight data NN2 stored in the memory circuit 210[2] of the second element layer 30[2], weight data NN3 stored in the memory circuit 210[3] of the third element layer 30[3], and weight data NN4 stored in the memory circuit 210[4] of the fourth element layer 30[4].

【0123】 The weight data stored in the memory circuits 210[1] to 210[4] are written from the arithmetic circuit 211 to the memory cells MC of the memory circuit 210 by switching control performed by the layer selection circuit 220. In addition, the weight data are read from the memory cells MC of the memory circuit 210 to the arithmetic circuit 211 by switching control performed by the layer selection circuit 230.

【0124】 For example, in FIG. 10B, the weight data NN2 in the memory circuit 210[2] can be updated by controlling the layer selection circuit 220 to output a signal to the wiring WWLout[2]. Also in FIG. 10B, the weight data NN1 in the memory circuit 210[1] can be read out to the arithmetic circuit 211 by controlling the layer selection circuit 230 to output a signal to the wiring RWLout[1].

【0125】 For example, in FIG. 10C, the weight data NN1 in the memory circuit 210[1] can be updated by controlling the layer selection circuit 220 to output a signal to the wiring WWLout[1]. Also in FIG. 10C, the weight data NN4 in the memory circuit 210[4] can be read out to the arithmetic circuit 211 by controlling the layer selection circuit 230 to output a signal to the wiring RWLout[4].

【0126】 As shown in FIGS. 10B and 10C, weight data can be written to and read from different memory circuits 210 under the control of the layer selection circuits 220 and 230.
In other words, with this configuration, in arithmetic processing that mimics a neural network, the weight data can be switched by switching the layer selection circuits 220 and 230.

【0127】 FIG. 11 is a timing chart for explaining the simultaneous execution of the task switching in the arithmetic device 100 described above with reference to FIG. 5 and the weight data switching in arithmetic processing that mimics a neural network in the arithmetic device 200.

【0128】 At time Ta, while executing task 1, the arithmetic device 100 stores the data of the scan flip-flop 120 in the data retention circuit 130[1] (Save to 130[1]) and then writes the data of the data retention circuit 130[2] back to the scan flip-flop 120 (Load from 130[2]). In this way, the state of task 1 is saved, task 2 is placed in an executable state, and execution switches to task 2. At the same time, the arithmetic device 200 reads the weight data NN2 from the memory cells of the memory circuit 210[2] and switches from arithmetic processing based on the first neural network to arithmetic processing based on the second neural network.

【0129】 At time Tb, while executing task 2, the arithmetic device 100 stores the data of the scan flip-flop 120 in the data retention circuit 130[2] (Save to 130[2]) and then writes the data of the data retention circuit 130[3] back to the scan flip-flop 120 (Load from 130[3]). In this way, the state of task 2 is saved, task 3 is placed in an executable state, and execution switches to task 3. At the same time, the arithmetic device 200 reads the weight data NN3 from the memory cells of the memory circuit 210[3] and switches from arithmetic processing based on the second neural network to arithmetic processing based on the third neural network.
【0130】 At time Tc, while executing task 3, the arithmetic device 100 stores the data of the scan flip-flop 120 in the data retention circuit 130[3] (Save to 130[3]) and then writes the data of the data retention circuit 130[1] back to the scan flip-flop 120 (Load from 130[1]). Here, the data written back to the scan flip-flop 120 from the data retention circuit 130[1] is the data that was stored in the data retention circuit 130[1] from the scan flip-flop 120 at time Ta. In other words, task 1, which was being executed until time Ta, can be continued. In this way, the state of task 3 is saved, task 1 is placed in an executable state, and execution switches to task 1. At the same time, the arithmetic device 200 reads the weight data NN1 from the memory cells of the memory circuit 210[1] and switches from arithmetic processing based on the third neural network to arithmetic processing based on the first neural network.

【0131】 For example, a first neural network can be configured to perform digit recognition for number authentication as a first task, a second neural network can be configured to perform animal recognition for locating a pet as a second task, and a third neural network can be configured to perform vehicle recognition for detecting the presence of a visitor as a third task.

【0132】 The above-described configuration makes it possible to provide a semiconductor device that can reduce power consumption while providing a large number of registers. Furthermore, since processing can be resumed from where it left off when switching tasks, it is possible to provide a semiconductor device with improved computing performance. Furthermore, it is possible to provide a semiconductor device that supports multiple neural networks and has improved computing performance.

【0133】 This embodiment can be implemented in appropriate combination with the other embodiments described in this specification.
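As an illustration of how the weight data used for the product-sum operation follows the task order of FIG. 11, the sketch below models the stacked memory circuits as a per-layer table. It is a simplified software analogy, not the circuit of FIG. 9: the class `StackedWeights`, the function `product_sum`, and all numeric weight and input values are assumptions made up for the example, since the text gives no concrete data.

```python
from typing import Dict, List, Sequence


def product_sum(weights: Sequence[int], inputs: Sequence[int]) -> int:
    """Product-sum (multiply-accumulate) of one weight row with the inputs."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * x for w, x in zip(weights, inputs))


class StackedWeights:
    """Weight data NN1..NN4, one set per element layer 30[1]..30[4]."""

    def __init__(self, per_layer: Dict[int, List[int]]) -> None:
        self.per_layer = dict(per_layer)

    def read(self, layer: int) -> List[int]:
        """Model of asserting RWLout[layer]: only that layer is read."""
        return list(self.per_layer[layer])

    def write(self, layer: int, weights: List[int]) -> None:
        """Model of asserting WWLout[layer]: only that layer is updated."""
        self.per_layer[layer] = list(weights)


# Assumed weight values; the source gives no numbers.
mem = StackedWeights({1: [1, 2, 3], 2: [0, 1, 0], 3: [2, 2, 2], 4: [1, 0, 1]})
x = [4, 5, 6]

# Task order at times Ta, Tb, Tc in FIG. 11: task 2, task 3, then task 1,
# each using the weight data of the layer with the same index.
results = [product_sum(mem.read(layer), x) for layer in (2, 3, 1)]

# Updating one layer leaves the others untouched (compare FIG. 10B).
mem.write(2, [1, 1, 1])
after_update = product_sum(mem.read(2), x)
unchanged = product_sum(mem.read(1), x)
```

Switching networks in this model is only a change of the layer index passed to `read`, which corresponds to changing which layer selection circuit 230 passes the read word line signal.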
【0134】 (Embodiment 2) In this embodiment, a structure of a transistor applicable to the semiconductor device described in the above embodiment will be described. As an example, a structure in which transistors having different electrical characteristics are stacked will be described. By using this structure, the degree of freedom in designing a semiconductor device can be increased. In addition, by stacking transistors having different electrical characteristics, the degree of integration of a semiconductor device can be increased.

【0135】 FIG. 12 shows part of a cross-sectional structure of a semiconductor device. The semiconductor device shown in FIG. 12 includes a transistor 550, a transistor 500, and a capacitor 600. FIG. 13A is a cross-sectional view of the transistor 500 in the channel length direction, FIG. 13B is a cross-sectional view of the transistor 500 in the channel width direction, and FIG. 13C is a cross-sectional view of the transistor 550 in the channel width direction. For example, the transistor 550 corresponds to the Si transistor described in the above embodiment, and the transistor 500 corresponds to the OS transistor.

【0136】 In FIG. 12, the transistor 500 is provided above the transistor 550, and the capacitor 600 is provided above the transistor 550 and the transistor 500.

【0137】 The transistor 550 is provided over a substrate 311 and includes a conductor 316, an insulator 315, a semiconductor region 313 made of part of the substrate 311, and low-resistance regions 314a and 314b functioning as source and drain regions.

【0138】 As shown in FIG. 13C, in the transistor 550, the top surface and the side surfaces in the channel width direction of the semiconductor region 313 are covered with the conductor 316 with the insulator 315 in between. Forming the transistor 550 as a Fin type in this manner increases the effective channel width, thereby improving the on-state characteristics of the transistor 550.
Furthermore, the contribution of the electric field of the gate electrode can be increased, thereby improving the off-state characteristics of the transistor 550. 【0139】 The transistor 550 may be either a p-channel type or an n-channel type. 【0140】The region where the channel of the semiconductor region 313 is formed, the region nearby, the low-resistance region 314a that serves as the source region or drain region, and the low-resistance region 314b preferably contain a semiconductor such as a silicon-based semiconductor, and preferably contain single-crystal silicon. Alternatively, they may be formed of a material containing Ge (germanium), SiGe (silicon germanium), GaAs (gallium arsenide), GaAlAs (gallium aluminum arsenide), or the like. A configuration using silicon in which the effective mass is controlled by applying stress to the crystal lattice and changing the lattice spacing may also be used. Alternatively, the transistor 550 may be a high electron mobility transistor (HEMT) by using GaAs and GaAlAs, or the like. 【0141】 The low resistance region 314a and the low resistance region 314b contain, in addition to the semiconductor material applied to the semiconductor region 313, an element that imparts n-type conductivity, such as arsenic or phosphorus, or an element that imparts p-type conductivity, such as boron. 【0142】 The conductor 316 functioning as the gate electrode can be made of a conductive material such as a semiconductor material such as silicon containing an element that imparts n-type conductivity such as arsenic or phosphorus, or an element that imparts p-type conductivity such as boron, a metal material, an alloy material, or a metal oxide material. 【0143】 Since the work function is determined by the material of the conductor, the threshold voltage of the transistor can be adjusted by selecting the material of the conductor. Specifically, it is preferable to use a material such as titanium nitride or tantalum nitride as the conductor. 
Furthermore, in order to achieve both conductivity and embeddability, it is preferable to use a metal material such as tungsten or aluminum as the conductor in a stacked structure, and tungsten is particularly preferable in terms of heat resistance. 【0144】 The transistor 550 may be formed using an SOI (Silicon on Insulator) substrate or the like. 【0145】The SOI substrate may be a SIMOX (Separation by Implanted Oxygen) substrate formed by implanting oxygen ions into a mirror-polished wafer and then heating it at a high temperature to form an oxide layer to a certain depth from the surface and eliminate defects that have occurred in the surface layer, or an SOI substrate formed using a Smart Cut method or an ELTRAN method (registered trademark: Epitaxial Layer Transfer) that cleaves a semiconductor substrate by utilizing growth by heat treatment of microvoids formed by hydrogen ion implantation. A transistor formed using a single crystal substrate has a single crystal semiconductor in a channel formation region. 【0146】 An insulator 320 , an insulator 322 , an insulator 324 , and an insulator 326 are stacked in this order to cover the transistor 550 . 【0147】 The insulators 320, 322, 324, and 326 can be made of, for example, silicon oxide, silicon oxynitride, silicon nitride oxide, silicon nitride, aluminum oxide, aluminum oxynitride, aluminum nitride oxide, aluminum nitride, or the like. 【0148】 In this specification, silicon oxynitride refers to a material whose composition contains more oxygen than nitrogen, silicon nitride oxide refers to a material whose composition contains more nitrogen than oxygen, aluminum oxynitride refers to a material whose composition contains more oxygen than nitrogen, and aluminum nitride oxide refers to a material whose composition contains more nitrogen than oxygen. 【0149】 The insulator 322 may function as a planarizing film that planarizes steps caused by the transistor 550 or the like provided thereunder. 
For example, the top surface of the insulator 322 may be planarized by planarization treatment using a chemical mechanical polishing (CMP) method or the like to improve the planarity. 【0150】The insulator 324 is preferably a film having a barrier property that prevents hydrogen, impurities, and the like from diffusing from the substrate 311 or the transistor 550 to a region where the transistor 500 is provided. 【0151】 As an example of a film having a barrier property against hydrogen, for example, silicon nitride formed by a CVD method can be used. Here, diffusion of hydrogen into a semiconductor element including an oxide semiconductor, such as the transistor 500, may degrade the characteristics of the semiconductor element. Therefore, it is preferable to use a film that suppresses hydrogen diffusion between the transistor 500 and the transistor 550. Specifically, the film that suppresses hydrogen diffusion is a film that releases a small amount of hydrogen. 【0152】 The amount of desorption of hydrogen can be analyzed using, for example, thermal desorption spectroscopy (TDS). For example, in TDS analysis with the surface temperature of the film in the range of 50°C to 500°C, the amount of hydrogen desorbed from the insulator 324, converted into hydrogen atoms per unit area of the insulator 324, is 1×10¹⁶ atoms/cm² or less, preferably 5×10¹⁵ atoms/cm² or less. 【0153】 The insulator 326 preferably has a lower dielectric constant than the insulator 324. For example, the relative dielectric constant of the insulator 326 is preferably less than 4, and more preferably less than 3. Furthermore, for example, the relative dielectric constant of the insulator 326 is preferably 0.7 times or less, and more preferably 0.6 times or less, the relative dielectric constant of the insulator 324. 
By using a material with a low dielectric constant as the interlayer film, the parasitic capacitance that occurs between wirings can be reduced. 【0154】Furthermore, insulators 320, 322, 324, and 326 are embedded with conductors 328 and 330, which are connected to capacitor 600 or transistor 500. Note that conductors 328 and 330 function as plugs or wiring. Furthermore, for conductors that function as plugs or wiring, the same reference numeral may be used to denote multiple components. Furthermore, in this specification and the like, the wiring and the plug connecting to the wiring may be integrated. That is, there are cases where a portion of a conductor functions as a wiring, and cases where a portion of a conductor functions as a plug. 【0155】 As the material for each plug and wiring (conductor 328, conductor 330, etc.), a conductive material such as a metal material, an alloy material, a metal nitride material, or a metal oxide material can be used in a single layer or a laminated layer. It is preferable to use a high-melting-point material such as tungsten or molybdenum that has both heat resistance and conductivity, and tungsten is preferred. Alternatively, it is preferable to form the plug and wiring from a low-resistance conductive material such as aluminum or copper. Using a low-resistance conductive material can reduce the wiring resistance. 【0156】 A wiring layer may be provided over the insulator 326 and the conductor 330. For example, in FIG. 12 , an insulator 350, an insulator 352, and an insulator 354 are stacked in this order. A conductor 356 is formed in the insulators 350, 352, and 354. The conductor 356 functions as a plug or wiring connected to the transistor 550. Note that the conductor 356 can be formed using a material similar to that of the conductors 328 and 330. 【0157】Note that, for example, the insulator 350 preferably uses an insulator having a barrier property against hydrogen, similar to the insulator 324. 
The conductor 356 preferably includes a conductor having a barrier property against hydrogen. In particular, a conductor having a barrier property against hydrogen is formed in an opening of the insulator 350 having a barrier property against hydrogen. With this structure, the transistor 550 and the transistor 500 can be separated by a barrier layer, and diffusion of hydrogen from the transistor 550 to the transistor 500 can be suppressed. 【0158】 Note that, for example, tantalum nitride or the like is preferably used as a conductor having a barrier property against hydrogen. Furthermore, by stacking tantalum nitride and highly conductive tungsten, diffusion of hydrogen from the transistor 550 can be suppressed while maintaining the conductivity of the wiring. In this case, it is preferable that the tantalum nitride layer having a barrier property against hydrogen be in contact with the insulator 350 having a barrier property against hydrogen. 【0159】 A wiring layer may be provided over the insulator 354 and the conductor 356. For example, in FIG. 12 , an insulator 360, an insulator 362, and an insulator 364 are stacked in this order. A conductor 366 is formed in the insulator 360, the insulator 362, and the insulator 364. The conductor 366 functions as a plug or a wiring. The conductor 366 can be provided using the same material as the conductors 328 and 330. 【0160】 Note that, for example, the insulator 360 preferably uses an insulator having a barrier property against hydrogen, similar to the insulator 324. The conductor 366 preferably includes a conductor having a barrier property against hydrogen. In particular, a conductor having a barrier property against hydrogen is formed in an opening of the insulator 360 having a barrier property against hydrogen. With this structure, the transistor 550 and the transistor 500 can be separated by a barrier layer, and diffusion of hydrogen from the transistor 550 to the transistor 500 can be suppressed. 
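The hydrogen-release criterion given for the insulator 324 in paragraph 0152 can be expressed as a simple check. The sketch below reads the limits in that paragraph as 1×10¹⁶ atoms/cm² (upper limit) and 5×10¹⁵ atoms/cm² (preferred upper limit) for TDS analysis at film surface temperatures of 50°C to 500°C. The function name and the sample measurement values are hypothetical and are not taken from this specification.

```python
# Limits as read from paragraph 0152 (hydrogen atoms per unit film area).
H_LIMIT = 1e16       # atoms/cm^2, upper limit
H_PREFERRED = 5e15   # atoms/cm^2, preferred upper limit


def classify_hydrogen_release(atoms_per_cm2: float) -> str:
    """Classify a TDS hydrogen desorption amount against the two limits."""
    if atoms_per_cm2 < 0:
        raise ValueError("desorption amount cannot be negative")
    if atoms_per_cm2 <= H_PREFERRED:
        return "preferred"
    if atoms_per_cm2 <= H_LIMIT:
        return "acceptable"
    return "outside range"


# Hypothetical measurement values, for illustration only.
samples = {"film A": 3.0e15, "film B": 8.0e15, "film C": 2.0e16}
results = {name: classify_hydrogen_release(v) for name, v in samples.items()}
for name, verdict in results.items():
    print(name, verdict)
```

The same kind of check could be applied to the insulators 350, 360, 370, and 380, since they are described as barrier films similar to the insulator 324; whether the same numerical limits apply to them is an assumption.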
【0161】A wiring layer may be provided over the insulator 364 and the conductor 366. For example, in FIG. 12, an insulator 370, an insulator 372, and an insulator 374 are stacked in this order. A conductor 376 is formed in the insulators 370, 372, and 374. The conductor 376 functions as a plug or wiring. The conductor 376 can be provided using the same material as the conductors 328 and 330. 【0162】 Note that, for example, the insulator 370 preferably uses an insulator having a barrier property against hydrogen, similar to the insulator 324. The conductor 376 preferably includes a conductor having a barrier property against hydrogen. In particular, a conductor having a barrier property against hydrogen is formed in an opening of the insulator 370 having a barrier property against hydrogen. With this structure, the transistor 550 and the transistor 500 can be separated by a barrier layer, and diffusion of hydrogen from the transistor 550 to the transistor 500 can be suppressed. 【0163】 A wiring layer may be provided over the insulator 374 and the conductor 376. For example, in FIG. 12, an insulator 380, an insulator 382, and an insulator 384 are stacked in this order. A conductor 386 is formed in the insulators 380, 382, and 384. The conductor 386 functions as a plug or wiring. The conductor 386 can be provided using the same material as the conductors 328 and 330. 【0164】 Note that, for example, the insulator 380 preferably uses an insulator having a barrier property against hydrogen, similar to the insulator 324. The conductor 386 preferably includes a conductor having a barrier property against hydrogen. In particular, a conductor having a barrier property against hydrogen is formed in an opening of the insulator 380 having a barrier property against hydrogen. With this structure, the transistor 550 and the transistor 500 can be separated by a barrier layer, and diffusion of hydrogen from the transistor 550 to the transistor 500 can be suppressed. 
【0165】In the above, the wiring layer including the conductor 356, the wiring layer including the conductor 366, the wiring layer including the conductor 376, and the wiring layer including the conductor 386 have been described, but the semiconductor device according to this embodiment is not limited to this. There may be three or fewer wiring layers similar to the wiring layer including the conductor 356, or there may be five or more wiring layers similar to the wiring layer including the conductor 356. 【0166】 An insulator 510, an insulator 512, an insulator 514, and an insulator 516 are stacked in this order over the insulator 384. Any of the insulator 510, the insulator 512, the insulator 514, and the insulator 516 is preferably formed using a substance that has a barrier property against oxygen, hydrogen, and the like. 【0167】 For example, the insulator 510 and the insulator 514 are preferably formed using a film having a barrier property that prevents hydrogen, impurities, and the like from diffusing from the substrate 311 or a region where the transistor 550 is provided to a region where the transistor 500 is provided. Therefore, a material similar to that of the insulator 324 can be used. 【0168】 As an example of a film having a barrier property against hydrogen, silicon nitride formed by a CVD method can be used. Here, diffusion of hydrogen into a semiconductor element having an oxide semiconductor, such as the transistor 500, may degrade the characteristics of the semiconductor element. Therefore, a film that suppresses hydrogen diffusion is preferably used between the transistor 500 and the transistor 550. Specifically, the film that suppresses hydrogen diffusion is a film that releases a small amount of hydrogen. 【0169】 As a film having a barrier property against hydrogen, for example, the insulators 510 and 514 are preferably made of a metal oxide such as aluminum oxide, hafnium oxide, or tantalum oxide. 
【0170】In particular, aluminum oxide has a high blocking effect of preventing the permeation of both oxygen and impurities such as hydrogen and moisture, which can cause fluctuations in the electrical characteristics of a transistor. Therefore, aluminum oxide can prevent impurities such as hydrogen and moisture from entering the transistor 500 during and after the transistor manufacturing process. Furthermore, aluminum oxide can suppress the release of oxygen from the oxide that constitutes the transistor 500. Therefore, aluminum oxide is suitable for use as a protective film for the transistor 500. 【0171】 For example, the insulator 512 and the insulator 516 can be made of a material similar to that of the insulator 320. By using a material with a relatively low dielectric constant for these insulators, parasitic capacitance between wirings can be reduced. For example, the insulators 512 and 516 can be made of a silicon oxide film or a silicon oxynitride film. 【0172】 A conductor 518, a conductor constituting the transistor 500 (for example, the conductor 503), and the like are embedded in the insulators 510, 512, 514, and 516. The conductor 518 functions as a plug or wiring connected to the capacitor 600 or the transistor 550. The conductor 518 can be formed using a material similar to that of the conductors 328 and 330. 【0173】 In particular, the conductor 518 in the region in contact with the insulator 510 and the insulator 514 is preferably a conductor having a barrier property against oxygen, hydrogen, and water. With this structure, the transistor 550 and the transistor 500 can be separated by a layer having a barrier property against oxygen, hydrogen, and water, and diffusion of hydrogen from the transistor 550 to the transistor 500 can be suppressed. 【0174】 Above the insulator 516, the transistor 500 is provided. 
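The statement that a low-dielectric-constant interlayer film reduces parasitic capacitance between wirings (paragraphs 0153 and 0171) can be illustrated with the parallel-plate approximation C = ε0·εr·A/d. The sketch below is a rough illustration, not a model of the actual wiring geometry. The relative dielectric constants (7.0 for a nitride-like barrier film, 3.9 for a silicon-oxide-like film) and the dimensions are assumed values, and the function names are hypothetical.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m


def parallel_plate_capacitance(eps_r: float, area_m2: float, gap_m: float) -> float:
    """Parallel-plate estimate C = eps0 * eps_r * A / d, in farads."""
    return EPS0 * eps_r * area_m2 / gap_m


def meets_ratio(eps_low: float, eps_ref: float, max_ratio: float = 0.7) -> bool:
    """True if eps_low is at most max_ratio times eps_ref (0.7x, or 0.6x as the stricter case)."""
    return eps_low <= max_ratio * eps_ref


# Assumed values, for illustration only.
eps_ref = 7.0    # reference film, playing the role of the insulator 324
eps_low = 3.9    # low-k film, playing the role of the insulator 326
area = 1e-12     # 1 um x 1 um overlap between two wirings, in m^2
gap = 100e-9     # 100 nm of dielectric between them, in m

c_ref = parallel_plate_capacitance(eps_ref, area, gap)
c_low = parallel_plate_capacitance(eps_low, area, gap)

print(f"reference film: {c_ref:.3e} F, low-k film: {c_low:.3e} F")
print("ratio of capacitances:", round(c_low / c_ref, 3))
print("meets 0.7x criterion:", meets_ratio(eps_low, eps_ref))
print("meets 0.6x criterion:", meets_ratio(eps_low, eps_ref, max_ratio=0.6))
```

For a fixed geometry the capacitance scales linearly with the relative dielectric constant, so a film at 0.6 times the reference value gives roughly 0.6 times the parasitic capacitance in this approximation.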
【0175】As shown in FIGS. 13A and 13B, the transistor 500 has a conductor 503 arranged so as to be embedded in the insulator 514 and the insulator 516, an insulator 520 arranged on the insulator 516 and the conductor 503, an insulator 522 arranged on the insulator 520, an insulator 524 arranged on the insulator 522, an oxide 530a arranged on the insulator 524, an oxide 530b arranged on the oxide 530a, conductors 542a and 542b arranged apart from each other on the oxide 530b, an insulator 580 arranged on the conductors 542a and 542b and having an opening formed so as to overlap the region between the conductors 542a and 542b, an insulator 545 arranged on the bottom and side surfaces of the opening, and a conductor 560 arranged on the surface on which the insulator 545 is formed. 【0176】 As shown in FIGS. 13A and 13B, it is preferable that an insulator 544 be disposed between the insulator 580 and each of the oxide 530a, the oxide 530b, the conductor 542a, and the conductor 542b. It is also preferable that the conductor 560 have a conductor 560a provided inside the insulator 545 and a conductor 560b provided so as to be embedded inside the conductor 560a. It is also preferable that an insulator 574 be disposed on the insulator 580, the conductor 560, and the insulator 545. 【0177】 In this specification and the like, the oxide 530a and the oxide 530b may be collectively referred to as the oxide 530. 【0178】 Although the transistor 500 has a two-layer structure of the oxide 530a and the oxide 530b in and around the channel formation region, the present invention is not limited to this structure. For example, a single-layer structure of the oxide 530b or a stacked structure of three or more layers may be used. 【0179】Although the conductor 560 in the transistor 500 has a stacked two-layer structure, the present invention is not limited to this. For example, the conductor 560 may have a single-layer structure or a stacked structure of three or more layers. The transistor 500 shown in FIGS. 
12 and 13A is merely an example, and the present invention is not limited to this structure. An appropriate transistor may be used depending on the circuit configuration, driving method, and the like. 【0180】 Here, the conductor 560 functions as the gate electrode of the transistor, and the conductors 542a and 542b function as source and drain electrodes, respectively. As described above, the conductor 560 is formed so as to be embedded in the opening of the insulator 580 and in the region sandwiched between the conductors 542a and 542b. The arrangements of the conductors 560, 542a, and 542b are selected in a self-aligned manner with respect to the opening of the insulator 580. That is, in the transistor 500, the gate electrode can be arranged between the source and drain electrodes in a self-aligned manner. Therefore, the conductor 560 can be formed without providing a margin for alignment, thereby reducing the area occupied by the transistor 500. This allows for miniaturization and high integration of semiconductor devices. 【0181】 Furthermore, since the conductor 560 is formed in a self-aligned manner in the region between the conductor 542a and the conductor 542b, the conductor 560 does not have a region that overlaps with the conductor 542a or the conductor 542b. This reduces the parasitic capacitance formed between the conductor 560 and the conductor 542a and between the conductor 560 and the conductor 542b. This improves the switching speed of the transistor 500 and provides high frequency characteristics. 【0182】The conductor 560 may function as a first gate (also referred to as a top gate) electrode. The conductor 503 may function as a second gate (also referred to as a bottom gate) electrode. In this case, the threshold voltage of the transistor 500 can be controlled by changing the potential applied to the conductor 503 independently of the potential applied to the conductor 560. 
In particular, applying a negative potential to the conductor 503 can increase the threshold voltage of the transistor 500 above 0 V and reduce the off-state current. Therefore, applying a negative potential to the conductor 503 can reduce the drain current when the potential applied to the conductor 560 is 0 V compared to not applying a negative potential to the conductor 503. 【0183】 The conductor 503 is arranged to overlap the oxide 530 and the conductor 560. In this way, when a potential is applied to the conductor 560 and the conductor 503, the electric field generated from the conductor 560 and the electric field generated from the conductor 503 are connected, and a channel formation region formed in the oxide 530 can be covered. 【0184】 In this specification, etc., a transistor structure in which a channel formation region is electrically surrounded by the electric field of a first gate electrode is called a surrounded channel (S-channel) structure. The S-channel structure disclosed in this specification, etc., is different from a Fin structure and a planar structure. On the other hand, the S-channel structure disclosed in this specification, etc., can also be considered as a type of Fin structure. In this specification, etc., a Fin structure refers to a structure in which a gate electrode is disposed so as to surround at least two or more sides of the channel (specifically, two, three, or four sides, etc.). By employing the Fin structure and the S-channel structure, it is possible to improve resistance to the short channel effect, in other words, to obtain a transistor in which the short channel effect is less likely to occur. 【0185】By forming the transistor in the S-channel structure, the channel formation region can be electrically surrounded. 
Note that the S-channel structure electrically surrounds the channel formation region, and therefore, can be said to be substantially equivalent to a Gate All Around (GAA) structure or a Lateral Gate All Around (LGAA) structure. By forming the transistor in the S-channel structure, the GAA structure, or the LGAA structure, the channel formation region formed at or near the interface between the oxide 530 and the gate insulator can be the entire bulk of the oxide 530. Therefore, the current density flowing through the transistor can be improved, which is expected to improve the on-state current of the transistor or the field-effect mobility of the transistor. 【0186】 The conductor 503 has a structure similar to that of the conductor 518, in which the conductor 503a is formed in contact with the inner walls of the openings of the insulators 514 and 516, and the conductor 503b is formed on the conductor 503a so as to fill the openings. Note that although the transistor 500 shows a structure in which the conductors 503a and 503b are stacked, the present invention is not limited to this. For example, the conductor 503 may have a single layer structure or a stacked structure of three or more layers. 【0187】 Here, the conductor 503a is preferably made of a conductive material that has a function of suppressing the diffusion of impurities such as hydrogen atoms, hydrogen molecules, water molecules, and copper atoms (i.e., the impurities are less likely to permeate). Alternatively, it is preferably made of a conductive material that has a function of suppressing the diffusion of oxygen (e.g., at least one of oxygen atoms, oxygen molecules, etc.) (i.e., the oxygen is less likely to permeate). Note that in this specification, the function of suppressing the diffusion of impurities or oxygen refers to the function of suppressing the diffusion of any one or all of the impurities and oxygen. 
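The threshold control by the second gate described in paragraph 0182 can be illustrated with a first-order model. The sketch below is not taken from this specification: it assumes a linear shift of the threshold voltage with the potential of the conductor 503 (coupling factor `coupling`) and an exponential subthreshold characteristic with a fixed swing. All parameter values and function names are hypothetical.

```python
def shifted_threshold(vth0: float, v_back: float, coupling: float) -> float:
    """Assumed linear model: Vth = Vth0 - coupling * V_back.

    A negative back-gate potential therefore raises the threshold voltage.
    """
    return vth0 - coupling * v_back


def drain_current_at_zero_gate(vth: float, i_at_vth: float = 1e-9,
                               swing: float = 0.1) -> float:
    """Assumed subthreshold model at Vg = 0 V: I = I(Vth) * 10 ** (-Vth / swing).

    `swing` is the subthreshold swing in volts per decade; only meaningful for Vth > 0.
    """
    return i_at_vth * 10 ** (-vth / swing)


# Hypothetical device parameters, for illustration only.
vth0 = 0.1        # threshold voltage with the conductor 503 at 0 V, in volts
coupling = 0.2    # assumed back-gate to top-gate coupling ratio

vth_unbiased = shifted_threshold(vth0, v_back=0.0, coupling=coupling)
vth_biased = shifted_threshold(vth0, v_back=-2.0, coupling=coupling)

i_unbiased = drain_current_at_zero_gate(vth_unbiased)
i_biased = drain_current_at_zero_gate(vth_biased)

print(f"Vth without back bias: {vth_unbiased:.2f} V, with -2 V: {vth_biased:.2f} V")
print(f"drain current at Vg = 0 V: {i_unbiased:.1e} A -> {i_biased:.1e} A")
```

Under these assumptions, the negative potential on the conductor 503 raises the threshold voltage and lowers the drain current at a gate potential of 0 V by several orders of magnitude, which matches the qualitative behavior described above. Real devices would need measured coupling and swing values.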
【0188】 For example, when the conductor 503a has a function of suppressing the diffusion of oxygen, the conductor 503b can be prevented from being oxidized and decreasing in conductivity. 【0189】Furthermore, when the conductor 503 also functions as a wiring, it is preferable that the conductor 503b be made of a highly conductive material containing tungsten, copper, or aluminum as a main component. Note that, although the conductor 503 is illustrated as a stack of the conductors 503a and 503b in this embodiment, the conductor 503 may have a single-layer structure. 【0190】 The insulators 520, 522, and 524 function as a second gate insulating film. 【0191】 Here, the insulator 524 in contact with the oxide 530 preferably contains more oxygen than that required for the stoichiometric composition. The oxygen is easily released from the film by heating. In this specification and the like, oxygen released by heating may be referred to as "excess oxygen." That is, the insulator 524 preferably has a region containing excess oxygen (also referred to as an "excess oxygen region"). By providing such an insulator containing excess oxygen in contact with the oxide 530, oxygen vacancies (V_O) in the oxide 530 can be reduced. When hydrogen enters an oxygen vacancy in the oxide 530, the resulting defect (hereinafter referred to as V_OH) may function as a donor and generate electrons as carriers. In addition, some of the hydrogen may bond with oxygen that is bonded to a metal atom to generate electrons as carriers. Therefore, a transistor using an oxide semiconductor containing a large amount of hydrogen is likely to have normally-on characteristics. Furthermore, hydrogen in an oxide semiconductor is easily moved by stress such as heat or an electric field. Therefore, if an oxide semiconductor contains a large amount of hydrogen, the reliability of the transistor may be deteriorated. 
In one embodiment of the present invention, it is preferable to reduce V_OH in the oxide 530 as much as possible so that the oxide 530 becomes high-purity intrinsic or substantially high-purity intrinsic. To obtain an oxide semiconductor in which V_OH is sufficiently reduced, it is important to remove impurities such as moisture and hydrogen from the oxide semiconductor (also referred to as "dehydration" or "dehydrogenation treatment") and to supply oxygen to the oxide semiconductor to compensate for oxygen vacancies (also referred to as "oxygen addition treatment"). When an oxide semiconductor in which impurities such as V_OH are sufficiently reduced is used for a channel formation region of a transistor, stable electrical characteristics can be obtained. 【0192】 Specifically, it is preferable to use an oxide material from which part of the oxygen is released by heating as the insulator having an excess oxygen region. The oxide material from which oxygen is released by heating is an oxide film from which the amount of released oxygen, converted into oxygen atoms, is 1.0×10¹⁸ atoms/cm³ or more, preferably 1.0×10¹⁹ atoms/cm³ or more, more preferably 2.0×10¹⁹ atoms/cm³ or more, or 3.0×10²⁰ atoms/cm³ or more in TDS (thermal desorption spectroscopy) analysis. The surface temperature of the film during the TDS analysis is preferably in the range of 100°C or higher and 700°C or lower, or 100°C or higher and 400°C or lower. 【0193】 Alternatively, the oxide 530 may be brought into contact with the insulator having the excess oxygen region and subjected to one or more of heat treatment, microwave treatment, and RF treatment. By performing such treatment, water or hydrogen in the oxide 530 can be removed. For example, a reaction that breaks the bond of V_OH occurs in the oxide 530, in other words, the reaction "V_OH → V_O + H" occurs, and dehydrogenation can be achieved. 
Some of the hydrogen generated at this time may combine with oxygen to form H₂O and be removed from the oxide 530 or from the insulators near the oxide 530. Also, some of the hydrogen may be gettered by the conductors 542a and 542b. 【0194】Furthermore, the microwave treatment is preferably performed using, for example, an apparatus having a power supply for generating high-density plasma or an apparatus having a power supply for applying RF to the substrate side. For example, high-density oxygen radicals can be generated by using a gas containing oxygen and high-density plasma, and by applying RF to the substrate side, the oxygen radicals generated by the high-density plasma can be efficiently introduced into the oxide 530 or an insulator near the oxide 530. The microwave treatment may be performed under a pressure of 133 Pa or more, preferably 200 Pa or more, and more preferably 400 Pa or more. The gases introduced into the microwave treatment apparatus may be, for example, oxygen and argon, and the oxygen flow ratio (O₂/(O₂+Ar)) is preferably 50% or less, and more preferably 10% or more and 30% or less. 【0195】 Furthermore, in the manufacturing process of the transistor 500, heat treatment is preferably performed with the surface of the oxide 530 exposed. The heat treatment may be performed, for example, at a temperature of 100°C or higher and 450°C or lower, more preferably 350°C or higher and 400°C or lower. Note that the heat treatment is performed in an atmosphere of nitrogen gas or an inert gas, or an atmosphere containing an oxidizing gas at 10 ppm or higher, 1% or higher, or 10% or higher. For example, the heat treatment is preferably performed in an oxygen atmosphere. This allows oxygen to be supplied to the oxide 530, so that oxygen vacancies (V_O) can be reduced. The heat treatment may be performed under reduced pressure. 
Alternatively, the heat treatment may be performed in an atmosphere containing 10 ppm or more, 1% or more, or 10% or more of an oxidizing gas after the heat treatment in a nitrogen gas or inert gas atmosphere in order to compensate for the desorbed oxygen. Alternatively, the heat treatment may be performed in an atmosphere containing 10 ppm or more, 1% or more, or 10% or more of an oxidizing gas, and then the heat treatment may be performed in a nitrogen gas or inert gas atmosphere. 【0196】By performing oxygen addition treatment on the oxide 530, oxygen vacancies in the oxide 530 can be repaired by the supplied oxygen, in other words, the reaction "V_O + O → null" can be promoted. Furthermore, the supplied oxygen reacts with hydrogen remaining in the oxide 530, converting the hydrogen into H₂O so that it can be removed. This suppresses the recombination of hydrogen remaining in the oxide 530 with oxygen vacancies to form V_OH. 【0197】 Furthermore, when the insulator 524 has an excess oxygen region, it is preferable that the insulator 522 has a function of suppressing the diffusion of oxygen (e.g., oxygen atoms, oxygen molecules, etc.) (that is, the insulator 522 is less likely to transmit oxygen). 【0198】 The insulator 522 preferably has a function of suppressing diffusion of oxygen, impurities, and the like, which prevents oxygen contained in the oxide 530 from diffusing toward the insulator 520. Furthermore, reaction of the conductor 503 with oxygen contained in the insulator 524, the oxide 530, and the like can be suppressed. 【0199】 For the insulator 522, it is preferable to use, in a single layer or a stack, an insulator containing a so-called high-k material such as aluminum oxide, hafnium oxide, an oxide containing aluminum and hafnium (hafnium aluminate), tantalum oxide, zirconium oxide, lead zirconate titanate (PZT), strontium titanate (SrTiO₃), or (Ba,Sr)TiO₃ (BST). 
As transistors become smaller and more highly integrated, problems such as off-state current may arise due to thinner gate insulating films. By using a high-k material as the insulator that functions as the gate insulating film, it is possible to reduce the gate potential during transistor operation while maintaining the physical film thickness. 【0200】In particular, an insulator containing an oxide of one or both of aluminum and hafnium, which is an insulating material that has the function of suppressing the diffusion of impurities and oxygen (i.e., the oxygen is less likely to permeate), is preferably used. As an insulator containing an oxide of one or both of aluminum and hafnium, aluminum oxide, hafnium oxide, or an oxide containing aluminum and hafnium (hafnium aluminate) is preferably used. When the insulator 522 is formed using such a material, the insulator 522 functions as a layer that suppresses oxygen release from the oxide 530 or the intrusion of impurities such as hydrogen into the oxide 530 from the periphery of the transistor 500. 【0201】 Alternatively, for example, aluminum oxide, bismuth oxide, germanium oxide, niobium oxide, silicon oxide, titanium oxide, tungsten oxide, yttrium oxide, or zirconium oxide may be added to these insulators. Alternatively, these insulators may be nitrided. Silicon oxide, silicon oxynitride, or silicon nitride may be stacked on the above insulators. 【0202】 Furthermore, it is preferable that the insulator 520 be thermally stable. For example, silicon oxide and silicon oxynitride are suitable because they are thermally stable. Furthermore, by combining a high-k insulator with silicon oxide or silicon oxynitride, it is possible to obtain the insulator 520 having a thermally stable layered structure with a high dielectric constant. 【0203】 In FIGS. 13A and 13B, the second gate insulating film has a three-layer structure including the insulators 520, 522, and 524. 
However, the second gate insulating film may have a single-layer structure, a two-layer structure, or a four- or more-layer structure. In this case, the second gate insulating film is not limited to a stack structure made of the same material, and may have a stack structure made of different materials. 【0204】 In the transistor 500, a metal oxide functioning as an oxide semiconductor is used for the oxide 530 including the channel formation region. 【0205】The metal oxide functioning as an oxide semiconductor may be formed by a sputtering method or an atomic layer deposition (ALD) method. Note that the metal oxide functioning as an oxide semiconductor will be described in detail in another embodiment. 【0206】 The metal oxide that functions as a channel formation region in the oxide 530 preferably has a band gap of 2 eV or more, preferably 2.5 eV or more. By using a metal oxide with a wide band gap in this manner, the off-state current of the transistor can be reduced. 【0207】 By having the oxide 530a below the oxide 530b, the oxide 530 can suppress the diffusion of impurities from components formed below the oxide 530a to the oxide 530b. 【0208】 The oxide 530 preferably has a configuration of multiple oxide layers with different atomic ratios of the metal atoms. Specifically, the atomic ratio of the element M among the constituent elements in the metal oxide used for the oxide 530a is preferably larger than the atomic ratio of the element M among the constituent elements in the metal oxide used for the oxide 530b. Furthermore, the atomic ratio of the element M to In in the metal oxide used for the oxide 530a is preferably larger than the atomic ratio of the element M to In in the metal oxide used for the oxide 530b. Furthermore, the atomic ratio of In to M in the metal oxide used for the oxide 530b is preferably larger than the atomic ratio of In to M in the metal oxide used for the oxide 530a. 
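The composition preference stated in paragraph 【0208】 can be written as a small check. The following is a minimal sketch, not part of the specification: the class `MetalOxide`, the function `satisfies_stack_preference`, and the example In:M:Zn ratios are hypothetical, and the atomic ratio of the element M is assumed to be taken over the metal elements only (oxygen excluded).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetalOxide:
    """Relative atomic amounts of In, element M, and Zn (oxygen omitted; an assumption)."""
    indium: float
    m: float
    zinc: float

    @property
    def m_fraction(self) -> float:
        # Atomic ratio of M among the metal constituent elements.
        return self.m / (self.indium + self.m + self.zinc)

    @property
    def m_to_in(self) -> float:
        # Atomic ratio of M to In; treated as infinite when no In is present
        # (e.g. a Ga-Zn oxide or gallium oxide used as the lower layer).
        return float("inf") if self.indium == 0 else self.m / self.indium


def satisfies_stack_preference(oxide_530a: MetalOxide, oxide_530b: MetalOxide) -> bool:
    """Return True if 530a is richer in M than 530b in both senses of 【0208】.

    The third condition in the text (In-to-M ratio larger in 530b) is the
    reciprocal of the M-to-In comparison, so it needs no separate check.
    """
    return (oxide_530a.m_fraction > oxide_530b.m_fraction
            and oxide_530a.m_to_in > oxide_530b.m_to_in)


if __name__ == "__main__":
    # Hypothetical compositions chosen only for illustration.
    lower = MetalOxide(indium=1, m=3, zinc=4)   # candidate for oxide 530a
    upper = MetalOxide(indium=1, m=1, zinc=1)   # candidate for oxide 530b
    print(satisfies_stack_preference(lower, upper))
```

Swapping the two arguments makes the check fail, which corresponds to placing the In-rich layer below the M-rich layer, the opposite of the stated preference.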
【0209】 The oxide 530a preferably has a conduction band minimum energy higher than that of the oxide 530b, or in other words, the oxide 530a preferably has a lower electron affinity than that of the oxide 530b. 【0210】Here, the energy level of the conduction band minimum changes gradually at the junction between the oxides 530a and 530b. In other words, the energy level of the conduction band minimum at the junction between the oxides 530a and 530b changes continuously or forms a continuous junction. To achieve this, it is preferable to reduce the defect level density of the mixed layer formed at the interface between the oxides 530a and 530b. 【0211】 Specifically, when the oxide 530a and the oxide 530b have a common element (main component) other than oxygen, a mixed layer with a low density of defect states can be formed. For example, when the oxide 530b is an In—Ga—Zn oxide, an In—Ga—Zn oxide, a Ga—Zn oxide, or a gallium oxide can be used as the oxide 530a. 【0212】 In this case, the oxide 530b serves as the main carrier path. By configuring the oxide 530a as described above, the defect state density at the interface between the oxide 530a and the oxide 530b can be reduced. As a result, the influence of interface scattering on carrier conduction is reduced, and the transistor 500 can obtain a high on-state current. 【0213】Conductors 542a and 542b, which function as a source electrode and a drain electrode, are provided on the oxide 530b. The conductors 542a and 542b are preferably made of a metal element selected from aluminum, chromium, copper, silver, gold, platinum, tantalum, nickel, titanium, molybdenum, tungsten, hafnium, vanadium, niobium, manganese, magnesium, zirconium, beryllium, indium, ruthenium, iridium, strontium, and lanthanum, or alloys containing the above metal elements or alloys combining the above metal elements. 
For example, tantalum nitride, titanium nitride, tungsten, nitrides containing titanium and aluminum, nitrides containing tantalum and aluminum, ruthenium oxide, ruthenium nitride, oxides containing strontium and ruthenium, and oxides containing lanthanum and nickel are preferably used. In addition, tantalum nitride, titanium nitride, nitrides containing titanium and aluminum, nitrides containing tantalum and aluminum, ruthenium oxide, ruthenium nitride, oxides containing strontium and ruthenium, and oxides containing lanthanum and nickel are preferred because they are conductive materials that are resistant to oxidation or materials that maintain conductivity even when absorbing oxygen. Furthermore, metal nitride films such as tantalum nitride are preferred because they have barrier properties against hydrogen or oxygen. 【0214】 Although FIG. 13A shows the conductor 542a and the conductor 542b as a single layer, they may have a stacked structure of two or more layers. For example, a tantalum nitride film and a tungsten film may be stacked. Alternatively, a titanium film and an aluminum film may be stacked. Alternatively, a two-layer structure in which an aluminum film is stacked on a tungsten film, a two-layer structure in which a copper film is stacked on a copper-magnesium-aluminum alloy film, a two-layer structure in which a copper film is stacked on a titanium film, or a two-layer structure in which a copper film is stacked on a tungsten film may be used.
【0215】 Other examples include a three-layer structure in which an aluminum film or a copper film is stacked on a titanium film or a titanium nitride film and a titanium film or a titanium nitride film is further formed thereon, and a three-layer structure in which an aluminum film or a copper film is stacked on a molybdenum film or a molybdenum nitride film and a molybdenum film or a molybdenum nitride film is further formed thereon. Note that a transparent conductive material containing indium oxide, tin oxide, or zinc oxide may also be used. 【0216】 As shown in FIG. 13A, regions 543a and 543b may be formed as low-resistance regions at and near the interface of the oxide 530 with the conductor 542a (conductor 542b). In this case, the region 543a functions as one of the source and drain regions, and the region 543b functions as the other of the source and drain regions. A channel formation region is formed in the region sandwiched between the regions 543a and 543b. 【0217】 By providing the conductor 542a (conductor 542b) so as to be in contact with the oxide 530, the oxygen concentration in the region 543a (region 543b) may be reduced. Furthermore, a metal compound layer containing a metal contained in the conductor 542a (conductor 542b) and components of the oxide 530 may be formed in the region 543a (region 543b). In such a case, the carrier concentration in the region 543a (region 543b) increases, and the region 543a (region 543b) becomes a low-resistance region. 【0218】 The insulator 544 is provided to cover the conductors 542a and 542b and suppresses oxidation of the conductors 542a and 542b. In this case, the insulator 544 may be provided to cover the side surface of the oxide 530 and to be in contact with the insulator 524.
【0219】The insulator 544 can be a metal oxide containing one or more elements selected from hafnium, aluminum, gallium, yttrium, zirconium, tungsten, titanium, tantalum, nickel, germanium, neodymium, lanthanum, magnesium, etc. Alternatively, the insulator 544 can be silicon nitride oxide, silicon nitride, or the like. 【0220】 In particular, it is preferable to use, as the insulator 544, an insulator containing an oxide of either or both of aluminum and hafnium, such as aluminum oxide, hafnium oxide, or an oxide containing aluminum and hafnium (hafnium aluminate). Hafnium aluminate is particularly preferable because it has higher heat resistance than hafnium oxide film. Therefore, it is less likely to crystallize during heat treatment in a later process. Note that if the conductors 542a and 542b are made of a material that is resistant to oxidation or whose conductivity does not decrease significantly even when it absorbs oxygen, the insulator 544 is not an essential component. It may be designed appropriately depending on the desired transistor characteristics. 【0221】 The insulator 544 can prevent impurities such as water and hydrogen contained in the insulator 580 from diffusing into the oxide 530b. The insulator 544 can also prevent the conductors 542a and 542b from being oxidized by excess oxygen contained in the insulator 580. 【0222】 The insulator 545 functions as a first gate insulating film. Like the insulator 524, the insulator 545 is preferably formed using an insulator that contains excess oxygen and releases oxygen by heating. 【0223】 Specifically, silicon oxide having excess oxygen, silicon oxynitride, silicon nitride oxide, silicon nitride, silicon oxide to which fluorine is added, silicon oxide to which carbon is added, silicon oxide to which carbon and nitrogen are added, and silicon oxide having vacancies can be used. In particular, silicon oxide and silicon oxynitride are preferable because they are stable against heat. 
【0224】By providing an insulator containing excess oxygen as the insulator 545, oxygen can be effectively supplied from the insulator 545 to the channel formation region of the oxide 530b. Similar to the insulator 524, the concentration of impurities such as water or hydrogen in the insulator 545 is preferably reduced. The thickness of the insulator 545 is preferably 1 nm to 20 nm. 【0225】 Furthermore, a metal oxide may be provided between the insulator 545 and the conductor 560 to efficiently supply excess oxygen contained in the insulator 545 to the oxide 530. The metal oxide preferably suppresses oxygen diffusion from the insulator 545 to the conductor 560. By providing a metal oxide that suppresses oxygen diffusion, the diffusion of excess oxygen from the insulator 545 to the conductor 560 is suppressed. In other words, a decrease in the amount of excess oxygen supplied to the oxide 530 can be suppressed. Furthermore, oxidation of the conductor 560 due to excess oxygen can be suppressed. As the metal oxide, a material that can be used for the insulator 544 may be used. 【0226】 Note that the insulator 545 may have a stacked structure, similar to the second gate insulating film. As transistors become smaller and more highly integrated, problems such as off-state current may occur due to thinner gate insulating films. Therefore, by using a stacked structure of a high-k material and a thermally stable material as the insulator that functions as the gate insulating film, it is possible to reduce the gate potential during transistor operation while maintaining the physical film thickness. Furthermore, a stacked structure that is thermally stable and has a high dielectric constant can be achieved. 【0227】 The conductor 560 functioning as the first gate electrode is shown as having a two-layer structure in FIGS. 13A and 13B, but may have a single-layer structure or a stacked structure of three or more layers. 
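The statement in paragraph 【0226】 that a stack containing a high-k material keeps the physical film thickness while lowering the required gate potential can be illustrated with the standard equivalent-oxide-thickness (EOT) relation, which the specification itself does not name. The sketch below is an assumption-laden illustration: the function name, the layer thicknesses, and the relative permittivity of 20 for the high-k layer are hypothetical values, and only the value 3.9 for silicon oxide is a commonly cited figure.

```python
K_SIO2 = 3.9  # relative permittivity commonly cited for silicon oxide


def equivalent_oxide_thickness(layers):
    """Return the EOT in nm of a stacked gate insulating film.

    `layers` is a list of (physical_thickness_nm, relative_permittivity)
    tuples. The layers act as capacitors in series, so each layer
    contributes its thickness scaled by K_SIO2 / k.
    """
    return sum(thickness * K_SIO2 / k for thickness, k in layers)


if __name__ == "__main__":
    # Hypothetical stack: a thermally stable silicon oxide layer plus a high-k layer.
    stack = [(2.0, K_SIO2), (4.0, 20.0)]
    physical = sum(thickness for thickness, _ in stack)
    eot = equivalent_oxide_thickness(stack)
    print(f"physical thickness: {physical:.2f} nm, EOT: {eot:.2f} nm")
```

Under these assumed numbers the film is 6 nm thick physically, yet behaves electrically like a thinner silicon oxide film, which is the sense in which the physical thickness is maintained while gate coupling improves.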
【0228】 For the conductor 560a, it is preferable to use a conductive material that has the function of suppressing the diffusion of impurities such as hydrogen atoms, hydrogen molecules, water molecules, nitrogen atoms, nitrogen molecules, nitrogen oxide molecules (N2O, NO, NO2, etc.), and copper atoms, or a conductive material that has the function of suppressing the diffusion of oxygen (e.g., at least one of oxygen atoms, oxygen molecules, etc.). When the conductor 560a has the function of suppressing oxygen diffusion, oxidation of the conductor 560b by the oxygen contained in the insulator 545 can be suppressed, thereby preventing a decrease in conductivity. Examples of conductive materials that have the function of suppressing oxygen diffusion include tantalum, tantalum nitride, ruthenium, and ruthenium oxide. An oxide semiconductor that can be used for the oxide 530 can also be used as the conductor 560a. In this case, forming the conductor 560b by a sputtering method reduces the electrical resistance of the conductor 560a so that it becomes a conductor. This can be called an OC (oxide conductor) electrode. 【0229】 Since the conductor 560b also functions as wiring, it is preferable to use a conductor with high conductivity. For example, a conductive material containing tungsten, copper, or aluminum as a main component is preferably used. The conductor 560b may have a layered structure, such as a layered structure of titanium or titanium nitride and the above-mentioned conductive material. 【0230】 The insulator 580 is provided over the conductor 542a and the conductor 542b with the insulator 544 interposed therebetween. The insulator 580 preferably has an excess oxygen region.
For example, the insulator 580 preferably includes silicon oxide, silicon oxynitride, silicon nitride oxide, silicon nitride, silicon oxide doped with fluorine, silicon oxide doped with carbon, silicon oxide doped with carbon and nitrogen, silicon oxide having voids, or a resin. Silicon oxide and silicon oxynitride are particularly preferred because they are thermally stable. Silicon oxide and silicon oxide having voids are particularly preferred because they allow for easy formation of excess oxygen regions in a later step. 【0231】 The insulator 580 preferably has an excess oxygen region. By providing the insulator 580 from which oxygen is released by heating, oxygen in the insulator 580 can be efficiently supplied to the oxide 530. Note that the concentration of impurities such as water or hydrogen in the insulator 580 is preferably reduced. 【0232】 The opening of the insulator 580 is formed to overlap the region between the conductors 542a and 542b, so that the conductor 560 is formed so as to be embedded in the opening of the insulator 580 and the region sandwiched between the conductors 542a and 542b. 【0233】 When miniaturizing semiconductor devices, it is necessary to shorten the gate length, but it is also necessary to ensure that the conductivity of the conductor 560 does not decrease. If the film thickness of the conductor 560 is increased for this purpose, the conductor 560 may have a shape with a high aspect ratio. In this embodiment, the conductor 560 is provided so as to be embedded in the opening of the insulator 580, and therefore, even if the conductor 560 has a shape with a high aspect ratio, the conductor 560 can be formed without collapsing during the process. 【0234】 The insulator 574 is preferably provided in contact with the top surface of the insulator 580, the top surface of the conductor 560, and the top surface of the insulator 545.
By forming the insulator 574 by a sputtering method, excess oxygen regions can be provided in the insulator 545 and the insulator 580. This allows oxygen to be supplied from the excess oxygen regions into the oxide 530. 【0235】 For example, the insulator 574 can be a metal oxide containing one or more elements selected from hafnium, aluminum, gallium, yttrium, zirconium, tungsten, titanium, tantalum, nickel, germanium, or magnesium. 【0236】 In particular, aluminum oxide has high barrier properties and can suppress the diffusion of hydrogen and nitrogen even when it is a thin film with a thickness of 0.5 nm to 3.0 nm. Therefore, aluminum oxide formed by sputtering can function as both an oxygen source and a barrier film against impurities such as hydrogen. 【0237】 An insulator 581 functioning as an interlayer film is preferably provided over the insulator 574. Like the insulator 524, the insulator 581 preferably has a reduced concentration of impurities such as water or hydrogen. 【0238】 Furthermore, conductors 540a and 540b are arranged in openings formed in insulators 581, 574, 580, and 544. Conductor 540a and 540b are arranged opposite each other with conductor 560 interposed therebetween. Conductor 540a and 540b have the same configuration as conductors 546 and 548, which will be described later. 【0239】 An insulator 582 is provided over the insulator 581. The insulator 582 is preferably formed using a substance that has a barrier property against oxygen, hydrogen, and the like. Therefore, the insulator 582 can be formed using a material similar to that of the insulator 514. For example, the insulator 582 is preferably formed using a metal oxide such as aluminum oxide, hafnium oxide, or tantalum oxide. 【0240】 In particular, aluminum oxide has a high blocking effect of preventing the permeation of both oxygen and impurities such as hydrogen and moisture, which can cause fluctuations in the electrical characteristics of a transistor. 
Therefore, aluminum oxide can prevent impurities such as hydrogen and moisture from entering the transistor 500 during and after the transistor manufacturing process. Furthermore, aluminum oxide can suppress the release of oxygen from the oxide that constitutes the transistor 500. Therefore, aluminum oxide is suitable for use as a protective film for the transistor 500. 【0241】 An insulator 586 is provided over the insulator 582. The insulator 586 can be formed using a material similar to that of the insulator 320. By using a material with a relatively low dielectric constant for these insulators, parasitic capacitance between wirings can be reduced. For example, a silicon oxide film, a silicon oxynitride film, or the like can be used as the insulator 586. 【0242】Furthermore, conductors 546 and 548 are embedded in insulators 520, 522, 524, 544, 580, 574, 581, 582, and 586. 【0243】 The conductor 546 and the conductor 548 function as plugs or wirings that connect to the capacitor 600, the transistor 500, or the transistor 550. The conductor 546 and the conductor 548 can be formed using a material similar to that of the conductor 328 and the conductor 330. 【0244】 After the transistor 500 is formed, an opening may be formed to surround the transistor 500, and an insulator with high barrier properties against hydrogen or water may be formed to cover the opening. By surrounding the transistor 500 with the insulator with high barrier properties, it is possible to prevent moisture and hydrogen from entering from the outside. Alternatively, multiple transistors 500 may be collectively surrounded by an insulator with high barrier properties against hydrogen or water. 
When forming an opening to surround the transistor 500, for example, it is preferable to form an opening that reaches the insulator 522 or the insulator 514 and form the insulator with high barrier properties in contact with the insulator 522 or the insulator 514, because this can be done as part of the manufacturing process of the transistor 500. For example, the insulator with high barrier properties against hydrogen or water may be made of a material similar to that of the insulator 522 or the insulator 514. 【0245】 Note that the structure of the transistor 500 is not limited to that shown in FIGS. 13A and 13B. For example, a transistor 500 having a structure shown in FIG. 14 may be used. The transistor 500 shown in FIG. 14 differs from the transistor shown in FIGS. 13A and 13B in that an insulator 555 is used and that the conductor 542a (conductors 542a1 and 542a2) and the conductor 542b (conductors 542b1 and 542b2) have a stacked structure. 【0246】 The conductor 542a has a layered structure of a conductor 542a1 and a conductor 542a2 on the conductor 542a1, and the conductor 542b has a layered structure of a conductor 542b1 and a conductor 542b2 on the conductor 542b1. The conductors 542a1 and 542b1 in contact with the oxide 530b are preferably conductors that are resistant to oxidation, such as metal nitrides. This prevents the conductors 542a and 542b from being excessively oxidized by oxygen contained in the oxide 530b. Furthermore, the conductors 542a2 and 542b2 are preferably conductors such as metal layers that have higher conductivity than the conductors 542a1 and 542b1. This allows the conductors 542a and 542b to function as highly conductive wirings or electrodes. In this manner, a semiconductor device can be provided in which the conductors 542a and 542b functioning as wirings or electrodes are provided in contact with the top surface of the oxide 530 functioning as an active layer.
【0247】 For the conductors 542a1 and 542b1, it is preferable to use a metal nitride, such as a nitride containing tantalum, a nitride containing titanium, a nitride containing molybdenum, a nitride containing tungsten, a nitride containing tantalum and aluminum, or a nitride containing titanium and aluminum. In one embodiment of the present invention, a nitride containing tantalum is particularly preferable. Alternatively, for example, ruthenium, ruthenium oxide, ruthenium nitride, an oxide containing strontium and ruthenium, or an oxide containing lanthanum and nickel may be used. These materials are preferable because they are conductive materials that are resistant to oxidation or that maintain conductivity even when absorbing oxygen. 【0248】 Furthermore, the conductors 542a2 and 542b2 preferably have higher conductivity than the conductors 542a1 and 542b1. For example, the film thickness of the conductors 542a2 and 542b2 is preferably greater than the film thickness of the conductors 542a1 and 542b1. The conductors 542a2 and 542b2 may be made of a conductor that can be used for the conductor 560b. The above structure can reduce the resistance of the conductors 542a2 and 542b2. 【0249】 For example, tantalum nitride or titanium nitride can be used as the conductors 542a1 and 542b1, and tungsten can be used as the conductors 542a2 and 542b2. 【0250】 As shown in FIG. 14, in a cross-sectional view of the transistor 500 in the channel length direction, the distance between the conductor 542a1 and the conductor 542b1 is smaller than the distance between the conductor 542a2 and the conductor 542b2. This configuration makes it possible to further shorten the distance between the source and the drain, thereby shortening the channel length accordingly. This improves the frequency characteristics of the transistor 500. By miniaturizing the semiconductor device in this way, it is possible to provide a semiconductor device with improved operating speed.
【0251】 The insulator 555 is preferably an insulator that is resistant to oxidation, such as nitride. The insulator 555 is formed in contact with the side surfaces of the conductor 542a2 and the conductor 542b2 and functions to protect the conductors 542a2 and 542b2. Since the insulator 555 is exposed to an oxidizing atmosphere, an inorganic insulator that is resistant to oxidation is preferable. Furthermore, since the insulator 555 is in contact with the conductors 542a2 and 542b2, an inorganic insulator that is resistant to oxidation of the conductors 542a2 and 542b2 is preferable. Therefore, the insulator 555 is preferably an insulating material that has a barrier property against oxygen. For example, silicon nitride can be used as the insulator 555. 【0252】The transistor 500 shown in FIG. 14 is formed by forming openings in the insulator 580 and the insulator 544, forming an insulator 555 in contact with the sidewalls of the openings, and then separating the conductors 542a1 and 542b1 using a mask. The openings overlap with the regions between the conductors 542a2 and 542b2. Parts of the conductors 542a1 and 542b1 protrude into the openings. Therefore, the insulator 555 is in contact with the top surface of the conductor 542a1, the top surface of the conductor 542b1, the side surface of the conductor 542a2, and the side surface of the conductor 542b2 within the openings. The insulator 545 is in contact with the top surface of the oxide 530 in the region between the conductors 542a1 and 542b1. 【0253】 After separating the conductor 542a1 and the conductor 542b1, heat treatment is preferably performed in an oxygen-containing atmosphere before forming the insulator 545. This allows oxygen to be supplied to the oxide 530a and the oxide 530b, thereby reducing oxygen vacancies. 
Furthermore, the insulator 555 is formed in contact with the side surfaces of the conductor 542a2 and the conductor 542b2, which prevents the conductors 542a2 and 542b2 from being excessively oxidized. As a result, the electrical characteristics and reliability of the transistor can be improved. Furthermore, variation in the electrical characteristics of multiple transistors formed on the same substrate can be suppressed. 【0254】 As shown in FIG. 14, the insulator 524 may be formed in an island shape in the transistor 500. Here, the insulator 524 may be formed so that its side edge is substantially aligned with that of the oxide 530. 【0255】 As shown in FIG. 14, the transistor 500 may have a structure in which the insulator 522 is in contact with the insulator 516 and the conductor 503. In other words, the insulator 520 shown in FIGS. 13A and 13B may be omitted. 【0256】 Next, a capacitor 600 is provided above the transistor 500. The capacitor 600 includes a conductor 610, a conductor 620, and an insulator 630. 【0257】 A conductor 612 may be provided over the conductor 546 and the conductor 548. The conductor 612 functions as a plug or wiring connected to the transistor 500. The conductor 610 functions as an electrode of the capacitor 600. Note that the conductor 612 and the conductor 610 can be formed at the same time. 【0258】 A metal film containing an element selected from molybdenum, titanium, tantalum, tungsten, aluminum, copper, chromium, neodymium, and scandium, or a metal nitride film containing any of the above elements (tantalum nitride film, titanium nitride film, molybdenum nitride film, tungsten nitride film), or the like can be used for the conductor 612 and the conductor 610.
Alternatively, a conductive material such as indium tin oxide, indium oxide containing tungsten oxide, indium zinc oxide containing tungsten oxide, indium oxide containing titanium oxide, indium tin oxide containing titanium oxide, indium zinc oxide, or indium tin oxide to which silicon oxide is added can also be used. 【0259】 In this embodiment, the conductor 612 and the conductor 610 have a single-layer structure, but the present invention is not limited to this structure and may have a stacked structure of two or more layers. For example, between a conductor having a barrier property and a conductor having high conductivity, a conductor having high adhesion to both of these conductors may be formed. 【0260】 The conductor 620 is provided so as to overlap with the conductor 610 with the insulator 630 interposed therebetween. Note that the conductor 620 can be formed using a conductive material such as a metal material, an alloy material, or a metal oxide material. It is preferable to use a high-melting-point material such as tungsten or molybdenum that has both heat resistance and conductivity, and tungsten is particularly preferable. Furthermore, when the conductor 620 is formed simultaneously with other components such as a conductor, a low-resistance metal material such as Cu (copper) or Al (aluminum) can be used. 【0261】 An insulator 640 is provided over the conductor 620 and the insulator 630. The insulator 640 can be provided using a material similar to that of the insulator 320. The insulator 640 may also function as a planarizing film that covers the uneven shape underneath. 【0262】 With this structure, miniaturization or high integration can be achieved in a semiconductor device including a transistor including an oxide semiconductor.
【0263】 Examples of a substrate that can be used for the semiconductor device of one embodiment of the present invention include a glass substrate, a quartz substrate, a sapphire substrate, a ceramic substrate, a metal substrate (e.g., a stainless steel substrate, a substrate having stainless steel foil, a tungsten substrate, a substrate having tungsten foil, etc.), a semiconductor substrate (e.g., a single crystal semiconductor substrate, a polycrystalline semiconductor substrate, or a compound semiconductor substrate), an SOI (Silicon on Insulator) substrate, and the like. A plastic substrate having heat resistance that can withstand the processing temperature of this embodiment may also be used. Examples of a glass substrate include barium borosilicate glass, aluminosilicate glass, aluminoborosilicate glass, and soda-lime glass. Alternatively, crystallized glass or the like can be used. 【0264】 Alternatively, a flexible substrate, a laminated film, paper containing a fibrous material, or a base film can be used as the substrate. Examples of flexible substrates, laminated films, and base films include plastics such as polyethylene terephthalate (PET), polyethylene naphthalate (PEN), polyethersulfone (PES), and polytetrafluoroethylene (PTFE); synthetic resins such as acrylic; polypropylene, polyester, polyvinyl fluoride, and polyvinyl chloride; and polyamide, polyimide, aramid resin, epoxy resin, inorganic vapor-deposited film, and paper. In particular, by manufacturing transistors using semiconductor substrates, single-crystal substrates, or SOI substrates, transistors with small size, high current capacity, and little variation in characteristics, size, or shape can be manufactured. Constructing a circuit using such transistors can reduce the power consumption of the circuit or increase the circuit integration.
【0265】 Alternatively, a flexible substrate may be used as the substrate, and transistors, resistors, and / or capacitors may be formed directly on the flexible substrate. Alternatively, a release layer may be provided between the substrate and the transistors, resistors, and / or capacitors. The release layer can be used to separate a semiconductor device, after a part or all of the semiconductor device is completed thereon, from the substrate and transfer it to another substrate. In this case, the transistors, resistors, and / or capacitors can be transferred to a substrate with poor heat resistance, a flexible substrate, or the like. The release layer may be, for example, a laminated structure of an inorganic film including a tungsten film and a silicon oxide film, a structure in which an organic resin film such as polyimide is formed on a substrate, or a silicon film containing hydrogen. 【0266】 That is, the semiconductor device may be formed on a certain substrate and then transferred to another substrate. Examples of substrates onto which the semiconductor device may be transferred include, in addition to the substrates on which the above-mentioned transistors can be formed, paper substrates, cellophane substrates, aramid film substrates, polyimide film substrates, stone substrates, wood substrates, cloth substrates (including natural fibers (silk, cotton, hemp), synthetic fibers (nylon, polyurethane, polyester), or recycled fibers (acetate, cupra, rayon, recycled polyester)), leather substrates, and rubber substrates. By using these substrates, it is possible to manufacture semiconductor devices that are flexible, durable, heat-resistant, lightweight, or thin. 【0267】 By providing a semiconductor device over a flexible substrate, an increase in weight can be suppressed and a semiconductor device that is less likely to be damaged can be provided. 
【0268】 Note that the transistor 550 illustrated in FIG. 12 is just an example, and the structure is not limited thereto. An appropriate transistor may be used depending on the circuit structure, driving method, etc. For example, when the semiconductor device is a unipolar circuit including only OS transistors (meaning transistors with the same polarity, such as only n-channel transistors), the structure of the transistor 550 may be the same as that of the transistor 500. 【0269】 The configurations, structures, methods, and the like described in this embodiment can be used in appropriate combination with the configurations, structures, methods, and the like described in other embodiments and examples. 【0270】 This embodiment will describe an example of a cross-sectional structure of an element layer including stacked OS transistors that can be applied to a memory device, a data retention circuit, a memory circuit, etc. Specifically, this embodiment will describe an example of a schematic cross-sectional view that can be applied to a circuit configuration such as a DOSRAM or a NOSRAM. 【0271】 FIG. 15 shows a cross-sectional configuration example in the case where a DOSRAM circuit configuration is used. In FIG. 15, element layers 700[1] to 700[4] are stacked over an element layer 701. 【0272】 FIG. 15 illustrates a transistor 550 included in the element layer 701. The transistor 550 described in the above embodiment can be used as the transistor 550. 【0273】 Note that the transistor 550 illustrated in FIG. 15 is just an example, and the structure is not limited thereto. An appropriate transistor may be used depending on the circuit configuration or the driving method. 【0274】 A wiring layer provided with an interlayer film, wiring, plugs, and the like may be provided between the element layer 701 and the element layer 700, or between the kth element layer 700 and the k+1th element layer 700.
Note that in this embodiment and the like, the kth element layer 700 may be referred to as element layer 700[k], and the (k+1)th element layer 700 may be referred to as element layer 700[k+1]. Here, k is an integer of 1 to N. Furthermore, in this embodiment and the like, when "k+α (α is an integer of 1 or more)" or "k-α" is used, the values of "k+α" and "k-α" are each an integer of 1 to N. 【0275】 In addition, multiple wiring layers can be provided depending on the design. In this specification and the like, a wiring and a plug electrically connected to the wiring may be integrated. That is, a part of a conductor may function as the wiring, and a part of the conductor may function as the plug. 【0276】 For example, an insulator 320, an insulator 322, an insulator 324, and an insulator 326 are stacked in this order as an interlayer film over the transistor 550. A conductor 328 or the like is embedded in the insulators 320 and 322. A conductor 330 or the like is embedded in the insulators 324 and 326. The conductors 328 and 330 function as contact plugs or wirings. 【0277】 The insulator functioning as an interlayer film may also function as a planarizing film that covers the underlying unevenness. For example, the top surface of the insulator 320 may be planarized by a planarization process using a CMP method or the like to improve the planarity. 【0278】 A wiring layer may be provided over the insulator 326 and the conductor 330. For example, in FIG. 15, an insulator 350, an insulator 357, an insulator 352, and an insulator 354 are stacked in this order over the insulator 326 and the conductor 330. A conductor 356 is formed in the insulator 350, the insulator 357, and the insulator 352. The conductor 356 functions as a contact plug or a wiring. 【0279】 The insulator 514 included in the element layer 700[1] is provided over the insulator 354. 
A conductor 358 is embedded in the insulator 514 and the insulator 354. The conductor 358 functions as a contact plug or a wiring. For example, the wiring BL and the transistor 550 are electrically connected through the conductor 358, the conductor 356, the conductor 330, and the like. 【0280】 Fig. 16A shows an example of a cross-sectional structure of the element layer 700[k]. Fig. 16B shows an equivalent circuit diagram of Fig. 16A. Fig. 16A shows an example in which two memory cells MC are electrically connected to one wiring BL. 【0281】 The memory cell MC illustrated in Figs. 15 and 16A includes a transistor M1 and a capacitor C. The transistor M1 can be, for example, the transistor 500 described in the above embodiment. 【0282】 Note that in this embodiment, the transistor M1 is a modified example of the transistor 500. Specifically, the transistor M1 differs from the transistor 500 in that the conductors 542a and 542b extend beyond the ends of the metal oxide 531 (the metal oxide 531a and the metal oxide 531b). 【0283】 The memory cell MC illustrated in Figs. 15 and 16A includes a conductor 156 that functions as one terminal of the capacitor C, an insulator 153 that functions as a dielectric, and a conductor 160 (conductor 160a and conductor 160b) that functions as the other terminal of the capacitor C. The conductor 156 is electrically connected to a part of the conductor 542b. The conductor 160 is also electrically connected to a wiring PL (not shown in FIG. 16A). 【0284】 The capacitor C is formed in an opening provided by removing a part of the insulator 574, the insulator 580, and the insulator 554. Since the conductor 156, the insulator 580, and the insulator 554 are formed along the side surface of the opening, it is preferable to form the conductor 156, the insulator 580, and the insulator 554 by an ALD method, a CVD method, or the like. 【0285】 The conductor 156 and the conductor 160 may be made of a conductor that can be used for the conductor 505 or the conductor 560. 
For example, titanium nitride formed by an ALD method may be used as the conductor 156. Titanium nitride formed by an ALD method may be used as the conductor 160a, and tungsten formed by a CVD method may be used as the conductor 160b. Note that if the adhesion of tungsten to the insulator 153 is sufficiently high, a single-layer film of tungsten formed by a CVD method may be used as the conductor 160. 【0286】 It is preferable to use a high-dielectric-constant (high-k) material (a material with a high relative dielectric constant) for the insulator 153. For example, an oxide, oxynitride, nitride oxide, or nitride containing one or more metal elements selected from aluminum, hafnium, zirconium, and gallium can be used as the high-dielectric-constant insulator. Silicon may also be contained in the oxide, oxynitride, nitride oxide, or nitride. Insulating layers made of the above materials may also be stacked. For example, the insulator 153 may have a three-layer stack structure of zirconium oxide, aluminum oxide, and zirconium oxide, that is, a stack of ZrO_xa\AlO_xb\ZrO_xc (ZAZ). Note that xa, xb, and xc are each an arbitrary unit. 【0287】 Examples of the insulator made of a high-dielectric-constant material include aluminum oxide, hafnium oxide, zirconium oxide, an oxide containing aluminum and hafnium, an oxynitride containing aluminum and hafnium, an oxide containing silicon and hafnium, an oxynitride containing silicon and hafnium, an oxide containing silicon and zirconium, an oxynitride containing silicon and zirconium, an oxide containing hafnium and zirconium, and an oxynitride containing hafnium and zirconium. By using such a high-dielectric-constant material, the insulator 153 can be made thick enough to suppress the off-current while still ensuring a sufficient capacitance of the capacitor C. 
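The trade-off between insulator thickness and capacitance described in paragraph 0287 can be illustrated with a minimal sketch. It assumes a simple parallel-plate approximation, C = ε0·εr·A/d, which does not come from this specification (the capacitor C is formed in an opening, so its actual geometry differs). The function names and the relative permittivity values (3.9 for a silicon-oxide-like film, 25 for a zirconium-oxide-like high-k film) are illustrative assumptions.

```python
# Minimal sketch, assuming a parallel-plate capacitor: C = eps0 * eps_r * A / d.
# All names and numeric values here are hypothetical and for illustration only.

EPS0 = 8.854e-12  # vacuum permittivity in F/m


def capacitance(eps_r, area_m2, thickness_m):
    """Return the parallel-plate capacitance in farads."""
    return EPS0 * eps_r * area_m2 / thickness_m


def thickness_for_same_capacitance(eps_r_ref, thickness_ref_m, eps_r_new):
    """Return the thickness a new dielectric may have while keeping the
    capacitance of a reference dielectric (same electrode area)."""
    return thickness_ref_m * eps_r_new / eps_r_ref


if __name__ == "__main__":
    # Assumed values: a 2 nm low-k reference film versus a high-k film.
    d_high_k = thickness_for_same_capacitance(3.9, 2e-9, 25.0)
    print(f"high-k film may be about {d_high_k * 1e9:.1f} nm thick")
```

Under these assumed numbers, the high-k film can be several times thicker than the reference film for the same capacitance, which is the reason a thicker insulator 153 does not have to cost capacitance.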
【0288】 Furthermore, it is preferable to use a laminated insulating layer made of the above materials, in particular a laminated structure of a high-dielectric-constant material and a material having a higher dielectric strength than the high-dielectric-constant material. For example, an insulating film formed by laminating zirconium oxide, aluminum oxide, and zirconium oxide in this order can be used as the insulator 153. Alternatively, it is possible to use an insulating film formed by laminating zirconium oxide, aluminum oxide, zirconium oxide, and aluminum oxide in this order. Alternatively, it is possible to use an insulating film formed by laminating hafnium zirconium oxide, aluminum oxide, hafnium zirconium oxide, and aluminum oxide in this order. By including in the stack an insulator with a relatively high dielectric strength, such as aluminum oxide, the dielectric strength is improved, and electrostatic breakdown of the capacitor C can be suppressed. 【0289】 Fig. 17 shows an example of a cross-sectional configuration when a circuit configuration of a NOSRAM memory cell is used. Fig. 17 is also a modified example of Fig. 15. Fig. 18A shows an example of a cross-sectional structure of an element layer 700[k]. Fig. 18B shows an equivalent circuit diagram of Fig. 18A. 【0290】 The memory cell MC illustrated in Figs. 17 and 18A has a transistor M1, a transistor M2, and a transistor M3 on an insulator 514. In addition, a conductor 215 is provided on the insulator 514. The conductor 215 can be formed simultaneously with the conductor 505 using the same material and in the same process. 【0291】 The transistor M2 and the transistor M3 illustrated in Figs. 17 and 18A share one island-shaped metal oxide 531. In other words, part of the island-shaped metal oxide 531 functions as a channel formation region for the transistor M2, and another part functions as a channel formation region for the transistor M3. The source of the transistor M2 and the drain of the transistor M3, or the drain of the transistor M2 and the source of the transistor M3, are shared. 
Therefore, the area occupied by the transistors M2 and M3 is smaller than when the transistors M2 and M3 are provided independently. 【0292】 In Figs. 17 and 18A, an insulator 287 is provided on an insulator 581, and a conductor 161 is embedded in the insulator 287. An insulator 514 of an element layer 700[k+1] is provided on the insulator 287 and the conductor 161. 【0293】 In Figs. 17 and 18A, the conductor 215 of the element layer 700[k+1] functions as one terminal of the capacitor C, the insulator 514 of the element layer 700[k+1] functions as a dielectric of the capacitor C, and the conductor 161 functions as the other terminal of the capacitor C. In addition, the other of the source and drain of the transistor M1 is electrically connected to the conductor 161 via a contact plug, and the gate of the transistor M2 is electrically connected to the conductor 161 via another contact plug. 【0294】 This embodiment can be implemented in appropriate combination with other embodiments described in this specification. 【0295】 In this embodiment, a transistor having an oxide semiconductor in a channel formation region (OS transistor) will be described. Note that in the description of the OS transistor, a comparison with a transistor having silicon in a channel formation region (also referred to as a Si transistor) will also be briefly described. 【0296】 [OS Transistor] An OS transistor is preferably formed using an oxide semiconductor with a low carrier concentration. For example, the carrier concentration of a channel formation region of an oxide semiconductor is 1×10¹⁸ cm⁻³ or less, preferably less than 1×10¹⁷ cm⁻³, more preferably less than 1×10¹⁶ cm⁻³, still more preferably less than 1×10¹³ cm⁻³, and yet more preferably less than 1×10¹⁰ cm⁻³, and is 1×10⁻⁹ cm⁻³ or more. Note that in order to reduce the carrier concentration of an oxide semiconductor film, the impurity concentration in the oxide semiconductor film may be reduced to reduce the density of defect states. 
In this specification and the like, a semiconductor having a low impurity concentration and a low density of defect states is referred to as a highly purified intrinsic or substantially highly purified intrinsic oxide semiconductor. Note that an oxide semiconductor having a low carrier concentration may also be referred to as a highly purified intrinsic or substantially highly purified intrinsic oxide semiconductor. 【0297】 Furthermore, a highly purified intrinsic or substantially highly purified intrinsic oxide semiconductor may have a low density of trap states due to a low density of defect states. Charges trapped in trap states of the oxide semiconductor take a long time to disappear and may behave like fixed charges. Therefore, a transistor in which a channel formation region is formed in an oxide semiconductor with a high density of trap states may have unstable electrical characteristics. 【0298】 Therefore, reducing the impurity concentration in the oxide semiconductor is effective for stabilizing the electrical characteristics of a transistor. Furthermore, in order to reduce the impurity concentration in the oxide semiconductor, it is preferable to also reduce the impurity concentration in adjacent films. Examples of impurities include hydrogen and nitrogen. Note that the impurities in the oxide semiconductor refer to, for example, elements other than the main components constituting the oxide semiconductor. For example, an element with a concentration of less than 0.1 atomic % can be considered an impurity. 【0299】 Furthermore, when impurities and oxygen vacancies exist in a channel formation region of an oxide semiconductor, the electrical characteristics of an OS transistor are likely to fluctuate, and reliability may be reduced. In addition, hydrogen near an oxygen vacancy may form a defect in which the hydrogen has entered the oxygen vacancy (hereinafter sometimes referred to as VOH) and generate electrons that become carriers. When VOH is formed in the channel formation region, the donor concentration in the channel formation region may increase. 
As the donor concentration in the channel formation region increases, the threshold voltage may vary. Therefore, if oxygen vacancies are present in the channel formation region of an oxide semiconductor, the transistor is likely to have normally-on characteristics (characteristics in which a channel exists and current flows through the transistor even when no voltage is applied to the gate electrode). Therefore, in the channel formation region of an oxide semiconductor, impurities, oxygen vacancies, and VOH are preferably reduced as much as possible. 【0300】 The band gap of the oxide semiconductor is preferably larger than that of silicon (typically 1.1 eV), preferably 2 eV or more, more preferably 2.5 eV or more, and further preferably 3.0 eV or more. By using an oxide semiconductor having a band gap larger than that of silicon, the off-state current (also referred to as Ioff) of the transistor can be reduced. 【0301】 Furthermore, as the size of Si transistors is reduced, a short channel effect (also referred to as SCE) occurs. This makes it difficult to reduce the size of Si transistors. One of the reasons for the short channel effect is the small band gap of silicon. On the other hand, an OS transistor uses an oxide semiconductor, which is a semiconductor material with a wide band gap, and therefore the short channel effect can be suppressed. In other words, an OS transistor is a transistor that does not have the short channel effect or has an extremely small short channel effect. 【0302】 The short-channel effect is a degradation of electrical characteristics that becomes apparent as transistors are miniaturized (channel lengths are reduced). Specific examples of the short-channel effect include a decrease in threshold voltage, an increase in subthreshold swing (sometimes referred to as S value), and an increase in leakage current. 
Here, the S value refers to the amount of change in gate voltage in the subthreshold region that changes the drain current by one order of magnitude at a constant drain voltage. 【0303】 Furthermore, the characteristic length is widely used as an index of resistance to the short channel effect. The characteristic length is an index of how easily the potential in the channel formation region bends. The smaller the characteristic length, the steeper the potential rises, and therefore the more resistant the transistor is to the short channel effect. 【0304】 An OS transistor is an accumulation-mode transistor, while a Si transistor is an inversion-mode transistor. Therefore, compared with a Si transistor, an OS transistor has a smaller characteristic length between a source region and a channel formation region and a smaller characteristic length between a drain region and a channel formation region. Therefore, an OS transistor is more resistant to the short-channel effect than a Si transistor. That is, when a transistor with a short channel length is to be manufactured, an OS transistor is more suitable than a Si transistor. 【0305】 Even when the carrier concentration of the oxide semiconductor is reduced to the point where the channel formation region becomes i-type or substantially i-type, the conduction band minimum of the channel formation region in a short-channel transistor is lowered due to the conduction-band-lowering (CBL) effect, and therefore the energy difference between the conduction band minima of the source or drain region and the channel formation region can be reduced to 0.1 eV or more and 0.2 eV or less. Accordingly, the OS transistor can also be regarded as having an n⁺/n⁻/n⁺ accumulation-type junction-less transistor structure or an n⁺/n⁻/n⁺ accumulation-type non-junction transistor structure, in which the channel formation region is an n⁻-type region and the source and drain regions are n⁺-type regions. 
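The S value defined in paragraph 0302 (the gate-voltage change that shifts the drain current by one order of magnitude at a constant drain voltage) can be estimated from two points of an Id-Vg curve taken in the subthreshold region. The following is a minimal sketch under that definition; the function name and the sample numbers are hypothetical and do not come from this specification.

```python
import math


def subthreshold_swing_mv_per_decade(vg1, id1, vg2, id2):
    """Estimate the S value in mV/decade from two subthreshold points.

    vg1, vg2: gate voltages in volts; id1, id2: drain currents in amperes,
    both measured at the same drain voltage.
    """
    if id1 <= 0 or id2 <= 0:
        raise ValueError("drain currents must be positive")
    decades = math.log10(id2) - math.log10(id1)
    if decades == 0:
        raise ValueError("the two points must differ in drain current")
    # S = dVg / d(log10 Id), converted from V/decade to mV/decade.
    return (vg2 - vg1) / decades * 1000.0


if __name__ == "__main__":
    # Hypothetical data: Id rises tenfold while Vg rises by 0.1 V.
    s = subthreshold_swing_mv_per_decade(0.1, 1e-12, 0.2, 1e-11)
    print(f"S is about {s:.0f} mV/decade")
```

A smaller S value means the transistor switches more sharply, so an increase in S with shrinking channel length is one visible sign of the short-channel effect.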
【0306】 By using an OS transistor with the above structure, good electrical characteristics can be obtained even when a semiconductor device is miniaturized or highly integrated. For example, good electrical characteristics can be obtained even when the gate length of an OS transistor is 20 nm or less, 15 nm or less, 10 nm or less, 7 nm or less, or 6 nm or less, and 1 nm or more, 3 nm or more, or 5 nm or more. On the other hand, a Si transistor may have difficulty achieving a gate length of 20 nm or less or 15 nm or less due to the short-channel effect. Therefore, an OS transistor can be suitably used as a transistor having a shorter channel length than a Si transistor. Note that the gate length refers to the length of a gate electrode in the direction in which carriers move inside a channel formation region during transistor operation, and refers to the width of the bottom surface of the gate electrode in a plan view of the transistor. 【0307】 Furthermore, miniaturization of an OS transistor can improve the high-frequency characteristics of the transistor. Specifically, the cutoff frequency of the transistor can be improved. When the gate length of an OS transistor is within the above range, the cutoff frequency of the transistor can be set to, for example, 50 GHz or higher, preferably 100 GHz or higher, and further preferably 150 GHz or higher at room temperature. 【0308】 As described above, compared to Si transistors, OS transistors have excellent advantages such as a smaller off-state current and the ability to be manufactured as transistors with a short channel length. 【0309】 The structures, configurations, methods, and the like described in this embodiment can be used in appropriate combination with structures, configurations, methods, and the like described in other embodiments. 
【0310】 In this embodiment, electronic components, electronic devices, mainframes, space equipment, and data centers (also referred to as DCs) that can use the semiconductor device described in the above embodiment will be described. The electronic components, electronic devices, mainframes, space equipment, and data centers that use the semiconductor device of one embodiment of the present invention are effective in achieving high performance, such as low power consumption. 【0311】 [Electronic Component] FIG. 19A shows a perspective view of a substrate (mounting substrate 704) on which an electronic component 709 is mounted. The electronic component 709 shown in FIG. 19A has a semiconductor device 710 inside a mold 711. FIG. 19A omits some parts in order to show the interior of the electronic component 709. The electronic component 709 has lands 712 on the outside of the mold 711. The lands 712 are electrically connected to electrode pads 713, and the electrode pads 713 are electrically connected to the semiconductor device 710 via wires 714. The electronic component 709 is mounted on, for example, a printed circuit board 702. A plurality of such electronic components are combined and electrically connected on the printed circuit board 702 to complete the mounting substrate 704. 【0312】 The semiconductor device 710 also includes a drive circuit layer 715 and an element layer 716. The element layer 716 has a configuration in which a plurality of memory cell arrays are stacked. The stacked configuration of the drive circuit layer 715 and the element layer 716 can be a monolithic stacked configuration. In a monolithic stacked configuration, the layers can be connected without using through-electrode technology such as a TSV (Through Silicon Via) or a bonding technology such as Cu-Cu direct bonding. 
By configuring the drive circuit layer 715 and the element layer 716 as a monolithic stacked configuration, for example, a so-called on-chip memory configuration can be achieved in which the memory is formed directly on the processor. The on-chip memory configuration enables the interface between the processor and the memory to operate faster. 【0313】 Furthermore, by configuring an on-chip memory, it is possible to reduce the size of connection wiring, etc., compared to technologies that use through electrodes such as TSVs, and therefore it is possible to increase the number of connection pins. Increasing the number of connection pins enables parallel operation, which makes it possible to improve the memory bandwidth. 【0314】 It is also preferable that the plurality of memory cell arrays included in the element layer 716 be formed using OS transistors and that the plurality of memory cell arrays be monolithically stacked. By forming the plurality of memory cell arrays in a monolithic stacked structure, it is possible to improve either or both of the memory bandwidth and the memory access latency. Note that the bandwidth is the amount of data transferred per unit time, and the access latency is the time from access to the start of data exchange. Note that when Si transistors are used for the element layer 716, it is more difficult to form a monolithic stacked structure than when OS transistors are used. Therefore, it can be said that OS transistors are better suited than Si transistors to a monolithic stacked structure. 【0315】 The semiconductor device 710 may also be referred to as a die. In this specification, a die refers to a chip piece obtained by forming a circuit pattern on, for example, a disk-shaped substrate (also called a wafer) and dicing it into individual pieces during the semiconductor chip manufacturing process. 
Examples of semiconductor materials that can be used for the die include silicon (Si), silicon carbide (SiC), and gallium nitride (GaN). For example, a die obtained from a silicon substrate (also called a silicon wafer) may be called a silicon die. 【0316】 FIG. 19B shows a perspective view of an electronic component 730. The electronic component 730 is an example of a SiP (System in Package) or an MCM (Multi-Chip Module). The electronic component 730 has an interposer 731 provided on a package substrate 732 (printed circuit board), and a semiconductor device 735 and a plurality of semiconductor devices 710 provided on the interposer 731. 【0317】 The electronic component 730 shows an example in which the semiconductor device 710 is used as a high bandwidth memory (HBM). The semiconductor device 735 can be used in an integrated circuit such as a central processing unit (CPU), a graphics processing unit (GPU), or a field programmable gate array (FPGA). 【0318】 For example, a ceramic substrate, a plastic substrate, or a glass epoxy substrate can be used as the package substrate 732. For example, a silicon interposer or a resin interposer can be used as the interposer 731. 【0319】 The interposer 731 has multiple wirings and functions to electrically connect multiple integrated circuits with different terminal pitches. The multiple wirings are provided in a single layer or multiple layers. The interposer 731 also functions to electrically connect the integrated circuits provided on the interposer 731 to electrodes provided on the package substrate 732. For these reasons, the interposer is sometimes called a "rewiring substrate" or "intermediate substrate." In addition, through electrodes may be provided in the interposer 731, and the integrated circuits and the package substrate 732 may be electrically connected using the through electrodes. In addition, with a silicon interposer, a TSV may also be used as the through electrode. 
【0320】 In an HBM, many wirings must be connected to achieve a wide memory bandwidth. Therefore, the interposer on which the HBM is mounted must have fine and high-density wiring. Therefore, it is preferable to use a silicon interposer for the interposer on which the HBM is mounted. 【0321】Furthermore, in SiPs, MCMs, and the like that use silicon interposers, a decrease in reliability due to differences in the coefficient of expansion between the integrated circuit and the interposer is unlikely to occur. Furthermore, because the silicon interposer has a highly flat surface, poor connection between the integrated circuit mounted on the silicon interposer and the silicon interposer is unlikely to occur. In particular, it is preferable to use silicon interposers in 2.5D packages (2.5-dimensional packaging) in which multiple integrated circuits are arranged horizontally on an interposer. 【0322】 On the other hand, when electrically connecting multiple integrated circuits with different terminal pitches using a silicon interposer, a TSV, or the like, a space is required, such as the width of the terminal pitch. Therefore, when attempting to reduce the size of the electronic component 730, the width of the terminal pitch becomes an issue, and it may be difficult to provide the many wirings necessary to achieve a wide memory bandwidth. Therefore, as described above, a monolithic stacked structure using OS transistors is preferable. A composite structure may be formed by combining a memory cell array stacked using TSVs and a monolithically stacked memory cell array. 【0323】 A heat sink (heat dissipation plate) may be provided overlapping the electronic component 730. When a heat sink is provided, it is preferable to align the height of an integrated circuit provided on the interposer 731. For example, in the electronic component 730 shown in this embodiment, it is preferable to align the height of the semiconductor device 710 and the height of the semiconductor device 735. 
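The link between the number of connection wirings (pins) and memory bandwidth that runs through paragraphs 0313 and 0320 to 0322 can be shown with a first-order estimate: peak bandwidth is roughly the number of data pins multiplied by the data rate of each pin. The sketch below assumes this simple model; the function name and the pin counts and data rates are hypothetical and are not taken from this specification.

```python
def peak_bandwidth_gbytes_per_s(num_data_pins, gbits_per_s_per_pin):
    """First-order peak memory bandwidth in GB/s.

    Assumes every data pin transfers gbits_per_s_per_pin gigabits per second
    in parallel; protocol overhead and latency are ignored.
    """
    if num_data_pins < 0 or gbits_per_s_per_pin < 0:
        raise ValueError("inputs must be non-negative")
    return num_data_pins * gbits_per_s_per_pin / 8.0


if __name__ == "__main__":
    # Hypothetical comparison: doubling the pin count at a fixed per-pin rate.
    for pins in (1024, 2048):
        print(pins, "pins:", peak_bandwidth_gbytes_per_s(pins, 2.0), "GB/s")
```

In this model the bandwidth scales linearly with the pin count, which is why a connection scheme that allows finer, denser wiring is attractive when the package must also stay small.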
【0324】 Electrodes 733 may be provided on the bottom of the package substrate 732 in order to mount the electronic component 730 on another substrate. FIG. 19B shows an example in which the electrodes 733 are formed of solder balls. By providing solder balls in a matrix on the bottom of the package substrate 732, BGA (Ball Grid Array) mounting can be achieved. Alternatively, the electrodes 733 may be formed of conductive pins. By providing conductive pins in a matrix on the bottom of the package substrate 732, PGA (Pin Grid Array) mounting can be achieved. 【0325】 The electronic component 730 can be mounted on other substrates using various mounting methods, not limited to BGA and PGA. Examples include a staggered pin grid array (SPGA), a land grid array (LGA), a quad flat package (QFP), a quad flat J-leaded package (QFJ), and a quad flat non-leaded package (QFN). 【0326】 [Electronic Device] Next, a perspective view of an electronic device 6500 is shown in FIG. 20A. The electronic device 6500 shown in FIG. 20A is a portable information terminal that can be used as a smartphone. The electronic device 6500 includes a housing 6501, a display portion 6502, a power button 6503, a button 6504, a speaker 6505, a microphone 6506, a camera 6507, a light source 6508, a control device 6509, and the like. Note that the control device 6509 includes, for example, one or more selected from a CPU, a GPU, and a memory device. The semiconductor device of one embodiment of the present invention can be applied to the display portion 6502, the control device 6509, and the like. 【0327】 The electronic device 6600 shown in FIG. 20B is an information terminal that can be used as a laptop personal computer. The electronic device 6600 includes a housing 6611, a keyboard 6612, a pointing device 6613, an external connection port 6614, a display portion 6615, a control device 6616, and the like. Note that the control device 6616 includes, for example, one or more selected from a CPU, a GPU, and a memory device. 
The semiconductor device of one embodiment of the present invention can be applied to the display portion 6615, the control device 6616, and the like. Note that the use of the semiconductor device of one embodiment of the present invention in the control device 6509 and the control device 6616 is preferable because power consumption can be reduced. 【0328】 [Mainframe] Next, Fig. 20C shows a perspective view of a mainframe 5600. The mainframe 5600 shown in Fig. 20C has a rack 5610 housing a plurality of rack-mounted computers 5620. The mainframe 5600 may also be called a supercomputer. 【0329】 The computer 5620 can have, for example, the configuration shown in the perspective view in Fig. 20D. In Fig. 20D, the computer 5620 has a motherboard 5630, which has a plurality of slots 5631 and a plurality of connection terminals. A PC card 5621 is inserted into the slot 5631. In addition, the PC card 5621 has connection terminals 5623, 5624, and 5625, which are each connected to the motherboard 5630. 【0330】 PC card 5621 shown in Figure 20E is an example of a processing board equipped with a CPU, GPU, storage device, etc. PC card 5621 has board 5622. Board 5622 also has connection terminal 5623, connection terminal 5624, connection terminal 5625, semiconductor device 5626, semiconductor device 5627, semiconductor device 5628, and connection terminal 5629. Note that Figure 20E illustrates semiconductor devices other than semiconductor device 5626, semiconductor device 5627, and semiconductor device 5628, but for these semiconductor devices, please refer to the descriptions of semiconductor device 5626, semiconductor device 5627, and semiconductor device 5628 described below. 【0331】 The connection terminal 5629 has a shape that allows it to be inserted into a slot 5631 of the motherboard 5630, and the connection terminal 5629 functions as an interface for connecting the PC card 5621 and the motherboard 5630. An example of the standard for the connection terminal 5629 is PCIe. 
【0332】The connection terminals 5623, 5624, and 5625 can be, for example, interfaces for supplying power to the PC card 5621, inputting signals, etc. Furthermore, they can be, for example, interfaces for outputting signals calculated by the PC card 5621. Examples of standards for the connection terminals 5623, 5624, and 5625 include USB (Universal Serial Bus), SATA (Serial ATA), and SCSI (Small Computer System Interface). Furthermore, when a video signal is output from the connection terminals 5623, 5624, and 5625, examples of the respective standards include HDMI (registered trademark). 【0333】 The semiconductor device 5626 has a terminal (not shown) for inputting and outputting signals, and the semiconductor device 5626 and the board 5622 can be electrically connected by inserting the terminal into a socket (not shown) provided on the board 5622. 【0334】 The semiconductor device 5627 has a plurality of terminals, and the semiconductor device 5627 can be electrically connected to the board 5622 by, for example, reflow soldering the terminals to wiring provided on the board 5622. Examples of the semiconductor device 5627 include an FPGA, a GPU, and a CPU. For example, the electronic component 730 can be used as the semiconductor device 5627. 【0335】 The semiconductor device 5628 has a plurality of terminals, and the semiconductor device 5628 can be electrically connected to the board 5622 by, for example, reflow soldering the terminals to wiring provided on the board 5622. Examples of the semiconductor device 5628 include a memory device. For example, the electronic component 709 can be used as the semiconductor device 5628. 【0336】 The mainframe computer 5600 can also function as a parallel computer. By using the mainframe computer 5600 as a parallel computer, it is possible to perform large-scale calculations required for, for example, learning and inference in artificial intelligence. 
【0337】 [Space Equipment] The semiconductor device of one embodiment of the present invention can be suitably used in space equipment such as equipment for processing and storing information. 【0338】 The semiconductor device of one embodiment of the present invention can include an OS transistor. The OS transistor exhibits small changes in electrical characteristics due to radiation exposure. That is, the OS transistor has high radiation resistance and can be suitably used in an environment where radiation may be incident. For example, the OS transistor can be suitably used in outer space. 【0339】 Fig. 21 shows an artificial satellite 6800 as an example of space equipment. The artificial satellite 6800 has a body 6801, a solar panel 6802, an antenna 6803, a secondary battery 6805, and a control device 6807. In Fig. 21, a planet 6804 is shown in outer space. Note that outer space refers to an altitude of 100 km or higher, for example, but the outer space described in this specification may also include the thermosphere, mesosphere, and stratosphere. 【0340】 In the artificial satellite 6800 of Fig. 21, a battery management system (also referred to as a BMS) or a battery control circuit may be provided for the secondary battery 6805. The use of an OS transistor in the battery management system or the battery control circuit is preferable because it consumes low power and has high reliability even in space. 【0341】 Furthermore, outer space is an environment with radiation levels 100 times higher than on Earth. Examples of radiation include electromagnetic waves (electromagnetic radiation) such as X-rays and gamma rays, and particle radiation such as alpha rays, beta rays, neutron rays, proton rays, heavy ion rays, and meson rays. 【0342】 When sunlight is irradiated onto the solar panel 6802, the power required for the operation of the satellite 6800 is generated. 
However, for example, in a situation where sunlight is not irradiated onto the solar panel or where the amount of sunlight irradiating the solar panel is small, the generated power is small. Therefore, there is a possibility that the power required for the operation of the satellite 6800 will not be generated. In order to operate the satellite 6800 even in a situation where the generated power is small, it is preferable to provide a secondary battery 6805 on the satellite 6800. Note that the solar panel may be called a solar cell module. 【0343】 The satellite 6800 can generate a signal. The signal is transmitted via an antenna 6803, and can be received by, for example, a receiver installed on the ground or another satellite. By receiving the signal transmitted by the satellite 6800, the position of the receiver that received the signal can be determined. As described above, the satellite 6800 can constitute a satellite positioning system. 【0344】 The control device 6807 has a function of controlling the artificial satellite 6800. The control device 6807 is configured using, for example, one or more selected from a CPU, a GPU, and a storage device. Note that the semiconductor device of one embodiment of the present invention is preferably used for the control device 6807. An OS transistor has smaller fluctuations in electrical characteristics due to radiation exposure than a Si transistor. That is, an OS transistor has high reliability even in an environment where radiation may be incident, and can be preferably used. 【0345】 The artificial satellite 6800 can also be configured to include a sensor. For example, by including a visible light sensor, the artificial satellite 6800 can have the function of detecting sunlight reflected from an object on the ground. Alternatively, by including a thermal infrared sensor, the artificial satellite 6800 can have the function of detecting thermal infrared rays emitted from the earth's surface. 
As described above, the artificial satellite 6800 can function as, for example, an earth observation satellite. 【0346】Although an artificial satellite is described as an example of space equipment in this embodiment, the present invention is not limited thereto. For example, the semiconductor device of one embodiment of the present invention can be suitably used in space equipment such as a spaceship, a space capsule, or a space probe. 【0347】 As described above, OS transistors have excellent advantages over Si transistors, such as the ability to achieve a wide memory bandwidth and high radiation resistance. 【0348】 [Data Center] The semiconductor device of one embodiment of the present invention can be suitably used in a storage system applied to, for example, a data center. The data center is required to perform long-term management of data, such as ensuring data immutability. To manage long-term data, the building must be large enough to install storage and servers for storing a huge amount of data, to ensure a stable power supply for maintaining the data, or to ensure cooling equipment required for maintaining the data. 【0349】 By using the semiconductor device of one embodiment of the present invention in a storage system applied to a data center, it is possible to reduce the power required to store data and the size of the semiconductor device that stores data. Therefore, it is possible to reduce the size of the storage system, the size of the power supply for storing data, the scale of the cooling equipment, and the like. Therefore, it is possible to reduce the space required for the data center. 【0350】 Furthermore, the semiconductor device of one embodiment of the present invention consumes less power, which allows heat generation from the circuit to be reduced. Therefore, adverse effects of the heat generation on the circuit itself, peripheral circuits, and modules can be reduced. 
Furthermore, by using the semiconductor device of one embodiment of the present invention, a data center that operates stably even in a high-temperature environment can be realized. Therefore, the reliability of the data center can be improved. 【0351】 Fig. 22 shows a storage system applicable to a data center. The storage system 7000 shown in Fig. 22 has a plurality of servers 7001sb as hosts 7001 (illustrated as Host Computers). It also has a plurality of storage devices 7003md as storage 7003 (illustrated as Storage). The host 7001 and storage 7003 are shown connected via a storage area network 7004 (illustrated as SAN: Storage Area Network) and a storage control circuit 7002 (illustrated as Storage Controller). 【0352】 The host 7001 corresponds to a computer that accesses data stored in the storage 7003. The hosts 7001 may be connected to each other via a network. 【0353】 Although the storage 7003 uses flash memory to shorten the data access time, i.e., the time required to store and output data, this time is still significantly longer than that of DRAM, which can be used as cache memory within the storage. To address the long access time of the storage 7003, a storage system typically provides cache memory within the storage to reduce the time required to store and output data. 【0354】 The above-mentioned cache memory is used in the storage control circuit 7002 and the storage 7003. Data exchanged between the host 7001 and the storage 7003 is stored in the cache memory in the storage control circuit 7002 and the storage 7003, and then output to the host 7001 or the storage 7003. 【0355】 By using OS transistors as transistors for storing data in the cache memory and holding a potential corresponding to the data, the frequency of refresh operations can be reduced, and power consumption can be reduced. 
【0356】 Note that the application of the semiconductor device of one embodiment of the present invention to any one or more selected from electronic components, electronic devices, mainframe computers, space equipment, and data centers is expected to have an effect of reducing power consumption. Therefore, while energy demand is expected to increase with the improvement in performance or high integration of semiconductor devices, the use of the semiconductor device of one embodiment of the present invention can contribute to the reduction of carbon dioxide (CO2) emissions. Furthermore, the semiconductor device of one embodiment of the present invention is effective as a countermeasure against global warming because it consumes low power. 【0357】 The structures, configurations, methods, and the like described in this embodiment can be used in appropriate combination with structures, configurations, methods, and the like described in other embodiments. 【0358】 A semiconductor device including a CPU corresponding to the arithmetic device 100 described in Embodiment 1 and an accelerator corresponding to the arithmetic device 200 was fabricated by using a technique for stacking an element layer (also referred to as an OS layer) including a transistor (IGZO-FET) using a crystalline In-Ga-Zn-Oxide semiconductor in a semiconductor layer. The fabricated semiconductor device also includes, as other components, a power supply circuit, a CPU memory for storing CPU data, and the like. The CPU memory corresponds to the memory device 300 described in Embodiment 1. 【0359】 The prototype semiconductor device was fabricated by a process in which two OS layers, which are element layers of IGZO-FETs fabricated by 200 nm technology, were stacked on a Si CMOS circuit fabricated by 130 nm technology. 【0360】 FIG. 23 is a schematic diagram illustrating the chip appearance of a prototype semiconductor device 10X. In FIG. 
23, an OS layer is partially provided on an element layer 20 on which a Si CMOS circuit is provided. The CPU illustrated in FIG. 23 includes an OS flip-flop OSFF having data retention circuits (hereinafter referred to as backup memories) FD1 and FD2 stacked on a scan flip-flop SFF provided in the element layer 20. The accelerator ACC illustrated in FIG. 23 includes multiple blocks each including a multiply-accumulate processing element (hereinafter also referred to as an arithmetic element PE) provided in the element layer 20 and ACC memories MB1 and MB2 stacked on the arithmetic element PE. The element layer 20 also includes a CPU memory MEM on which an OS layer is stacked, and a power supply circuit PC. 【0361】 FIG. 24 is a schematic diagram illustrating bank switching of OS flip-flops (OSFFs) and bank switching of processing elements PE. Bank switching of the OS flip-flops (OSFFs) is performed by switching data read from backup memories FD1 and FD2 provided on scan flip-flops SFFs. Bank switching of the processing elements PE is performed by switching data read from ACC memories MB1 and MB2 provided on the processing elements PEs. In FIG. 24, the backup memories FD1 and FD2 and the ACC memories MB1 and MB2 are provided in OS layers OS1 and OS2, and the scan flip-flops SFFs and processing elements PE are provided in an element layer Si having a Si CMOS circuit. 【0362】 Bank switching can be performed by switching between two states, Context 0 and Context 1 (Context Switch). In Context 0, data is read from the backup memory FD1 and ACC memory MB1 in the OS layer OS1 to the scan flip-flops SFFs and the arithmetic elements PE. In Context 1, data is read from the backup memory FD2 and ACC memory MB2 in the OS layer OS2 to the scan flip-flops SFFs and the arithmetic elements PE. 【0363】 FIG. 25 shows the system configuration of a prototype semiconductor device 10X. 
The prototype semiconductor device 10X includes an ARM Cortex-M0 CPU (CORE), an 8-kByte CPU memory (MEM), an accelerator (ACC), a power supply circuit (PC), a power management circuit (PMU), a general purpose IO (GPIO), an external memory IF (External Memory Interface, ExMIF), a bus bridge (BB), a watchdog (WD), and serial communication interfaces (SPI, UART). Each circuit is electrically connected via an AHB bus (AHB lite), an APB bus (APB), etc. 【0364】 The accelerator (ACC) is an AI accelerator configuration in which a memory (ACC memory) for the weight data of the artificial neural network (NN) is provided on the processing element (PE) (Figure 26). Because the number of memory block divisions trades off driver area against latency, the processing elements PE are arranged in eight blocks, and the 16 processing elements (PEs) in each block share two layers of 4 KB memory. The processing element PE is configured to be able to switch between two states (Context0, Context1) by storing different weight data (NN1, NN2) in two NOSRAMs formed by stacking OS layers. 【0365】 The accelerator (ACC) supports a binary neural network (BNN) for low-power operation. It includes a controller with a built-in mechanism for changing the number of parallel processing elements (PE) driven according to the neural network, a memory / AI mode switching function, and a serializer-deserializer (SerDes). The weight data (W[7:0]) and input data (A[7:0]) are input to the XNOR of the PE. The weight data is read from the ACC memories MB1 and MB2 via the driver circuit (R / W DRV). The counter (Popcount) counts the data in the XNOR and adds it to the data in the accumulator (Reg.). Eight multiply-accumulate (MAC) operations are executed in parallel in one clock cycle, and the results are temporarily stored in an accumulator (Reg.), resulting in the multiply-accumulate data (ACC[10:0]). 
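The XNOR, popcount, and accumulate step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the circuit of the prototype: the function names (popcount8, pe_mac_step, pe_accumulate) are hypothetical, and wrapping the accumulator at 11 bits is an assumption based only on the stated ACC[10:0] width.

```python
def popcount8(x: int) -> int:
    """Count the set bits of an 8-bit value."""
    return bin(x & 0xFF).count("1")


def pe_mac_step(acc: int, w: int, a: int) -> int:
    """One clock cycle of a processing element (hypothetical model).

    W[7:0] and A[7:0] are XNORed bit by bit, the matching bits are
    counted, and the count is added to the accumulator. This performs
    eight binary multiply-accumulate operations at once.
    """
    xnor = ~(w ^ a) & 0xFF               # 1 where the weight bit equals the input bit
    return (acc + popcount8(xnor)) & 0x7FF  # assumption: ACC[10:0] wraps at 11 bits


def pe_accumulate(weights, activations) -> int:
    """Repeat the step over a sequence of 8-bit weight and input words."""
    acc = 0
    for w, a in zip(weights, activations):
        acc = pe_mac_step(acc, w, a)
    return acc


if __name__ == "__main__":
    # 98 words of 8 bits cover 784 binary inputs.
    print(pe_accumulate([0xFF] * 98, [0xFF] * 98))
```

With 784 binary inputs the count never exceeds 784, so it fits in an 11-bit accumulator without wrapping.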
This process is repeated according to the number of inputs (neurons), and then threshold processing (bias data T[10:0]) is performed to complete the calculation for one layer of the network. The bias data is read from ACC memories MB1 and MB2 via a driver circuit (R / W DRV). Up to 128 of these processing elements PE can be driven in parallel. In the case of a fully connected network with three hidden layers, inference is possible in 194 clock cycles. 【0366】 The OS layer containing the ACC memory to be accessed can be selected by a layer select driver LSD made solely of OS transistors (Figure 27). The layer select driver LSD is configured with a bootstrap circuit to suppress threshold voltage drops in the word line (RWL, WWL) voltage caused by n-channel transistor (nMOS) switches. Because the layer select driver LSD and memory cells of the ACC memory MB1 and MB2 can be simultaneously implemented in the OS layer, no area overhead occurs even if the number of stacked layers increases. Furthermore, there is no need to change the address size of the driver circuit (R / W DRV) made of Si-CMOS, and the area and power consumption do not increase. 【0367】The CPU is a normally-off CPU with power gating capability. The CPU core is a Cortex-M0 (registered trademark) manufactured by ARM. The backup memory is placed directly above the scan flip-flops (SFFs), and each OS layer is stacked with zero area overhead. Taking advantage of the characteristics of monolithic stacking, fine-grained and random placement is possible. By stacking the OS layer, the backup memory can store different data in two backup memories, and is configured to be switchable between two states (Context 0 and Context 1). 【0368】 The OS flip-flop (OSFF) places a 3T1C / unit memory directly above the scan flip-flop SFF, and each OS layer is stacked with zero area overhead (FIG. 28). The scan flip-flop SFF has a flip-flop (FF). 
Taking advantage of the characteristics of monolithic stacking, fine-grained and random placement is possible. Data can be backed up and restored between the 3T1C / unit memory and the scan flip-flop SFF. 【0369】 Fig. 29 shows a timing chart for explaining the operation of the accelerator (ACC) shown in Fig. 27 and the OS flip-flop (OSFF) shown in Fig. 28 when switching between Context0 and Context1. Fig. 29 also shows a timing chart for explaining the operation of a signal (PG_EN) for power gating (PG) by the power management unit (PMU). 【0370】In the OS flip-flop (OSFF), data is saved by a signal BK[0] (BK[1]) to the memory of the first (second) OS layer corresponding to Context0 (Context1), and the data is written back to the scan flip-flop SFF by a signal RE[1] (RE[0]) corresponding to the next Context1 (Context0). The signal BK[1] (BK[0]) backs up tasks and results, realizing context switching. After saving the data, PG is possible by transitioning to sleep mode. Chip evaluation confirmed that 4,045 scan flip-flops SFF were backed up and restored in a batch at 160 ns and 180 ns, consuming 510 fJ / bit and 111 fJ / bit, respectively. 【0371】 The ACC memories MB1 and MB2 in the accelerator ACC allow context switching simply by switching the layer selection signal. When one of the OS layers is selected, activating the read word line (RWL) with the CMOS driver allows access to memory cells in the ACC memories MB1 and MB2 in the row of the corresponding OS layer. During PG, data is held by the memory cells in the ACC memories MB1 and MB2, so no special operation is required. 【0372】 The signal waveforms of the prototype semiconductor device 10X were confirmed. As shown in FIG. 30 , the waveforms of the switching between OS1 and OS2, the signals BK[0], BK[1], and the signals RE[0], RE[1], which accompany the context switching, were confirmed. 【0373】 FIG. 
31 is a diagram illustrating the state of calculations when the processing elements PE are driven in parallel in the accelerator (ACC). The calculations were multiply-accumulate (MAC), threshold processing (TH), and output (OUT) in each layer (PL1 to PL4). The HCLK was set to 10 MHz, and the PECLK (access clock) was set to 400 kHz. For a fully connected network with a 784-neuron input layer (PL1) and three hidden layers (PL2 to PL4: 128 neurons each), inference was possible in 194 clocks. 【0374】 The results of the chip evaluation are shown in a graph in FIG. 32, with the left vertical axis representing computing efficiency, the right vertical axis representing classification accuracy, and the horizontal axis representing access clock frequency. As shown in FIG. 32, under the condition that gave both high classification accuracy and a high access clock frequency (PECLK (access clock frequency) 400 kHz, system clock frequency 10 MHz), the computing efficiency of the accelerator ACC was 4.44 TOPS / W. Memory reading for inference is the critical path, and inference accuracy decreases at the maximum frequency (400 kHz), but there is room for performance improvement through memory optimization. 【0375】 FIG. 33A is a graph comparing the energy of inference using only CPU memory and cores (CORE) (using the MNIST database) with inference using the accelerator ACC. FIG. 33A is a graph with energy on the vertical axis. While the energy of inference using only CPU memory and cores (CORE) was 1681.97 μJ, the energy of inference using the accelerator ACC was reduced to 0.19 μJ. FIG. 33B is a graph with execution time on the vertical axis. The inference execution time was also reduced from 3.55 s to 485 μs (FIG. 33B). As a result, it was confirmed that inference can be performed in accordance with the frame rate of the imaging data (e.g., 60 fps, 16 ms). 【0376】 FIG. 
34 is a schematic diagram comparing the effect of reducing power consumption when performing context switching and power gating (PG) on a chip of this embodiment having two OS layers (OS / OS / Si (OS Memory) configuration), a chip with an OS / Si (OS Memory) configuration having one OS layer, and a chip with a Si (SRAM) configuration without an OS layer. FIG. 34 is a graph with power on the vertical axis and time on the horizontal axis. The OS / Si chip is a chip with only one layer of OS memory stacked on a CMOS circuit. The Si (SRAM) chip is a chip that does not use an OS and has an accelerator configured with SRAM. Since SRAM is a volatile memory, PG is not possible, so the comparison was made using a configuration that reduces standby power using clock gating (CG). 【0377】 Two neural networks (NN1, NN2) are switched to perform inference (using the MNIST database) (Active period), and then PG (CG) is performed (Standby period). Power consumption is estimated using intermittent operation as an example. 【0378】 Both the OS / Si configuration chip and the chip with the accelerator configured in Si (SRAM) configuration (estimated by the SRAM generator) can only store data for one neural network in memory. Therefore, the weight data W must be rewritten each time an inference is performed. Specifically, in the case of the Si (SRAM) configuration and the OS / Si (OS Memory) configuration, the weight data W of neural network NN1 is stored (Store W NN1) and an inference (Inference NN1) is performed. Next, the weight data W of neural network NN2 is stored (Store W NN2) and an inference (Inference NN2) is performed, and this process is repeated. 【0379】On the other hand, in a stacked OS / OS / Si configuration, quick context switching can be realized (instant context switching), and power consumption can be reduced by securing PG time. 
Specifically, in the case of an OS / OS / Si (OS Memory) configuration, it is possible to perform inference by switching the weight data W of the neural networks NN1 and NN2, so that inference (Inference NN1) and inference (Inference NN2) can be performed consecutively. 【0380】 FIG. 35 is a schematic diagram comparing the operations of accelerators in the OS / OS / Si configuration, the OS / Si configuration, and the Si (SRAM) configuration when a context switch is executed, in relation to FIG. 34. 【0381】 As shown in Fig. 35, in the case of the Si (SRAM) configuration and the OS / Si configuration, the weight data W of the neural network NN1 is stored in the SRAM or OS Mem. (Store W for NN1), and the processing elements PEs perform inference (Inference NN1). Subsequently, the weight data W of the neural network NN2 is stored in the SRAM or OS Mem. (Store W for NN2), and inference (Inference NN2) is performed, and this process is repeated. 【0382】 On the other hand, in the case of a stacked OS / OS / Si configuration, it is possible to store the weight data W of the neural networks NN1 and NN2 in the two layers of OS Mem. (Store W), and perform inference by switching the data in the OS Mem. (Inference NN1, Inference NN2). Therefore, it is possible to perform inference (Inference NN1) and inference (Inference NN2) consecutively. 【0383】 Figure 36A shows the results of power measurements for a chip with an OS / OS / Si configuration and two OS layers, during inference using the accelerator ACC (ACC Inference), ACC memory write (ACC Memory Write), and PG, with the vertical axis representing power. Figure 36A also shows a breakdown of power consumption for the CORE, PMU, ACC, and Other (Other). Figure 36B also shows the percentage (Percentage) of power consumption for a chip with an OS / OS / Si configuration and two OS layers, during inference using the accelerator ACC, ACC memory write, and PG, with the vertical axis representing power. 
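The active-period schedules compared in the preceding paragraphs, together with the timing figures reported for the prototype (194 clocks per inference at a 400 kHz access clock, repeated every 16 ms), can be sketched in Python as follows. This is an illustrative model only. The function names and the store_time_s parameter are hypothetical: the text does not state how long a weight write takes, so the caller must supply that value. The model also assumes that the number of networks does not exceed the number of OS layers.

```python
PECLK_HZ = 400_000       # access clock reported in the text
INFERENCE_CLOCKS = 194   # clocks per inference reported in the text
PERIOD_S = 16e-3         # intermittent-operation period (60 fps, 16 ms)


def inference_time_s(clocks=INFERENCE_CLOCKS, clk_hz=PECLK_HZ):
    """Duration of one inference: clock count divided by clock frequency."""
    return clocks / clk_hz


def schedule(config, networks=("NN1", "NN2")):
    """Ordered steps of one active period for a given configuration.

    Only the stacked "OS/OS/Si" configuration keeps every network's
    weights resident, so it skips the per-inference weight store.
    """
    steps = []
    for nn in networks:
        if config != "OS/OS/Si":
            steps.append(f"Store W {nn}")
        steps.append(f"Inference {nn}")
    return steps


def active_time_s(config, store_time_s, networks=("NN1", "NN2")):
    """Total active time; store_time_s is a hypothetical input."""
    t_inf = inference_time_s()
    return sum(store_time_s if step.startswith("Store") else t_inf
               for step in schedule(config, networks))


def standby_time_s(config, store_time_s, period_s=PERIOD_S):
    """Time left in each period for power gating or clock gating."""
    return period_s - active_time_s(config, store_time_s)


if __name__ == "__main__":
    for cfg in ("Si (SRAM)", "OS/Si", "OS/OS/Si"):
        print(cfg, schedule(cfg), standby_time_s(cfg, store_time_s=1e-3))
```

The inference time evaluates to 485 μs, which agrees with the execution time reported for the accelerator. Whatever store time is assumed, the stacked configuration leaves the longest standby interval because its schedule contains no store steps.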
【0384】 As shown in FIGS. 36A and 36B, the power consumption during inference using the accelerator ACC, during ACC memory writing, and during PG was 386.5 μW, 637.4 μW, and 0.89 μW, respectively. Assuming inference at a frame rate of 60 fps, the average power consumption of this chip is 25.15 μW, which is a 79% reduction in power compared to the Si (SRAM) configuration. 【0385】 FIG. 37A is a graph showing the relationship between frequency (intermittent operation cycle: horizontal axis) and power consumption (power: vertical axis) when switching between two neural networks (2NN) for accelerators with OS / OS / Si configuration, OS / Si configuration, and Si (SRAM) configuration. It was found that when switching between two neural networks, power consumption in the OS / OS / Si configuration can be reduced. 【0386】 FIG. 37B is a graph showing the relationship between frequency (intermittent operation cycle: horizontal axis) and power consumption (power: vertical axis) when switching between four neural networks (4NN) for accelerators with OS / OS / OS / OS / Si, OS / OS / Si, OS / Si, and Si (SRAM) configurations. When switching between four neural networks, the OS / OS / Si configuration offers little power consumption reduction. The power consumption reduction effect can be enhanced by configuring the number of OS layers to correspond to the number of neural networks. 【0387】 FIG. 37C is a graph comparing the power consumption (Power@16ms: vertical axis) when switching between two neural networks (2NN) and four neural networks (4NN) for accelerators with OS / OS / OS / OS / OS / Si, OS / OS / Si, OS / Si, and Si (SRAM) configurations, respectively, every 16 ms. As can be seen from Fig. 37C, the number of OS layers can be adjusted according to the number of neural networks, thereby significantly reducing power consumption. 【0388】 FIG. 
38A is a graph illustrating the relationship between a configuration having an OS layer corresponding to the number of neural networks (1, 2, 4, or 8 networks) (Number of OS Layer (OS / OS / Si: OS Memory)) and the accelerator block size (ACC Block Size). Similarly, FIG. 38B is a graph illustrating the relationship between a configuration having an OS layer corresponding to the number of neural networks (1, 2, 4, or 8 networks) and standby power during PG (Stand-by Power). Similarly, FIG. 38C is a graph illustrating the relationship between a configuration having an OS layer corresponding to the number of neural networks (1, 2, 4, or 8 networks) and power consumption during operation (Active Power). FIGS. 38A to 38C also show the block size, standby power, and power consumption of the accelerator when the number of neural networks is increased in a Si (SRAM) configuration (Address Size expansion rate (Si:SRAM)) without an OS layer. 【0389】 As shown in Figures 38A to 38C, in accelerator configurations with OS layers corresponding to the number of neural networks (1, 2, 4, or 8 networks), the block size remains unchanged even when the number of OS layers is increased according to the number of neural networks. The same applies to standby power and power consumption. In the Si (SRAM) configuration, the block size, power consumption, and standby power increase as the number of neural networks increases. In terms of power consumption, the Si (SRAM) configuration is advantageous when the number of neural networks is small. 【0390】 From the above, using the memory in the OS layers for bank switching eliminates the need to rewrite the ACC memory at each context switch and, as a result, extends the time spent in PG. Even when memory is provided in an OS / OS / Si configuration with two OS layers, benefits are obtained in terms of both power and area, demonstrating the effectiveness of this system. 【0391】 FIG. 
39 shows a top view photograph of the die, and FIG. 40 shows a cross-sectional photograph. In FIG. 40, the S / D Electrode, Top Gate, and Back Gate are illustrated as the source electrode / drain electrode, gate electrode, and back gate electrode. The semiconductor device described in this example was fabricated by a process in which two IGZO-FET element layers fabricated using 200 nm technology were stacked on a Si CMOS circuit fabricated using 130 nm technology. The OS layer can be used as backup memory, ACC memory, and CPU memory, with each memory layer (OS memory) corresponding to a bank. In the system proposed with this configuration, bank switching of the ACC memory and bank switching of the backup memory are linked, and the inference of different neural networks can be switched with low latency and low power, thereby extending the waiting time for power gating. 【0392】 <Additional Notes Regarding the Description of the Present Specification, etc.> The following additional notes will be given regarding the above-described embodiments and the explanations of the respective configurations in the embodiments. 【0393】 The configurations shown in each embodiment can be combined with the configurations shown in other embodiments as appropriate to form one aspect of the present invention. In addition, when multiple configuration examples are shown in one embodiment, the configuration examples can be combined as appropriate. 【0394】 In addition, the content (or even a part of the content) described in one embodiment can be applied to, combined with, or replaced with another content (or even a part of the content) described in that embodiment, and / or the content (or even a part of the content) described in one or more other embodiments. 【0395】 The contents described in the embodiments refer to the contents described in each embodiment using various figures or the contents described using text in the specification. 
【0396】 Furthermore, a figure (or even a part thereof) described in one embodiment can be combined with another part of that figure, another figure (or even a part thereof) described in that embodiment, and / or a figure (or even a part thereof) described in one or more other embodiments to form even more figures. 【0397】 In addition, in the present specification and the like, in the block diagrams, components are classified by function and shown as independent blocks. However, in actual circuits, etc., it is difficult to separate components by function, and there may be cases where one circuit is involved in multiple functions, or where one function is involved across multiple circuits. Therefore, the blocks in the block diagrams are not limited to the components described in the specification, but may be rephrased appropriately depending on the situation. 【0398】 In addition, in the drawings, the size, layer thickness, or region is shown at an arbitrary size for convenience of explanation. Therefore, it is not necessarily limited to the scale. Note that the drawings are shown schematically for clarity, and are not limited to the shapes or values shown in the drawings. For example, it is possible to include variations in signal, voltage, or current due to noise, or variations in signal, voltage, or current due to timing deviations. 【0399】 In this specification and the like, when describing the connection relationship of a transistor, the terms "one of the source or drain" (or first electrode or first terminal) and "the other of the source or drain" (or second electrode or second terminal) are used. This is because the source and drain of a transistor vary depending on the structure or operating conditions of the transistor. Note that the source and drain of a transistor can be appropriately referred to as source (drain) terminal, source (drain) electrode, or the like depending on the situation. 
【0400】Furthermore, the terms "electrode" and "wiring" used in this specification and the like do not limit the functionality of these components. For example, an "electrode" may be used as part of a "wiring," and vice versa. Furthermore, the terms "electrode" and "wiring" also include cases where multiple "electrodes" or "wirings" are integrally formed. 【0401】 Furthermore, in this specification and the like, the terms voltage and potential can be interchanged as appropriate. Voltage refers to the potential difference from a reference potential. For example, if the reference potential is a ground voltage (earth voltage), then voltage can be interchanged with potential. Ground potential does not necessarily mean 0 V. Note that potential is relative, and the potential applied to wiring, etc. may change depending on the reference potential. 【0402】 In this specification and the like, terms such as "film" and "layer" can be interchangeable depending on the circumstances. For example, the term "conductive layer" can be changed to the term "conductive film." Or, for example, the term "insulating film" can be changed to the term "insulating layer." 【0403】 In this specification, a switch refers to a device that has a function of controlling whether a current flows by being in a conductive state (on state) or a non-conductive state (off state), or a device that has a function of selecting and switching a path for a current to flow. 【0404】 In this specification, the channel length refers to, for example, in a top view of a transistor, a region where a semiconductor (or a portion in the semiconductor through which current flows when the transistor is on) and a gate overlap, or a distance between a source and a drain in a region where a channel is formed. 
【0405】 In this specification, the channel width refers to, for example, the length of the region where the semiconductor (or the portion in the semiconductor through which current flows when the transistor is on) and the gate electrode overlap, or the length of the portion where the source and drain face each other in the region where the channel is formed. 【0406】 In this specification and the like, a node can be referred to as a terminal, a wiring, an electrode, a conductive layer, a conductor, an impurity region, etc. depending on the circuit configuration, device structure, etc. Also, a terminal, a wiring, etc. can be referred to as a node. 【0407】 In this specification, "A and B are connected" refers to an electrical connection between A and B. Here, "A and B are electrically connected" refers to a connection in which an electrical signal can be transmitted between A and B when an object (such as a switch, transistor element, or diode, or a circuit including such an object and wiring) is present between A and B. Note that "A and B are electrically connected" also includes a case in which A and B are directly connected. Here, "A and B are directly connected" refers to a connection in which an electrical signal can be transmitted between A and B via wiring (or electrodes) or the like, without passing through the object. In other words, a direct connection refers to a connection that can be regarded as the same circuit diagram when represented by an equivalent circuit. 
【0408】 10: semiconductor device, 20: element layer, 21: transistor, 22: semiconductor layer, 30: element layer, 31: transistor, 32: semiconductor layer, 100: arithmetic unit, 110: register, 120: scan flip-flop, 121: selector, 122: flip-flop, 130: data retention circuit, 132: transistor, 133: transistor, 134: transistor, 135: capacitor, 200: arithmetic unit, 210: memory circuit, 211: arithmetic circuit, 220: layer selection circuit, 221: write word line driver unit, 230: layer selection circuit, 231: read word line driver unit, 241: read circuit, 300: memory device, 310: storage layer

Claims

[Claim 1] A semiconductor device comprising: a first arithmetic unit having registers; and a second arithmetic unit having memory circuits, layer selection circuits, and an arithmetic circuit, wherein the first arithmetic unit and the second arithmetic unit are provided in an element layer in which a plurality of second element layers are stacked on a first element layer, the first element layer is provided with a first transistor having silicon in a semiconductor layer having a channel formation region, each of the second element layers is provided with a second transistor having an oxide semiconductor in a semiconductor layer having a channel formation region, each of the registers includes a scan flip-flop and a data holding circuit, the scan flip-flop and the arithmetic circuit are provided in the first element layer, the data holding circuit is provided in each of the plurality of second element layers located on the first element layer in which the scan flip-flop is provided, the memory circuit and the layer selection circuit are provided in each of the plurality of second element layers located on the first element layer in which the arithmetic circuit is provided, and the data input to the arithmetic circuit is switched by the layer selection circuit.

[Claim 2] The semiconductor device according to claim 1, wherein an input terminal of the scan flip-flop is electrically connected to the output terminal of each of the plurality of data holding circuits, an output terminal of the scan flip-flop is electrically connected to the input terminal of each of the data holding circuits, and the data holding circuit has a function of holding data corresponding to a task performed by the first arithmetic unit by keeping the second transistor in a non-conductive state.

[Claim 3] The semiconductor device according to claim 1, wherein the memory circuit has memory cells electrically connected to a write word line and a read word line, and the layer selection circuit has a function of outputting signals to be supplied to the write word line and the read word line.

[Claim 4] The semiconductor device according to claim 1, wherein the memory circuits provided in different second element layers each hold weight data used for arithmetic processing based on a neural network, and the weight data input to the arithmetic circuit is switched by the layer selection circuit.

[Claim 5] The semiconductor device according to claim 1, wherein the data holding circuit has a region that overlaps with the scan flip-flop in a plan view.

[Claim 6] The semiconductor device according to claim 1, wherein the memory circuit has a region that overlaps with the arithmetic circuit in a plan view.

[Claim 7] The semiconductor device according to claim 1, wherein the oxide semiconductor contains In, Ga, and Zn.

[Claim 8] The semiconductor device according to claim 1, wherein the arithmetic circuit has a function of performing a multiply-accumulate operation.
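
As an informal illustration only, the behavior recited in claims 1, 2, 4, and 8 can be sketched as a small software model. It is a behavioral analogy under stated assumptions, not the circuit of the application: the names `Register`, `LayeredWeightMemory`, `select_layer`, and `multiply_accumulate` are hypothetical, integers stand in for stored charge, and one list entry stands in for each stacked second element layer.

```python
class Register:
    """Hypothetical model: one scan flip-flop plus one data holding
    circuit per stacked second element layer."""

    def __init__(self, num_layers):
        self.flip_flop = 0                # value in the scan flip-flop
        self.held = [None] * num_layers   # one holding slot per layer

    def save(self, layer):
        # Flip-flop output -> input of the data holding circuit in `layer`.
        self.held[layer] = self.flip_flop

    def restore(self, layer):
        # Output of the data holding circuit in `layer` -> flip-flop input.
        if self.held[layer] is None:
            raise ValueError("no data held in layer %d" % layer)
        self.flip_flop = self.held[layer]


class LayeredWeightMemory:
    """Hypothetical model: one memory circuit (weight bank) per layer,
    with a layer selection step choosing which bank is read."""

    def __init__(self, weights_per_layer):
        self.banks = [list(w) for w in weights_per_layer]
        self.selected = 0

    def select_layer(self, layer):
        if not 0 <= layer < len(self.banks):
            raise IndexError("layer out of range")
        self.selected = layer

    def read(self):
        return self.banks[self.selected]


def multiply_accumulate(inputs, weights):
    """Sum of element-wise products of inputs and weights."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have equal length")
    return sum(x * w for x, w in zip(inputs, weights))


if __name__ == "__main__":
    memory = LayeredWeightMemory([[1, 2, 3], [4, 5, 6]])
    inputs = [1, 0, 2]
    for layer in range(2):
        memory.select_layer(layer)
        print(layer, multiply_accumulate(inputs, memory.read()))
```

In this sketch, switching the selected layer changes the weight data fed to the multiply-accumulate step without copying anything in from outside the model, and saving or restoring a register value per layer stands in for holding per-task data in the data holding circuits.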