Production support equipment
The production support device addresses bias in learned models by randomly extracting training data for machine learning, enhancing inference accuracy and optimizing component type pairs and feeder pairs in a production system.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- FUJI CORP
- Filing Date
- 2022-10-14
- Publication Date
- 2026-06-25
Abstract
Description
Technical Field
[0001] This specification relates to a production support device.
Background Art
[0002] Conventionally, for example, a state determination device and a state determination method disclosed in Patent Document 1 (hereinafter referred to as "conventional state determination device, etc.") are known. In the conventional state determination device, etc., a learned model to which learning data classified based on classification conditions is applied is determined, and machine learning is performed by applying the classified learning data to the determined learned model.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] In the conventional state determination device, etc., for a learned model selectively determined from a plurality of learned models, specific classified learning data is applied and machine learning is performed. For this reason, in the conventional state determination device, etc., the learned model can output a desired inference result for data similar to the specific classified learning data. When machine learning is performed using only specific classified learning data as in the conventional state determination device, etc., there is a bias in the learning data used for learning, and as a result, a learned model specialized in the classification criteria for classifying the learning data is generated over time.
[0005] An object of this specification is to provide a production support device that can use a learned model generated by suppressing the bias of learning data.
Means for Solving the Problems
[0006] This specification discloses a production support device comprising: a first training data acquisition unit that acquires a plurality of first training data used for first machine learning regarding component type pairs for which the mounting process can be improved and rewards can be obtained by exchanging component type pairs of components to be mounted on a substrate between a plurality of component mounting machines and trying the mounting process; a training data storage unit that classifies and stores the acquired plurality of first training data according to predetermined classification criteria; an extraction unit that randomly extracts each of the first training data classified and stored in the training data storage unit; and a trained model storage unit that stores a trained model generated by performing first machine learning using the randomly extracted first training data.
[0007] This specification also discloses the technical idea of changing "the production support device described in claim 1 or 2" to "the production support device described in any one of claims 1-3" in claim 4 of the original application. Furthermore, this specification also discloses the technical idea of changing "the production support device described in claim 1 or 2" to "the production support device described in any one of claims 1-4" in claim 5 of the original application. Furthermore, this specification also discloses the technical idea of changing "the production support device described in claim 1 or 2" to "the production support device described in any one of claims 1-6" in claim 7 of the original application. Furthermore, this specification also discloses the technical idea of changing "the production support device described in claim 1 or 2" to "the production support device described in any one of claims 1-13" in claim 14 of the original application. Moreover, this specification also discloses the technical idea of changing "the production support device described in claim 1 or 2" to "the production support device described in any one of claims 1-15" in claim 16 of the original application.
Effects of the Invention
[0008] According to the production support device, by randomly extracting each of the classified and stored first training data and performing first machine learning, it is possible to use a trained model that has been generated while suppressing bias in the training data.
Brief Description of the Drawings
[0009]
[Figure 1] A diagram showing the overall configuration of the production system.
[Figure 2] A diagram illustrating the multiple component mounting machines that make up the production system shown in Figure 1.
[Figure 3] A schematic diagram showing the overall configuration of the component mounting machine of Figure 2.
[Figure 4] A schematic side view showing the main components of the feeder of Figure 1.
[Figure 5] A schematic top view of the carrier tape.
[Figure 6] A functional block diagram showing the configuration of the production support device.
[Figure 7] A functional block diagram showing the configuration of the learning phase by the production support device (trained model generation unit).
[Figure 8] A functional block diagram showing the configuration of the inference phase by the production support device (inference unit).
[Figure 9] A flowchart showing the optimization program executed by the production support device.
Modes for Carrying Out the Invention
[0010] The production support device will be described below with reference to the drawings. In this embodiment, the production support device is described as being provided in a production system in which feeders are transported to the component mounting machines by an automatic transport machine.
[0011] 1. Overall configuration of production system 1
First, the overall configuration of production system 1 will be described with reference to Figures 1, 2, and 3. Production system 1 comprises a plurality of component mounting machines 10 (four in this embodiment) arranged in the width direction, an automatic transport machine 20, a loader device 30, a feeder 40, and a production support device 100. The component mounting machine 10 is a substrate work machine that performs, as its predetermined operation, a mounting operation in which components P (for example, electronic components) are mounted onto a substrate K.
[0012] In the production system 1 formed by multiple component mounting machines 10, substrates K are sequentially transported into each component mounting machine 10, and a mounting process is performed in each component mounting machine 10 to mount predetermined components. In the following description, the X-axis direction is defined as the left-right direction (width direction) of the component mounting machine 10, the Y-axis direction is defined as the front-back direction (depth direction) of the component mounting machine 10, and the Z-axis direction is defined as the up-down direction (vertical direction) of the component mounting machine 10.
[0013] Furthermore, the production system 1 is equipped with an automatic transport machine 20 for transporting feeders 40 to each of the component mounting machines 10 and attaching (replacing) them. The automatic transport machine 20 can be exemplified by an AGV (Automated Guided Vehicle), which is an unmanned transport vehicle (unmanned transport robot) that automatically moves back and forth between an automated warehouse (not shown) and the component mounting machines 10 to transport predetermined feeders 40. Although not shown, the automatic transport machine 20 is equipped with an attachment/detachment mechanism (for example, a belt conveyor or an articulated robot) for attaching and detaching the feeders 40 to and from the component supply device 12 of the component mounting machine 10, which will be described later.
[0014] Furthermore, the production system 1 is equipped with a loader device 30 that replenishes components P and changes the setup for the next production run in accordance with the production schedule. The loader device 30 is positioned in front of the component mounting machine 10 (more specifically, the component supply device 12, which will be described later) in the Y-axis direction and is movable in the X-axis direction. In this embodiment, the loader device 30 can also move in the X-axis direction across adjacent component mounting machines 10 (component supply devices 12).
[0015] Further, the loader device 30 moves the feeder 40 from the upper stage to the lower stage or from the lower stage to the upper stage in the slot 12S of the component supply device 12 described later. Further, the loader device 30 moves and exchanges the feeder 40 between the two component mounting machines 10, that is, between the slots 12S of the respective component supply devices 12. Specifically, the loader device 30 can temporarily accommodate (collect) the feeder 40 set in the upper stage of the slot 12S, move it in the X-axis direction, and then discharge and set the accommodated (collected) feeder 40 in the lower stage. Also, the loader device 30 can temporarily accommodate (collect) the feeder 40 set in the lower stage of the slot 12S, move it in the X-axis direction, and then discharge and set the accommodated (collected) feeder 40 in the upper stage.
[0016] Furthermore, the loader device 30 temporarily accommodates (collects) the feeder 40 set in the slot 12S of the component supply device 12 of one component mounting machine 10, moves it in the X-axis direction, and then discharges and sets the accommodated (collected) feeder 40 in the slot 12S of the component supply device 12 of the other component mounting machine 10, that is, it can exchange a plurality of feeders 40 between the component mounting machines 10. Thereby, the loader device 30 can automatically supply the component P and perform setup change (including the replacement of the feeder 40).
[0017] Here, in the production system 1, as shown in FIG. 1, in addition to the above-described devices 10, 20, 30, 40, a management device H for controlling the entire production is provided. Examples of the management device H include a host computer, a buffer, etc. that are communicably connected to the above-described devices. And, as will be described later, the management device H supplies various information including production information J related to production to the above-described devices 10, 20, 30, 40, 100 as necessary.
[0018] 2. Component mounting machine 10
As schematically shown in Figure 3, the component mounting machine 10 mainly includes a substrate transport device 11, a component supply device 12, a component transfer device 13, a component camera 14, a substrate camera 15, and a control device 16.
[0019] The substrate transport device 11 consists of a belt conveyor and the like, and sequentially transports the substrate K in the X-axis direction. The substrate transport device 11 positions the substrate K at a predetermined position inside the component mounting machine 10. Then, when the mounting work on the positioned substrate K is completed, the substrate transport device 11 transports the substrate K out of the component mounting machine 10 (for example, to an adjacent component mounting machine 10).
[0020] The component supply device 12 supplies components P (for example, electronic components) to be mounted on the substrate K. The component supply device 12 has a plurality of slots 12S arranged in the X-axis direction, and a feeder 40 is detachably set in each of the slots 12S. In this embodiment, the slots 12S are formed by an upper and lower section along the Z-axis direction (see Figure 2). The component supply device 12 uses the feeder 40 to feed and move a carrier tape 50 that supplies components P, which will be described later, and supplies components P to component supply positions Ps (see Figure 4) located at the tip side (upper side in Figure 3) of the feeder 40.
[0021] The component transfer device 13 holds the component P supplied to the component supply position Ps and mounts the held component P onto the positioned substrate K. The component transfer device 13 mainly comprises a head drive device 13A, a movable table 13B, and a mounting head 13C. The head drive device 13A moves the movable table 13B in the X-axis and Y-axis directions using a linear motion mechanism.
[0022] The mounting head 13C is a holding device for holding components P and is detachably mounted on the movable table 13B. A nozzle holder 13D provided on the mounting head 13C is detachably equipped with a plurality of suction nozzles 13E capable of holding components P. The suction nozzles 13E are supported on the mounting head 13C so as to be rotatable around an axis parallel to the Z-axis direction (the vertical direction of the component mounting machine 10) and so as to be able to move up and down. The suction nozzles 13E hold, by suction, components P supplied to the component supply position Ps and mount the held components P onto the positioned substrate K.
[0023] The component camera 14 and the substrate camera 15 are digital imaging devices having image sensors such as CCD or CMOS. The component camera 14 is fixed to the base of the component mounting machine 10 with its optical axis oriented in the Z-axis direction and images the component P held by the suction nozzle 13E from below. The substrate camera 15 is fixed to the movable table 13B with its optical axis oriented in the Z-axis direction and images the substrate K from above.
[0024] The control device 16 is a computer device whose main components are a CPU, ROM, RAM, and various interfaces, and it comprehensively controls the operation of the component mounting machine 10. Specifically, the control device 16 operates the component mounting machine 10 by executing a control program (not shown). As a result, the component mounting machine 10 performs the component mounting operation, for example, according to a pre-stored sequence.
[0025] For example, the control device 16 causes the substrate camera 15 to image the substrate K positioned by the substrate transport device 11. The control device 16 then processes the image captured by the substrate camera 15 to recognize the positioning state of the substrate K. In addition, the control device 16 has the suction nozzle 13E collect and hold the component P supplied by the component supply device 12, and then causes the component camera 14 to image the held component P. The control device 16 then processes the image captured by the component camera 14 to recognize the orientation of the component P.
[0026] The control device 16 executes a control program and moves the suction nozzle 13E (mounting head 13C) to a position above the designated mounting position, which is set in advance as the position for mounting the component P on the substrate K. The control device 16 also corrects the designated mounting position and mounting angle based on the positioning state of the substrate K and the orientation of the component P, and sets the actual mounting position and mounting angle for mounting the component P.
[0027] The control device 16 corrects the target position (X-axis coordinates and Y-axis coordinates) and rotation angle of the suction nozzle 13E according to the mounting position and mounting angle. Then, the control device 16 lowers the suction nozzle 13E at the corrected target position and the corrected rotation angle, and mounts the component P onto the substrate K. The control device 16 performs the mounting process of mounting multiple components P onto the substrate K by repeating the pick-and-place cycle as described above.
[0028] 3. Feeder 40
As shown in Figure 4, the feeder 40 comprises a feeder body 41, a drive sprocket 42, a tape holding section 43, and a peeling section 44. The feeder 40 holds a reel R on which a carrier tape 50 containing components P of a given component type is wound. The feeder 40 can communicate with the management device H, for example, when it is set in the slot 12S of the component supply device 12 of the component mounting machine 10, or when it is being transported by the automatic transport machine 20.
[0029] Here, we will explain the carrier tape 50 wound around the reel R. As shown in Figure 5, the carrier tape 50 comprises a base tape 51 and a cover tape 52. The base tape 51 is formed using a flexible material such as paper or resin. On one side of the base tape 51 in the width direction (the lower side in Figure 5), a plurality of cavities 511 capable of accommodating parts P are provided at equal intervals along the longitudinal direction of the base tape 51 (the left-right direction in Figure 5). On the other side of the base tape 51 in the width direction (the upper side in Figure 5), a plurality of feed holes 512 are provided at equal intervals along the longitudinal direction of the base tape 51. The plurality of feed holes 512 mesh with the drive sprocket 42.
[0030] The cover tape 52 is formed using a transparent polymer film or the like. As shown by the dashed line in Figure 5, the cover tape 52 covers the upper surface of the base tape 51 and prevents the component P housed in the cavity 511 from falling out. The base tape 51 and the cover tape 52 are joined to each other at joining portions 501 and 502, which are provided on both sides of the cavity 511 (one side and the other side) in the width direction of the carrier tape 50. Here, both joining portions 501 and 502 are located closer to the one side in the width direction of the carrier tape 50 than the feed holes 512.
[0031] Returning to the description of the feeder 40, the feeder body 41 is a thin, box-shaped component formed from a transparent or opaque resin plate or metal plate. The sides of the feeder body 41 can be opened and closed (not shown), and inside the feeder body 41, as shown in Figure 4, the drive sprocket 42, the tape holding section 43, and the peeling section 44 are arranged.
[0032] The drive sprocket 42 is a sprocket that can mesh with the feed holes 512 provided in the base tape 51 of the carrier tape 50, and is rotatably mounted on the feeder body 41. A motor (e.g., a stepping motor) is connected to the drive sprocket 42 via a plurality of gears (not shown). As a result, the drive sprocket 42 is driven by the motor and feeds the carrier tape 50 one pitch at a time, thereby transporting the components P to the component supply position Ps.
[0033] Here, the component supply position Ps is located above the drive sprocket 42 when viewed from the direction of the rotation axis of the drive sprocket 42 (X-axis direction). As a result, the feeder 40 can place the position where the carrier tape 50 meshes with the drive sprocket 42 close to the component supply position Ps, thereby improving the positioning accuracy of the components P transported to the component supply position Ps.
[0034] The tape holding section 43 guides the carrier tape 50 pulled from the reel R so that the component P is transported to the component supply position Ps. The peeling section 44 peels the cover tape 52 from the base tape 51 before the component P reaches the component supply position Ps, making the component P housed in the cavity 511 ready to be picked up by the suction nozzle 13E (see Figure 3).
[0035] 4. Overview of the production support device 100
As described above, each component mounting machine 10 constituting the production system 1 mounts components P of multiple types onto the substrate K. The components P are supplied from the multiple feeders 40, which are set in the slots 12S of the component supply device 12 by the automatic transport machine 20 or the loader device 30. That is, each component mounting machine 10 performs the mounting process on the substrate K by sequentially picking and placing components P of different types, and then passes the mounted substrate K to, for example, an adjacent component mounting machine 10.
[0036] Incidentally, in a production system 1 with multiple component mounting machines 10, the time required for pick-and-place may differ depending on the type of component P to be mounted on the substrate K. Therefore, in production system 1, there may be differences in cycle time, which represents the time required for each component mounting machine 10 to complete mounting of the component P onto the substrate K. If there is a large difference in cycle time in production system 1, the component mounting machine 10 with the longest cycle time may become a so-called bottleneck, potentially worsening the productivity when producing the substrate K.
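The relationship between per-machine cycle times and the bottleneck can be illustrated with a minimal sketch. The function name, the machine labels, and the cycle time values below are hypothetical and are not taken from the specification; the sketch only assumes that the bottleneck is the machine with the longest cycle time.

```python
from typing import Dict, Tuple


def find_bottleneck(cycle_times: Dict[str, float]) -> Tuple[str, float]:
    """Return the slowest machine and the spread (longest minus shortest cycle time)."""
    if not cycle_times:
        raise ValueError("cycle_times must not be empty")
    # The machine with the longest cycle time limits the throughput of the line.
    slowest = max(cycle_times, key=lambda name: cycle_times[name])
    spread = cycle_times[slowest] - min(cycle_times.values())
    return slowest, spread


# Hypothetical cycle times in seconds; these values are not from the specification.
line = {"M1": 30.0, "M2": 42.5, "M3": 31.0, "M4": 29.5}
bottleneck, spread = find_bottleneck(line)
```

A large spread suggests that the line is poorly leveled and that moving work away from the slowest machine could improve productivity.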
[0037] To improve productivity, in other words to level the cycle times so that no bottleneck arises, it is usual to consider optimizing the arrangement of the component types of components P that each of the multiple component mounting machines 10 in the production system 1 mounts on the substrate K, that is, swapping component types within the production system 1. Put differently, to optimize the order in which the multiple component mounting machines 10 sequentially mount components P on the substrate K, swapping of component types, that is, swapping of the feeders 40 that supply the components P of each component type, is considered.
[0038] When a swap of part types (a swap of feeders 40) is considered, a part type pair Kp, representing the part types to be swapped from among all the part types used in production system 1, or a feeder pair Kf, representing the feeders 40 that supply parts P of the part types to be swapped to the parts mounting machine 10, is tentatively determined. Then, for the tentatively determined part type pair Kp (or feeder pair Kf), the mounting process for parts P with the part types (feeders 40) swapped is simulated and the cycle time is measured.
[0039] Typically, when considering the optimization of the layout, the following steps are performed: a provisional determination of such part type pairs Kp (or feeder pairs Kf), a simulation of the mounting process for the provisionally determined part type pairs Kp (or feeder pairs Kf), and an evaluation of the cycle time based on the simulation results, all combined for all part types (feeders 40) used in production system 1. Therefore, as the number of part types, i.e., feeders 40 or part mounting machines 10, increases, the content of the consideration regarding the optimization of the layout becomes more complex, and an enormous amount of time is required to evaluate the cycle time, that is, to obtain the optimal solution that realizes the optimization of the layout of part types (layout of feeders 40) that can eliminate bottlenecks.
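How quickly the exhaustive approach grows can be seen by counting unordered pairs: n part types give n(n-1)/2 candidate pairs. The following is a minimal sketch; the function names and the per-simulation time are assumptions for illustration only.

```python
from itertools import combinations
from math import comb
from typing import List, Sequence, Tuple


def candidate_pairs(part_types: Sequence[str]) -> List[Tuple[str, str]]:
    """Enumerate every unordered pair of part types that could be swapped."""
    return list(combinations(part_types, 2))


def exhaustive_cost(n_part_types: int, seconds_per_simulation: float) -> float:
    """Total simulation time if every pair is simulated once (assumed cost model)."""
    return comb(n_part_types, 2) * seconds_per_simulation
```

For example, 200 part types yield 19,900 pairs, and doubling to 400 part types roughly quadruples the count to 79,800, before even considering that swaps may need to be evaluated repeatedly as the layout changes.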
[0040] In response to this concern, production system 1 is equipped with a production support device 100 that infers the aforementioned part type pair Kp. The production support device 100 is configured to communicate with each part mounting machine 10 (feeder 40), automatic transport machine 20, loader device 30, and management device H that constitute production system 1. The production support device 100 can also be, for example, a device incorporated into the management device H. The production support device 100 provides support to maximize evaluation with respect to a pre-set evaluation target. For example, cycle time can be used as an evaluation target. Depending on the evaluation result of the evaluation target, the production support device 100 infers and outputs a part type pair Kp that is to be replaced from among multiple part types of part P mounted by the part mounting machine 10.
[0041] Specifically, the production support device 100 stores a trained model generated by reinforcement learning. The production support device 100 then uses the trained model and production information J supplied from the management device H to infer and determine the pair of part types Kp to be replaced from among multiple part types of parts P used in production, specifically, the pair of feeders Kf to be replaced from among multiple feeders 40 set in the part mounting machine 10. This allows the production system 1 to optimize the arrangement (replacement) of part types, i.e., the arrangement (replacement) of the feeders 40. As a result, the cycle time of each part mounting machine 10 is leveled, and consequently, the impact of bottlenecks in the production system 1 on overall production can be reduced. Regarding bottlenecks, for example, if the leveling is above a certain standard, it can be considered that there are "no bottlenecks."
[0042] Incidentally, as mentioned above, when generating a trained model using reinforcement learning, ensuring that the training data used for machine learning is unbiased is crucial for efficiently generating a highly accurate trained model in a short training time. For example, if reinforcement learning is performed using only component type pairs Kp that have a shortened cycle time, i.e., those with good evaluation results, the inference accuracy of component type pairs Kp that cannot have their cycle time shortened, or component type pairs Kp that cannot be swapped at all, may deteriorate. In this case, it may also negatively affect the inference accuracy of component type pairs Kp that have good evaluation results.
[0043] Therefore, in this embodiment, the production support device 100 stores each of the first training data L1 classified according to a predetermined classification criterion. The production support device 100 then randomly extracts each of the classified and stored first training data L1 and performs first machine learning (reinforcement learning) using the extracted first training data L1. As a result, the production support device 100 can reduce (suppress) the bias of the first training data L1 and generate a highly accurate trained model in a short period of time, and consequently can accurately infer part type pairs Kp that have good evaluation results.
[0044] 4-1. Configuration of the production support device 100
Next, the configuration of the production support device 100 of this embodiment will be described. The production support device 100 is a computer device whose main components are a CPU, ROM, RAM, and various interfaces. As shown in Figure 6, it includes a first training data acquisition unit 110, a training data storage unit 120, an extraction unit 130, a trained model generation unit 140, and a trained model storage unit 150. The production support device 100 also includes a production information acquisition unit 160 and an inference unit 170.
[0045] Furthermore, the production support device 100 includes an optimizer 180 that can simulate the cycle time, which is the evaluation result obtained when an arbitrarily selected (combined) part type pair Kp is swapped, and can simulate optimization using the inference results from the inference unit 170. In this embodiment, the case in which the production support device 100 includes the optimizer 180 is described as an example. However, the optimizer 180 only needs to be able to obtain the arbitrarily selected part type pairs Kp and the inference results from the inference unit 170, so it can be installed in a device other than the production support device 100, for example, a management device H that can communicate with the production support device 100.
[0046] The first training data acquisition unit 110 acquires multiple first training data L1 used for first machine learning. The first machine learning concerns component type pairs Kp for which, when the component types of the pair are swapped among the components P that the multiple component mounting machines 10 mount on the substrate K and the mounting process is tried, the mounting process is improved and a reward E is obtained. Each component type pair Kp is represented by component type pair data Cp. In the following description, the component type pair data Cp representing the component type pair Kp, the feeder pair data Cf representing the feeder pair Kf described later, and the component mounting machine pair data Cm representing the component mounting machine pair Km may be collectively referred to as "pair data C".
[0047] Here, the first training data L1 acquired by the first training data acquisition unit 110 may include optimization information D, which includes arrangement data Da relating to the arrangement of the multiple component mounting machines 10 constituting the production system 1, and component type data Dk representing the component types of the multiple components P that each component mounting machine 10 mounts on the substrate K. The optimization information D also includes, for example, assumed cycle time data Ds, representing the cycle time obtained when the optimizer 180 simulates the mounting process performed by each component mounting machine 10 for any combination of component type pairs Kp (as described later), and inferred component type pair data Cpi, representing the component type pairs Kp inferred by the inference unit 170. In addition, the optimization information D includes replacement restriction information Dj indicating the mounting order of components P that must be strictly observed, or whether components P can be replaced.
[0048] Furthermore, the arrangement data Da, component type data Dk, assumed cycle time data Ds, and replacement restriction information Dj included in the optimization information D are supplied by the management device H or an external device (not shown). In this embodiment, as shown in Figure 6, the case where the data is supplied by the management device H is illustrated.
[0049] Furthermore, the first training data acquisition unit 110 acquires, as the first training data L1, the pair data C output from the optimizer 180, specifically the part type pair data Cp representing an arbitrarily combined part type pair Kp, together with the cycle time data Rs, which is the result data obtained by simulating the mounting process with the part type pair Kp represented by the part type pair data Cp swapped. The optimizer 180 links the part type pair data Cp with the cycle time data Rs obtained from the simulation and outputs them to the first training data acquisition unit 110.
[0050] Here, the optimizer 180 simulates the mounting process for the part type pair data Cp of any part type pair Kp that can be determined from the arrangement data Da and part type data Dk included in the optimization information D, and obtains the cycle time data Rs. The optimizer 180 also performs the simulation for part type pairs Kp whose parts P cannot be replaced according to the replacement restriction information Dj included in the optimization information D. In that case, the cycle time data Rs output as the result data is, for example, a value indicating that the replacement is not possible.
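The behavior described for the optimizer 180 can be sketched with a deliberately simplified model. Everything below is an assumption made for illustration: the specification does not define the simulator, so this toy version treats a machine's cycle time as the sum of hypothetical per-part placement times, takes the line cycle time as the slowest machine, and uses `None` as the value indicating that a replacement is not possible.

```python
from typing import Dict, List, Optional, Set, Tuple

Layout = Dict[str, List[str]]  # machine name -> part types it mounts (hypothetical)


def line_cycle_time(layout: Layout, place_time: Dict[str, float]) -> float:
    """Toy model: the line cycle time is the slowest machine's total placement time."""
    return max(sum(place_time[p] for p in parts) for parts in layout.values())


def simulate_swap(
    layout: Layout,
    place_time: Dict[str, float],
    pair: Tuple[str, str],
    locked: Set[str],
) -> Optional[float]:
    """Return the simulated cycle time after swapping `pair`, or None if not allowed."""
    a, b = pair
    if a in locked or b in locked:
        return None  # assumed encoding of "replacement not possible"
    # Build a new layout with the two part types exchanged; the input is not mutated.
    swapped = {
        machine: [b if p == a else a if p == b else p for p in parts]
        for machine, parts in layout.items()
    }
    return line_cycle_time(swapped, place_time)


# Hypothetical layout and placement times (seconds); not from the specification.
layout = {"M1": ["A", "B"], "M2": ["C", "D"]}
place_time = {"A": 5.0, "B": 4.0, "C": 1.0, "D": 2.0}
```

In this toy layout the baseline cycle time is 9.0 seconds, and swapping B and C balances both machines at 6.0 seconds, whereas any pair containing a locked part type yields `None`.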
[0051] The first training data acquisition unit 110 links the acquired optimization information D, the pair data C (specifically, component type pair data Cp or feeder pair data Cf), and the cycle time data Rs together, and outputs the linked data as the first training data L1 to the training data storage unit 120.
[0052] The training data storage unit 120 classifies the plurality of first training data L1 acquired by the first training data acquisition unit 110 according to predetermined classification criteria and stores them. In this embodiment, as described above, the result data includes the cycle time data Rs, and it also includes cases where the part type pair Kp cannot be replaced. For this reason, this embodiment uses two classification criteria as an example: whether the cycle time represented by the cycle time data Rs is shorter than the assumed cycle time represented by the assumed cycle time data Ds, and whether the part types can be replaced according to the replacement restriction information Dj.
[0053] As shown in Figure 6, the learning data storage unit 120 includes a first memory buffer 121, a second memory buffer 122, and a third memory buffer 123 as multiple (three in this embodiment) storage areas. The first memory buffer 121 sequentially stores first learning data L1 in which, for example, swapping the component types makes the cycle time represented by the cycle time data Rs shorter than the assumed cycle time represented by the assumed cycle time data Ds. That is, the first memory buffer 121 sequentially stores first learning data L1 for which the swap is possible and the cycle time is shortened, in other words, first learning data L1 that appears infrequently.
[0054] The second memory buffer 122 sequentially stores first learning data L1 in which, for example, the component types have been swapped and the cycle time represented by the cycle time data Rs is equal to or longer than the assumed cycle time. The third memory buffer 123 sequentially stores first learning data L1 in which, for example, the component types cannot be swapped according to the replacement restriction information Dj. In other words, the second memory buffer 122 and the third memory buffer 123 sequentially store first learning data L1 that appears frequently.
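As a non-limiting illustration, the classification into the three memory buffers described above might be sketched as follows. The record layout, the field names, and the use of `None` to mark a pair that cannot be swapped are assumptions made only for this sketch and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FirstLearningData:
    """Hypothetical record for one item of first learning data L1."""
    pair: Tuple[str, str]            # pair data C (two component type ids, assumed encoding)
    assumed_cycle_time: float        # assumed cycle time data Ds
    cycle_time: Optional[float]      # cycle time data Rs; None = swap not possible (assumption)


def classify(sample: FirstLearningData) -> int:
    """Return the buffer number (1, 2 or 3) in which the sample is stored."""
    if sample.cycle_time is None:
        return 3                     # swap prohibited by restriction information Dj
    if sample.cycle_time < sample.assumed_cycle_time:
        return 1                     # swap possible and cycle time shortened (rare)
    return 2                         # swap possible, cycle time equal or longer


buffers = {1: [], 2: [], 3: []}
samples = [
    FirstLearningData(("A", "B"), 10.0, 9.2),
    FirstLearningData(("A", "C"), 10.0, 10.4),
    FirstLearningData(("B", "C"), 10.0, None),
]
for s in samples:
    buffers[classify(s)].append(s)   # stored sequentially in the matching buffer
```

Keeping the rare, cycle-time-shortening samples in their own buffer is what later allows them to be drawn at a chosen rate rather than at their natural frequency.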
[0055] The extraction unit 130 randomly extracts the first learning data L1 from the first memory buffer 121, second memory buffer 122, and third memory buffer 123, in which the data has been classified and stored by the learning data storage unit 120. In the following explanation, the first learning data L1 randomly extracted by the extraction unit 130 from the first memory buffer 121 may be referred to as the first learning data DE1. Furthermore, the first learning data L1 randomly extracted by the extraction unit 130 from the second memory buffer 122 may be referred to as the first learning data DE2. In addition, the first learning data L1 randomly extracted by the extraction unit 130 from the third memory buffer 123 may be referred to as the first learning data DE3.
[0056] Specifically, the extraction unit 130 randomly extracts the first learning data DE1, first learning data DE2, and first learning data DE3 from the first memory buffer 121, second memory buffer 122, and third memory buffer 123 of the learning data storage unit 120, respectively, on condition that at least a certain number of first learning data L1 is stored (accumulated) in each buffer. Furthermore, the extraction unit 130 randomly extracts the first learning data DE1, first learning data DE2, and first learning data DE3 stored in the learning data storage unit 120 in a composition ratio that can be arbitrarily set.
[0057] Here, the arbitrarily configurable composition ratio can be set, for example, according to the inference accuracy required for inference using the trained model M generated as described later. For example, the extraction unit 130 randomly extracts the first learning data DE1, first learning data DE2, and first learning data DE3 so that 40% is first learning data DE1, 30% is first learning data DE2, and 30% is first learning data DE3. Alternatively, if the extraction unit 130 extracts the first learning data DE1, first learning data DE2, and first learning data DE3 by a fixed operation (fixed pattern), the learning data storage unit 120 stores the first learning data L1 so that, for example, 40% is stored in the first memory buffer 121 and 30% each in the second memory buffer 122 and third memory buffer 123.
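As a non-limiting illustration, random extraction at a configurable composition ratio might be sketched as follows, using the 40% / 30% / 30% example given above. The function name `extract_batch`, the batch size, the minimum stored count, and sampling without replacement are all assumptions made for this sketch.

```python
import random


def extract_batch(buffers, percentages, batch_size, min_stored, rng):
    """Randomly draw a batch whose composition follows `percentages`.

    buffers:     dict mapping a buffer name to a list of stored samples
    percentages: dict mapping the same names to integer percentages (sum 100)
    Returns None while any buffer holds fewer samples than required.
    """
    counts = {name: batch_size * pct // 100 for name, pct in percentages.items()}
    for name, count in counts.items():
        if len(buffers[name]) < max(min_stored, count):
            return None              # wait until enough data has accumulated
    batch = []
    for name, count in counts.items():
        batch.extend(rng.sample(buffers[name], count))  # random, without replacement
    rng.shuffle(batch)               # avoid presenting the classes in a fixed order
    return batch


rng = random.Random(0)               # fixed seed only to make the sketch repeatable
buffers = {
    "DE1": [("DE1", i) for i in range(20)],
    "DE2": [("DE2", i) for i in range(50)],
    "DE3": [("DE3", i) for i in range(50)],
}
ratio = {"DE1": 40, "DE2": 30, "DE3": 30}
batch = extract_batch(buffers, ratio, batch_size=10, min_stored=20, rng=rng)
```

With these settings the batch contains four DE1 samples and three each of DE2 and DE3, regardless of how many samples each buffer holds.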
[0058] The trained model generation unit 140 generates a trained model M by repeatedly performing first machine learning using first training data DE1, first training data DE2, and first training data DE3, which are randomly selected by the extraction unit 130 to a predetermined composition ratio. Here, as described above, each of the multiple components P is housed in a carrier tape 50 wound around a reel R, and each reel R is loaded into a feeder 40 that supplies the components P housed in the carrier tape 50 to the component mounting machine 10.
[0059] Therefore, a component type pair Kp corresponds to a feeder pair Kf, which represents the feeders 40 loaded with the reels R around which the carrier tapes 50 containing the components P of the component types forming the component type pair Kp are wound. For this reason, in addition to or instead of machine learning (reinforcement learning) on swapping patterns of component type pairs Kp, the trained model generation unit 140 can generate the trained model M by repeatedly performing machine learning (reinforcement learning) on swapping patterns of feeder pairs Kf in which the multiple component mounting machines 10 mount the components P, thereby obtaining a reward E as described later. Furthermore, as described later, the trained model generation unit 140 generates a value function, more specifically, an optimal action value function, as the trained model M.
[0060] The trained model storage unit 150 stores the trained model M generated by the trained model generation unit 140. Therefore, the trained model storage unit 150 can store the trained model M, which is updated as the trained model generation unit 140 repeatedly performs machine learning (reinforcement learning).
[0061] The production information acquisition unit 160 acquires production information J that includes at least new placement data Dan and new component type data Dkn and that instructs the production of a substrate K by mounting new components P using the component mounting machines 10. Specifically, the production information acquisition unit 160 acquires production information J from the management device H when it is necessary to optimize a new component type pair Kp (or a new feeder pair Kf) during the production of the substrate K.
[0062] Here, the production information J output by the management device H includes the new number and new arrangement of component mounting machines 10 constituting the production system 1, corresponding to the placement data Da, the number of feeders 40 set in each component mounting machine 10, corresponding to the new type and new number of component P to be mounted in each component mounting machine 10, corresponding to the component type data Dk, and the cycle time as an actual or simulation result, corresponding to the assumed cycle time data Ds. Furthermore, the production information J includes control data including the specified mounting position and specified mounting angle of the component P on the substrate K, component information (shape, dimensions, maximum movement speed, imaging conditions, etc.), the degree of cycle time leveling (presence or absence of bottlenecks), and equipment information that affects the efficiency of the mounting process (mounting head 13C, suction nozzle 13E, etc.). Therefore, the trained model generation unit 140 can perform machine learning (reinforcement learning) using the production information J as the first training data L1.
[0063] The inference unit 170 uses the new placement data Dan and new part type data Dkn included in the production information J acquired by the production information acquisition unit 160, and the trained model M stored in the trained model storage unit 150, to output inferred pair data Ci (specifically, inferred part type pair data Cpi representing the inferred part type pair Kp, or inferred feeder pair data Cfi representing the inferred feeder pair Kf) identifying the pair that is to be replaced among the new part types distinguished by the part type data Dkn. Here, the inference unit 170 outputs the inferred pair data Ci (inferred part type pair data Cpi or inferred feeder pair data Cfi) to the management device H (more specifically, to a display device or the like provided on the management device H, not shown in the figures) to guide workers and others. The inference of the part type pair Kp (or feeder pair Kf) by the inference unit 170 will be described in detail later.
4-2. Configuration of the trained model generation unit 140 that functions in the learning phase
[0064] Next, with reference to Figure 7, the configuration of the trained model generation unit 140 of the production support device 100 that functions in the learning phase will be described. As shown in Figure 7, the trained model generation unit 140 mainly comprises a state information acquisition unit 141, an evaluation result acquisition unit 142, a reward calculation unit 143, a value function storage unit 144, an action decision unit 145, an action information output unit 146, and a value function update unit 147.
[0065] In this embodiment, the state information acquisition unit 141 acquires one of the first training data DE1, first training data DE2, and first training data DE3, which are randomly selected by the extraction unit 130 in a predetermined ratio, as state information. That is, the state information acquisition unit 141 in this embodiment acquires one of the first training data DE1, first training data DE2, and first training data DE3, which are classified and randomly selected in a predetermined ratio, as state information. Here, in this embodiment, the state information acquisition unit 141 mainly acquires state information from the extraction unit 130, but can also acquire state information (first training data L1) from the optimizer 180.
[0066] The evaluation result acquisition unit 142 acquires, for a preset evaluation target, the evaluation results obtained from the mounting process after swapping the component type pair Kp represented by the component type pair data Cp included in one of the first learning data DE1, first learning data DE2, and first learning data DE3, or after swapping the feeder pair Kf represented by the feeder pair data Cf among the multiple feeders 40. The evaluation results acquired by the evaluation result acquisition unit 142 include the cycle time data Rs, whether the components P were mounted on the substrate K in ascending order of size, and whether the components P were mounted on the substrate K in ascending order of their height in the Z-axis direction from the surface of the substrate K, as well as other evaluation results. As shown in Figure 7, the evaluation result acquisition unit 142 can acquire evaluation results for the evaluation target from the optimizer 180.
[0067] The reward calculation unit 143 calculates a reward E for swapping the component type pair Kp represented by the component type pair data Cp (or the feeder pair Kf represented by the feeder pair data Cf) in one of the first learning data DE1, first learning data DE2, and first learning data DE3, based on the evaluation result obtained for the evaluation target (e.g., cycle time data Rs) after the swap. The reward calculation unit 143 gives a positive reward E for swapping a component type pair Kp (or feeder pair Kf) if the evaluation result is good. On the other hand, the reward calculation unit 143 gives a negative reward E (penalty) for swapping a component type pair Kp (or feeder pair Kf) if the evaluation result is not good.
[0068] For example, regarding the cycle time, which is one of the evaluation results, the reward calculation unit 143 gives a positive reward E if, when simulating the mounting process after swapping component type pairs Kp (or swapping feeder pairs Kf) (or when actually performing the mounting process on the component mounting machine 10), the cycle time represented by the cycle time data Rs decreases. On the other hand, the reward calculation unit 143 gives a negative reward E if the cycle time represented by the cycle time data Rs increases. Furthermore, regarding the order in which components P are placed on the substrate K, which is one of the evaluation results, the reward calculation unit 143 gives a positive reward E if, when simulating the mounting process after swapping component type pairs Kp (or swapping feeder pairs Kf) (or when actually performing the mounting process on the component mounting machine 10), the components P are placed (mounted) in order from smallest to largest, or in order from lowest to highest height. On the other hand, the reward calculation unit 143 gives a negative reward E if the components P are placed (mounted) in order from largest to smallest, or in order from highest to lowest height.
[0069] In this way, the reward calculation unit 143 calculates a reward E for each evaluation target. The reward calculation unit 143 also assigns a reward E according to the difference between the evaluation result and the standard set for each evaluation target. That is, the reward calculation unit 143 assigns a larger reward E when the difference between the evaluation result and the standard is large in the positive direction than when the difference is small in the positive direction. Conversely, when the difference between the evaluation result and the standard is large in the negative direction, it assigns a larger penalty than when the difference is small in the negative direction.
[0070] This will be explained in more detail taking the cycle time, one of the evaluation results, as an example. Before a simulation of swapping the component type pair Kp represented by the component type pair data Cp (or the feeder pair Kf represented by the feeder pair data Cf) is performed, the cycle time represented by the assumed cycle time data Ds included in the optimization information D is taken as the assumed cycle time. Then, in a simulation of the mounting process after swapping the component type pair Kp (or feeder pair Kf), the reward calculation unit 143 gives a larger reward E if the reduction time, which is the difference between the cycle time represented by the cycle time data Rs and the assumed cycle time, is large in the positive direction, compared to when the reduction time is small in the positive direction. In other words, the reward calculation unit 143 gives a larger reward E as the reduction time of the cycle time increases (as the cycle time is shortened). Conversely, if the reduction time is negative, that is, if the cycle time is longer than the assumed cycle time, the reward calculation unit 143 gives a negative reward E or no reward E.
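As a non-limiting illustration, a reward E that grows with the reduction time of the cycle time might be sketched as follows. The linear form, the `scale` parameter, and the `penalize` switch (selecting between a negative reward and no reward) are assumptions made for this sketch; the specification only requires that a larger reduction yield a larger reward.

```python
def cycle_time_reward(assumed_cycle_time, cycle_time, scale=1.0, penalize=True):
    """Reward E for one swap, based only on the cycle time evaluation.

    The reduction time is positive when the cycle time after the swap (Rs)
    is shorter than the assumed cycle time (Ds).
    """
    reduction = assumed_cycle_time - cycle_time
    if reduction > 0:
        return scale * reduction     # larger reduction, larger reward
    if penalize:
        return scale * reduction     # zero or negative: penalty grows with the increase
    return 0.0                       # alternative: simply give no reward


big = cycle_time_reward(10.0, 8.0)       # shortened by 2.0
small = cycle_time_reward(10.0, 9.5)     # shortened by 0.5
worse = cycle_time_reward(10.0, 12.0)    # lengthened by 2.0
```

Rewards for other evaluation targets, such as the mounting order, could be computed separately in the same manner and combined.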
[0071] The value function storage unit 144 generates a value function in reinforcement learning, i.e., first machine learning, based on the state information acquired by the state information acquisition unit 141 (more specifically, pair data C included in one of the randomly selected first training data DE1, first training data DE2, and first training data DE3) and the reward E calculated by the reward calculation unit 143. Here, the value function is a function generated in the learning phase to obtain action information corresponding to the state information so that the evaluation result of the object being evaluated is optimized. The value function storage unit 144 then stores the generated value function, i.e., the trained model M, in an updatable manner. Therefore, the value function storage unit 144 also performs the function of the trained model storage unit 150.
[0072] In particular, the value function (trained model M) in this embodiment is an optimal action value function generated by DQN (Deep Q-Network) as a reinforcement learning algorithm. In this case, the optimal action value function is obtained as an approximation function using a neural network, and it gives the best action to take when the Q value (the value of the reward E obtained immediately according to the state) can be estimated for each action in a given state. That is, when the optimal action value function is the trained model M, the Q value is estimated using a neural network whose output layer nodes are the component type pairs Kp represented by the component type pair data Cp (or the feeder pairs Kf represented by the feeder pair data Cf), and as a result, the component type pair Kp (or feeder pair Kf) to be replaced is given as the "best action."
[0073] Furthermore, the value function is not limited to the case where the optimal action value function is found using DQN. For example, it is also possible to generate a value function using reinforcement learning algorithms such as Q-learning, Sarsa, or Monte Carlo methods. In this case, a "policy" is determined based on the generated value function, and the "best action" is determined based on the "policy."
[0074] The action decision unit 145 determines a component type pair Kp, which is selectable from among multiple component types, or a feeder pair Kf, which is selectable from among multiple feeders 40, based on state information (one of the first training data DE1, first training data DE2, and first training data DE3, randomly selected) and a trained model M (optimal action-value function). In this case, the action decision unit 145 can select a component type pair Kp (or feeder pair Kf) based on the optimal action-value function (trained model M), or, if necessary, search for a component type pair Kp (or feeder pair Kf) without relying on the optimal action-value function (trained model M). The action decision unit 145 then outputs pair data C, i.e., the component type pair data Cp representing the determined component type pair Kp (or the feeder pair data Cf representing the determined feeder pair Kf).
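As a non-limiting illustration, choosing between selection based on the action value function and a search that does not rely on it might be sketched with an epsilon-greedy rule. The specification does not name epsilon-greedy; the rule, the parameter `epsilon`, and the dictionary of Q values are assumptions made for this sketch.

```python
import random


def choose_pair(q_values, epsilon, rng):
    """Pick one pair from `q_values`, a dict mapping each pair to its estimated Q value.

    With probability `epsilon` a pair is drawn at random (search that does not
    rely on the value function); otherwise the pair with the highest Q value is used.
    """
    pairs = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(pairs)             # exploration
    return max(pairs, key=q_values.get)      # exploitation: the "best action"


rng = random.Random(0)
q_values = {("A", "B"): 0.2, ("A", "C"): 0.9, ("B", "C"): -0.4}
greedy = choose_pair(q_values, epsilon=0.0, rng=rng)    # always the highest Q value
explored = choose_pair(q_values, epsilon=1.0, rng=rng)  # always a random pair
```

In the learning phase a nonzero `epsilon` would let rarely tried pairs be evaluated, while in the inference phase `epsilon` would typically be zero.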
[0075] The action information output unit 146 outputs the decision made by the action decision unit 145, i.e., the component type pair Kp (or feeder pair Kf) to be replaced, as action information A to the optimizer 180. In this case, the optimizer 180 acquires the action information A and performs a simulation of the mounting process based on a hypothetical mounting condition in which the component type pair Kp (or feeder pair Kf) is replaced according to the action information A. Then, as a simulation result of the case in which the component type pair Kp (or feeder pair Kf) is replaced according to the action information A, the optimizer 180 estimates the cycle time, which is the evaluation result for the evaluation target, and outputs the cycle time data Rs.
[0076] Subsequently, the state information acquisition unit 141 acquires the hypothetical mounting condition as new optimization information D, i.e., new state information, and the evaluation result acquisition unit 142 acquires the estimated evaluation result of the evaluation target (e.g., cycle time data Rs) from the optimizer 180. Next, the reward calculation unit 143 calculates the reward E for the new optimization information D (i.e., action information A) based on the estimated evaluation result from the optimizer 180. In other words, the reward calculation unit 143 calculates the evaluation of the action information A, which has caused a transition from the state information before the swap of the component type pair Kp (or feeder pair Kf) (e.g., the assumed cycle time data Ds of the optimization information D included in one of the first learning data DE1, DE2, DE3) to the new state information after the swap of the component type pair Kp (or feeder pair Kf) (e.g., cycle time data Rs), as the reward E for the new state information, i.e., the optimization information D.
[0077] The value function update unit 147 updates the optimal action value function stored in the value function storage unit 144 based on the new state information updated based on the action information A, i.e., the optimization information D (specifically, cycle time data Rs), and the reward E for the new state information (optimization information D reflecting the action information A). The value function update unit 147 only needs to update the optimal action value function based on the reinforcement learning algorithm (DQN), and, for example, if a negative reward E is given, it is also possible not to update the optimal action value function.
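As a non-limiting illustration, one update of an action value function from a state, an action, a reward E, and a new state might be sketched with tabular Q-learning, which the specification lists as an alternative algorithm. This is a simplified stand-in for the DQN update of the embodiment, not that update itself; the learning rate `alpha`, the discount factor `gamma`, and the string state labels are assumptions made for this sketch.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new Q value.

    q:       dict mapping (state, action) to an estimated value (missing keys count as 0.0)
    actions: all pairs that can be chosen in `next_state`
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    target = reward + gamma * best_next      # reward now plus discounted best future value
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]


q = {}
pairs = [("A", "B"), ("A", "C")]
first = q_update(q, "before_swap", ("A", "B"), 1.0, "after_swap", pairs)
second = q_update(q, "before_swap", ("A", "B"), 1.0, "after_swap", pairs)
```

Repeating the update with the same positive reward moves the stored value steadily toward the target, which is the behavior the value function update relies on.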
4-3. Configuration of the inference unit 170 that functions in the inference phase
[0078] Next, with reference to Figure 8, the configuration of the inference unit 170 of the production support device 100 that functions in the inference phase will be described. As shown in Figure 8, the inference unit 170 mainly comprises a state information acquisition unit 171, a value function storage unit 172, an action decision unit 173, and an action information output unit 174. The state information acquisition unit 171, the value function storage unit 172, the action decision unit 173, and the action information output unit 174 have the same configuration as the state information acquisition unit 141, the value function storage unit 144, the action decision unit 145, and the action information output unit 146 of the trained model generation unit 140 described above.
4-4. Optimization of the placement (rearrangement) of part type pairs Kp by the production support device 100
[0079] Next, with reference to the flowchart of the optimization program shown in Figure 9, the optimization of the replacement of part type pairs Kp (or feeder pairs Kf), performed mainly by the inference unit 170 of the production support device 100, will be described. The optimization program starts in step S10. In the following step S11, the production information acquisition unit 160 of the production support device 100 acquires production information J, which instructs actual production, from, for example, the management device H. Then, as the "first step," the production support device 100 (inference unit 170) sets part mounting machine pair data Cm, which represents a part mounting machine pair Km among the multiple part mounting machines 10 that constitute the production system 1, based on the production information J.
[0080] As described above, when optimization is performed, the multiple feeders 40 that can be set in each component mounting machine 10 are known, for example, by optimization information D and production information J. In other words, the component types of the components P to be mounted in each component mounting machine 10 are also known, for example, by optimization information D and production information J. That is, the relationship between each component type of component P and each component mounting machine 10 is also known. For this reason, when it is desired to optimize the component types, as a first step, for example, in accordance with the operator's instructions, the inference unit 170 appropriately sets component mounting machine pair data Cm, which represents a component mounting machine pair Km among the multiple component mounting machines 10 that constitute the production system 1, based on production information J.
[0081] In the following step S12, the production support device 100, as the "second step," infers a part type pair Kp (or feeder pair Kf) using the optimal action value function (trained model M). Specifically, as shown in Figure 8, the inference unit 170 obtains production information J as state information from the production information acquisition unit 160, which includes new placement data Dan and new part type data Dkn for the part type, via the state information acquisition unit 171. Then, the action decision unit 173 uses the state information (production information J) obtained by the state information acquisition unit 171 and the optimal action value function (trained model M) stored in the value function storage unit 172 (trained model storage unit 150) to infer the part type pair Kp (or feeder pair Kf (part mounting machine pair Km)) to be replaced. Here, the action decision unit 173 outputs the inferred part type pair data Cpi (or the inferred feeder pair data Cfi representing the inferred feeder pair Kf, or the inferred part mounting machine pair data Cmi representing the inferred part mounting machine pair Km), i.e., the inferred pair data Ci, to the action information output unit 174.
[0082] Returning to Figure 9, in step S13, the production support device 100 swaps the part type pair Kp (or feeder pair Kf) in the part mounting machine 10. That is, as shown in Figure 8, the action information output unit 174 of the production support device 100 outputs the part type pair Kp (or feeder pair Kf) inferred in step S12 as action information A to the management device H. The management device H then outputs a command based on the action information A, specifically a command to swap the feeders 40 that form the feeder pair Kf corresponding to the part type pair Kp, to, for example, multiple part mounting machines 10 and loader devices 30.
[0083] As a result, each component mounting machine 10 and loader device 30 swaps the two feeders 40 identified by the component type pair Kp, which is represented by the component type pair data Cp identified in the action information A, and specifically by the feeder pair Kf, which is represented by the feeder pair data Cf. The swapping of the feeders 40 in the component mounting machine 10 includes, for example, changing the identification number (a number corresponding to the order in which the component P is mounted) assigned to the slot 12S of the component supply device 12 in accordance with the swapping of the feeders 40.
[0084] Returning to Figure 9, in step S14, the production support device 100 acquires the cycle time required for the mounting process after the replacement of the part type, i.e., the feeder 40, in the part mounting machine 10. Specifically, the state information acquisition unit 171 of the production support device 100 acquires from the management device H the cycle time required for the mounting process in the part mounting machine 10 after the feeder 40 has been replaced.
[0085] In the following step S15, the production support device 100 determines whether the cycle time acquired in step S14 has improved compared to before the feeder 40 was replaced. That is, if the cycle time acquired in step S14 after the replacement of the part type pair Kp (or feeder pair Kf) is shorter than the assumed cycle time before the replacement of the part type pair Kp (or feeder pair Kf) included in the production information J (state information) acquired by the production information acquisition unit 160 in step S11, the production support device 100 determines "Yes" because the cycle time has improved. Then, the production support device 100 returns to step S12 and executes the processing of each step from step S12 onward.
[0086] On the other hand, if, for example, after performing multiple optimizations (replacements), the cycle time acquired in step S14 after replacing the part type pair Kp (or feeder pair Kf) is not shorter than the assumed cycle time, the production support device 100 determines "No" because the cycle time has not been improved. Then, in step S16, the production support device 100 returns the part type that was replaced in step S13, i.e., the feeder 40, to its state before replacement and proceeds to step S17.
[0087] In other words, the action information output unit 174 of the production support device 100 outputs, for example, a feeder pair Kf (or part type pair Kp) that returns the corresponding feeder 40 to its previous state as action information A to the management device H. As a result, the management device H outputs a command based on the action information A, specifically a command to return the feeders 40 that form the feeder pair Kf corresponding to the part type pair Kp to their previous state, to, for example, multiple part mounting machines 10 and loader devices 30.
[0088] As a result, each component mounting machine 10 and loader device 30 returns the two feeders 40 identified by the component type pair Kp, specifically the feeder pair Kf, identified in the action information A, to their state before replacement. Note that the replacement of the feeders 40 in the component mounting machine 10 includes, for example, changing the identification number (a number corresponding to the order in which the component P is mounted) assigned to the slot 12S of the component supply device 12 in accordance with the replacement of the feeders 40.
[0089] In step S17, the production support device 100 determines, based on the production information J, whether the swapping of component types (feeders 40) described above, in other words, the optimization study, has been completed for all target component mounting machine pairs Km represented by the target component mounting machine pair data Cm for the multiple component mounting machines 10 constituting the production system 1. That is, if the optimization study has not been completed for all target component mounting machine pairs Km, the production support device 100 determines "No" and returns to step S11. Then, after setting a new component mounting machine pair Km in step S11, the production support device 100 executes the processing from step S12 onward as described above. On the other hand, if the optimization study has been completed for all target component mounting machine pairs Km, the production support device 100 determines "Yes" and proceeds to step S18, and the execution of the optimization program ends in step S18.
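As a non-limiting illustration, the flow of steps S11 to S18 might be sketched as the following loop. The callables `infer_pair`, `apply_swap`, and `measure_cycle_time` are hypothetical stand-ins for the inference unit 170, the swap command issued through the management device H, and the cycle time acquisition. Updating the reference cycle time after each improvement, and restoring a swap by applying the same swap again, are assumptions made for this sketch.

```python
def optimize(machine_pairs, infer_pair, apply_swap, measure_cycle_time, assumed_cycle_time):
    """Run the swap-and-check loop and return (final cycle time, accepted swaps)."""
    current = assumed_cycle_time
    accepted = []
    for machines in machine_pairs:           # S11: set the next mounting machine pair Km
        while True:
            pair = infer_pair(machines)      # S12: infer the pair to swap
            apply_swap(pair)                 # S13: swap the feeders
            new_time = measure_cycle_time()  # S14: acquire the cycle time after the swap
            if new_time < current:           # S15: improved, so keep it and try again
                current = new_time
                accepted.append(pair)
                continue
            apply_swap(pair)                 # S16: undo by swapping the same pair back
            break                            # S17: move on to the next machine pair
    return current, accepted                 # S18: end of the optimization program


# Toy run with scripted measurements (values are made up for illustration).
measurements = iter([9.0, 8.5, 8.7])
swaps = []
final_time, accepted = optimize(
    machine_pairs=[("M1", "M2")],
    infer_pair=lambda machines: ("A", "B"),
    apply_swap=swaps.append,
    measure_cycle_time=lambda: next(measurements),
    assumed_cycle_time=10.0,
)
```

In the toy run the first two swaps shorten the cycle time and are kept, and the third does not and is undone, so four swap commands are issued in total.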
[0090] Here, "all target component mounting machine pairs Km" may be, for example, all combinations of the component mounting machines 10 that constitute the production system 1, each set as a component mounting machine pair Km. Alternatively, for example, if there are combinations of component mounting machines 10 that are expected to have an effect such as shortening the cycle time, it is also possible to select the component mounting machines 10 that are expected to have such an effect from all of the component mounting machines 10 and set them as component mounting machine pairs Km.
[0091] As can be understood from the above explanation, the production support device 100 includes: a first learning data acquisition unit 110 that acquires multiple first learning data L1 used for first machine learning regarding part type pairs Kp, which are used to improve the mounting process and obtain a reward E by swapping part type pairs Kp of parts P to be mounted on a substrate K among multiple part mounting machines 10 and trying the mounting process; a learning data storage unit 120 that classifies and stores the acquired multiple first learning data L1 according to predetermined classification criteria; an extraction unit 130 that randomly extracts each of the first learning data L1 classified and stored in the first memory buffer 121, second memory buffer 122, and third memory buffer 123 of the learning data storage unit 120; and a trained model storage unit 150 that stores a trained model M generated by performing first machine learning using the randomly extracted first learning data L1. Furthermore, the production support device 100 includes a trained model generation unit 140 that generates a trained model M by repeatedly performing first machine learning using the randomly extracted first learning data L1.
[0092] Furthermore, the production support device 100 includes a production information acquisition unit 160 that acquires production information J, which includes at least placement data Da (new placement data Dan) representing the placement of the component mounting machines 10 and component type data Dk (new component type data Dkn) representing the component types of the components P, and which instructs production of a substrate K by mounting new components P using the component mounting machines 10, and an inference unit 170 that uses the placement data Da (new placement data Dan) and component type data Dk (new component type data Dkn) included in the production information J and the trained model M to infer and output the component type pair Kp (inferred component type pair data Cpi representing the inferred component type pair Kp) that is to be replaced among the new component types distinguished by the component type data Dk (new component type data Dkn).
[0093] With this configuration, by randomly extracting each of the classified and stored first training data L1 and performing the first machine learning, a trained model M generated with the bias in the first training data L1 suppressed can be used. As a result, the production support device 100 can accurately infer and determine, using the generated trained model M, a part type pair Kp (or feeder pair Kf) that can improve the mounting process among the multiple part mounting machines 10.
[0094] Therefore, by using the production support device 100, it is not necessary to sequentially examine all combinations of part types (feeders 40) for multiple part types to determine the part type pair Kp (feeder pair Kf) that is effective for optimization. Furthermore, by using the trained model M, it is possible to selectively determine the part type pair Kp (feeder pair Kf) that is effective for optimization for new part types as well, and the arrangement of part types (feeders 40) can be optimized efficiently.
5. First variation
[0095] In the embodiment described above, the first memory buffer 121, second memory buffer 122, and third memory buffer 123 of the learning data storage unit 120 store and accumulate first learning data L1 classified according to predetermined classification criteria. However, obtaining the first learning data DE1 stored in the first memory buffer 121, specifically first learning data L1 that includes pair data C for which the part type pair Kp can be swapped and the cycle time is shortened, requires many trials using the optimizer 180 to find such a combination.
[0096] In other words, the first training data L1 that includes pair data C for which the component type pair Kp can be swapped and the cycle time is shortened appears infrequently, as described above. Therefore, in order to perform the first machine learning (reinforcement learning) with reduced bias in the first training data L1 when generating the trained model M, it is necessary to accumulate a predetermined number or more of the first training data DE1 in the first memory buffer 121, so training may take time to progress.
[0097] Therefore, in the first modified example, in addition to generating a trained model M by the first machine learning using the first training data L1 described above, the trained model M is also generated by second machine learning using second training data L2, which includes inference pair data Ci that the inference unit 170 infers using the trained model M generated by the first machine learning. Here, a component type pair Kp (or feeder pair Kf) inferred using the trained model M is likely to be swappable and to shorten the cycle time, so such pairs appear more frequently as learning proceeds.
[0098] Therefore, in the first modified example, the production support device 100 includes a second learning data acquisition unit 190, as shown by the long dashed line in Figure 6. The second learning data acquisition unit 190 acquires second learning data L2, which includes inference pair data Ci inferred by the inference unit 170, for example, inference part type pair data Cpi representing the inferred part type pair Kp. In addition to the inference part type pair data Cpi, the second learning data L2 also includes optimization information D and production information J acquired from the management device H.
[0099] In this first modified example, the trained model generation unit 140 can perform first machine learning (reinforcement learning) using the first training data DE1, first training data DE2, and first training data DE3, i.e., first training data L1, extracted by the extraction unit 130, and can also perform second machine learning (reinforcement learning) using the second training data L2 acquired by the second training data acquisition unit 190. In this case, the trained model generation unit 140 selects either the first machine learning or the second machine learning to generate a trained model M.
[0100] Specifically, the trained model generation unit 140 generates a trained model M by selecting and performing either the first machine learning or the second machine learning according to the search rate, which determines the search ratio, in accordance with the epsilon-greedy method. Therefore, as shown in Figure 7, if the first machine learning is selected according to the epsilon-greedy method, the state information acquisition unit 141 of the first modified example acquires the first training data L1, i.e., one of the first training data DE1, first training data DE2, and first training data DE3, from the extraction unit 130 as state information. On the other hand, if the second machine learning is selected according to the epsilon-greedy method, the state information acquisition unit 141 of the first modified example acquires the second training data L2 acquired by the second training data acquisition unit 190 as state information.
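The selection between the two kinds of learning can be illustrated with a short sketch. The specification does not state which branch corresponds to the search rate, so the mapping below (first machine learning with probability `epsilon`, second machine learning otherwise) is an assumption, as are the function names `select_learning` and `acquire_state` and the two callables standing in for the extraction unit 130 and the second training data acquisition unit 190.

```python
import random

def select_learning(epsilon, rng=None):
    """Return "first" with probability epsilon, otherwise "second".

    Assumption: the search rate epsilon selects the first machine learning
    (randomly extracted L1); the remainder selects the second (inferred L2).
    """
    rng = rng or random.Random()
    return "first" if rng.random() < epsilon else "second"

def acquire_state(epsilon, extract_first, acquire_second, rng=None):
    """Pick the source of state information for one learning step.

    extract_first / acquire_second are hypothetical callables that return
    first training data L1 and second training data L2, respectively.
    """
    if select_learning(epsilon, rng) == "first":
        return "first", extract_first()
    return "second", acquire_second()

# Example: with epsilon = 0.3, roughly 30% of steps use L1 data.
kind, state = acquire_state(0.3, lambda: "L1 sample", lambda: "L2 sample",
                            rng=random.Random(0))
```

Lowering `epsilon` over time would shift learning toward the inferred data; whether the device schedules the search rate this way is not stated in the specification.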
[0101] When the trained model generation unit 140 performs reinforcement learning using second machine learning, the state information acquisition unit 141 acquires the second training data L2 from the second training data acquisition unit 190. Then, the value function storage unit 144 generates a value function in reinforcement learning based on the state information (second training data L2, in particular the inference pair data Ci) acquired by the state information acquisition unit 141 and the reward E calculated by the reward calculation unit 143, as in the embodiment described above. That is, in the first modified example, the value function storage unit 144 performs reinforcement learning on the value function, i.e., the trained model M, generated using the pair data C included in the first training data DE1 in the embodiment described above, using the inference pair data Ci included in the second training data L2, and stores the trained model M generated by the reinforcement learning in an updatable manner.
[0102] Thus, in the first modified example, inference pair data Ci (inference component type pair data Cpi or inference feeder pair data Cfi) can be used for reinforcement learning. As a result, the apparent frequency of reinforcement learning using the first training data DE1 described in the above embodiment can be increased, and the generation speed of the trained model M, in other words, the learning speed, can be improved.
[0103] Furthermore, in the first modified example, the action decision unit 145 can determine a part type pair Kp of selectable part types from among multiple part types, or a feeder pair Kf of selectable feeders 40 from among multiple feeders 40, based on state information (second learning data L2) and a trained model M (optimal action value function), similar to the embodiment described above. In this case as well, the action decision unit 145 can select a part type pair Kp (or feeder pair Kf) based on the optimal action value function (trained model M), or, if necessary, search for a part type pair Kp (or feeder pair Kf) without relying on the optimal action value function (trained model M).
[0104] Furthermore, in the first modified example, the action information output unit 146 outputs the decision made by the action decision unit 145, i.e., the part type pair Kp (or feeder pair Kf) to be replaced, as action information A to the optimizer 180. The optimizer 180 then acquires the action information A, performs a simulation of the mounting process based on virtual mounting conditions in which the part type pair Kp (or feeder pair Kf) has been replaced according to the action information A, estimates the cycle time, which is the evaluation result for the evaluation mode, from the simulation, and outputs the cycle time data Rs.
[0105] In the first modified example, the value function update unit 147 updates the optimal action value function stored in the value function storage unit 144 based on the new state information updated based on the action information A, i.e., the optimized information D (specifically, the cycle time data Rs), and the reward E for the new state information (optimized information D reflecting the action information A). In the first modified example, the value function update unit 147 only needs to update the optimal action value function based on the reinforcement learning algorithm (DQN); for example, if a negative reward E is given, the optimal action value function may be left unchanged.
[0106] Therefore, in the first modified example, depending on the situation, reinforcement learning can be advanced using, as the second training data L2, the inferred pair data Ci, i.e., the inferred part type pair data Cpi (or inferred feeder pair data Cfi) inferred using the trained model M, particularly for part type pairs Kp (or feeder pairs Kf) for which the cycle time represented by the cycle time data Rs can be shortened.
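The option of not updating the value function when a negative reward E is given can be sketched as follows. The specification names DQN as the algorithm; for brevity this sketch substitutes a tabular Q-learning update, so it illustrates only the update rule and the skip condition. The function name `update_value`, the learning rate `alpha`, the discount `gamma`, and the `skip_negative` flag are assumptions for illustration.

```python
from collections import defaultdict

def update_value(q, state, action, reward, next_state, next_actions,
                 alpha=0.1, gamma=0.9, skip_negative=True):
    """One tabular Q-learning update; returns True if the table changed.

    q is a defaultdict(float) keyed by (state, action). This table is a
    simplified stand-in for the action value function that a DQN would
    approximate with a neural network.
    """
    if skip_negative and reward < 0:
        return False  # leave the value function unchanged
    # Best estimated value reachable from the new state (0.0 if terminal).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return True

q = defaultdict(float)
# Swapping the hypothetical pair ("A", "B") shortened the cycle time.
update_value(q, "layout0", ("A", "B"), 1.0, "layout1", [])
# A swap that earned a negative reward is ignored.
update_value(q, "layout0", ("A", "C"), -1.0, "layout2", [])
```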
[0107] In other words, in the first modified example, in addition to the first training data DE1 which takes time to accumulate in the first memory buffer 121 in the training data storage unit 120, inference pair data Ci (inference part type pair data Cpi or inference feeder pair data Cfi) corresponding to the pair data C included in the first training data DE1 inferred by the trained model M can be used in the second machine learning. Here, by performing inference using the trained model M, the frequency of appearance of inference pair data Ci (inference part type pair data Cpi or inference feeder pair data Cfi), in other words, interchangeable part type pairs Kp (or feeder pairs Kf) that shorten the cycle time, increases.
[0108] As a result, in the first modified example, the time required to accumulate the first training data L1 and the second training data L2, which are classified according to predetermined classification criteria and stored in the first memory buffer 121, until they reach a predetermined number or more can be shortened. Consequently, in the first modified example, the training time required to generate a trained model M with high inference accuracy can be shortened, and the trained model M can be generated efficiently. Other effects are the same as those obtained in the embodiments described above.
6. Second variation
[0109] In the above-described embodiment, for example, as a first step, a part mounting machine pair Km is set by an operator, and as a second step, the production support device 100 treats the feeders 40 (part types) set in the selected part mounting machine pair Km as candidates for replacement, i.e., as the target of optimization. As a result, in the above-described embodiment and the first modified example, for example, the number of simulations performed by the optimizer 180 can be reduced, and the arrangement of part types can be optimized efficiently.
[0110] Incidentally, as mentioned above, the part type of the part P to be mounted in each part mounting machine 10 is known. Therefore, instead of inferring a part type pair Kp or feeder pair Kf as the target for replacement, as in the embodiment described above, the production support device 100 can also infer a part mounting machine pair Km represented by part mounting machine pair data Cm in the same way as inferring a part type pair Kp represented by part type pair data Cp or a feeder pair Kf represented by feeder pair data Cf, as shown in Figures 6, 7, and 9. That is, in this case, as the first step, for example, in step S11 of the optimization program described above, a part mounting machine pair Km (part mounting machine pair data Cm) consisting of part mounting machines 10 that are highly likely to hold the feeders 40 (part types) to be replaced is inferred based on the trained model M and the optimization information D (or production information J). As a result, as described above, the production support device 100 can infer, in the second step, the feeder pair Kf, i.e., the part type pair Kp, among those actually set in the part mounting machine pair Km, thereby enabling efficient optimization of the part type arrangement.
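The reduction in candidates obtained by fixing a part mounting machine pair Km first can be illustrated by simple enumeration. The sketch below is an illustration only: the `layout` dictionary (machine name to the part types set in it) and both function names are hypothetical, and real replacement restrictions are ignored.

```python
from itertools import combinations, product

def all_cross_machine_pairs(layout):
    """Every part type pair whose two part types sit on different machines."""
    pairs = []
    for m1, m2 in combinations(sorted(layout), 2):
        pairs.extend(product(layout[m1], layout[m2]))
    return pairs

def pairs_in_machine_pair(layout, machine_pair):
    """Part type pairs restricted to one part mounting machine pair Km."""
    m1, m2 = machine_pair
    return list(product(layout[m1], layout[m2]))

# Hypothetical line: three machines, two part types each.
layout = {"m1": ["a", "b"], "m2": ["c", "d"], "m3": ["e", "f"]}
everything = all_cross_machine_pairs(layout)             # 12 candidate pairs
narrowed = pairs_in_machine_pair(layout, ("m1", "m3"))   # 4 candidate pairs
```

Once the first step has selected the machine pair, the second step only has to rank the narrowed list, which is why fewer simulations by the optimizer 180 are needed.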
7. Third variation
[0111] In the above-described embodiment and the first modified example, during the learning phase, the reward calculation unit 143 calculates a reward E according to the evaluation result, regardless of the evaluation target. In addition, as shown by the dashed line in Figure 7, the trained model generation unit 140 may also include a weighting unit 148. The weighting unit 148 will be described below.
[0112] The weighting unit 148 weights the reward E that the reward calculation unit 143 gives to each of the multiple evaluation targets. In other words, if the importance of some of the multiple evaluation targets (e.g., cycle time) is higher than the importance of other evaluation targets (e.g., placement of parts P), the weighting unit 148 makes the reward E, or the degree of penalty, given for those evaluation targets larger than that given for the others. The third modified example therefore provides the same effects as the above-described embodiment and the first modified example. The weighting of the reward E for each evaluation target can be set, for example, by the operator.
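A minimal sketch of such weighting, assuming the reward E is formed as a weighted sum over evaluation targets, is shown below. The weighted-sum form, the function name `weighted_reward`, the target names, and the numeric weights are assumptions for illustration; the specification only states that more important targets receive a larger reward or penalty.

```python
def weighted_reward(rewards, weights):
    """Sum per-target rewards (or penalties) scaled by their weights.

    rewards: dict mapping an evaluation target to its raw reward;
             negative values act as penalties.
    weights: dict mapping targets to operator-set weights; targets that
             are not listed default to a weight of 1.0.
    """
    return sum(weights.get(target, 1.0) * value
               for target, value in rewards.items())

# Cycle time is treated as twice as important as part placement.
raw = {"cycle_time": 1.0, "placement": -0.5}
total = weighted_reward(raw, {"cycle_time": 2.0})
```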
8. Other variations
[0113] In the embodiment and first modified example described above, the production support device 100 infers part type pairs Kp (feeder pairs Kf) based on the trained model M and optimization information D (production information J). Alternatively, for example, as in the second modified example, if the production support device 100 infers part mounting machine pairs Km, the operator may determine the part type pairs Kp and feeder pairs Kf from among the limited number of feeders 40, i.e., part types, that are set in the part mounting machines 10 that form the part mounting machine pair Km. Even in this case, since the number of part types (feeders 40) that are subject to replacement is limited, the placement of part types can be optimized more efficiently than with the conventional method described above, even though the operator determines the part type pairs Kp and feeder pairs Kf.
[0114] Furthermore, in the embodiments and modifications described above, the production support device 100 is equipped with a trained model generation unit 140. Alternatively, the trained model generation unit 140 can be installed in a device other than the production support device 100 installed in the production system 1 (for example, the management device H of the production system 1, or a computer device owned by the manufacturer that manufactures the production system 1 and the component mounting machine 10 and is capable of communicating with the management device H). In this case, the trained model generation unit 140 installed in a device other than the production support device 100 can generate a trained model M using, for example, optimization information D owned by the manufacturer. The generated trained model M is then supplied to, for example, the management device H of the production system 1, and from the management device H it is supplied to the trained model storage unit 150 of the production support device 100 for storage. In this case as well, the same effects as in the embodiments and modifications described above can be obtained.
[Explanation of symbols]
[0115] 1...Production system, 10...Component mounting machine, 11...Substrate transport device, 12...Component supply device, 12S...Slot, 13...Component transfer device, 13A...Head drive device, 13B...Mobile platform, 13C...Mounting head, 13D...Nozzle holder, 13E...Suction nozzle, 14...Component camera, 15...Substrate camera, 16...Control device, 20...Automatic transport machine, 30...Loader device, 40...Feeder, 41...Feeder body, 42...Drive sprocket, 43...Tape presser, 44...Peeling section, 50...Carrier tape, 501...Joining section, 502...Joining section, 51...Base tape, 511...Cavity, 512...Feed hole, 52...Cover tape, 100...Production support device, 110...First training data acquisition unit, 120...Training data storage unit, 130...Extraction unit, 140...Trained model generation unit, 141...State information acquisition unit, 142...Evaluation result acquisition unit, 143...Reward calculation unit, 144...Value function storage unit, 145...Action decision unit, 146...Action information output unit, 147...Value function update unit, 148...Weighting unit, 150...Trained model storage unit, 160...Production information acquisition unit, 170...Inference unit, 171...State information acquisition unit, 172...Value function storage unit, 173...Action decision unit, 174...Action information output unit, 180...Optimizer, 190...Second training data acquisition unit, P...Part, Ps...Part supply position, R...Reel, D...Optimization information, Da...Placement data, Dk...Part type data, Ds...Expected cycle time data, Dj...Replacement restriction information, C...Pair data, Cp...Part type pair data, Cf...Feeder pair data, Cm...Part mounting machine pair data, Ci...Inference pair data, Cpi...Inference part type pair data, Cfi...Inference feeder pair data, Cmi...Inference part mounting machine pair data, Rs...Cycle time data (result data), J...Production information, M...Trained model, E...Reward, A...Action information, H...Management device
Claims
1. A production support device comprising: a first learning data acquisition unit that acquires a plurality of first learning data used for first machine learning regarding component type pairs for which, when the component type pairs of components to be mounted on a circuit board are swapped among a plurality of component mounting machines and the mounting process is tried, the mounting process is improved and a reward is obtained; a learning data storage unit that classifies and stores the acquired plurality of first learning data according to a predetermined classification criterion; an extraction unit that randomly extracts each of the first learning data classified and stored in the learning data storage unit; and a trained model storage unit that stores a trained model generated by performing the first machine learning using the randomly extracted first learning data.
2. The production support device according to claim 1, further comprising: a production information acquisition unit that acquires production information that includes at least arrangement data representing the arrangement of the component mounting machines and component type data representing the component types of the components, and that instructs production of the circuit board by mounting new components using the component mounting machines; and an inference unit that uses the arrangement data and the component type data included in the production information, together with the trained model, to infer and output the component type pair to be swapped from among the new component types distinguished by the component type data.
3. The production support apparatus according to claim 1 or 2, wherein the extraction unit randomly extracts the first learning data from each of the first learning data that are classified and stored in the learning data storage unit in a composition ratio that can be arbitrarily set.
4. The production support apparatus according to claim 1 or 2, wherein the extraction unit randomly extracts the first learning data from among the first learning data stored in a certain number or more in the learning data storage unit.
5. The production support apparatus according to claim 1 or 2, further comprising a trained model generation unit that generates the trained model by repeatedly performing the first machine learning using the randomly extracted first learning data.
6. The production support apparatus according to claim 5, wherein the first learning data is randomly extracted when a certain number or more of each of the first learning data are stored and accumulated in the learning data storage unit.
7. The production support device according to claim 1 or 2, wherein the first learning data is data in which arrangement data representing the arrangement of the component mounting machines, component type data representing the component types of the components, component type pair data representing the component type pairs among the component types distinguished by the component type data, and result data representing the result obtained when the plurality of component mounting machines mount the components with the component types swapped are linked together.
8. The production support device according to claim 7, wherein the learning data storage unit classifies and stores the acquired plurality of first learning data according to the predetermined classification criterion, which relates to the result data.
9. The production support apparatus according to claim 8, wherein the result data includes the cycle time required for mounting the components, and the classification criterion is a criterion for classifying the first learning data according to the cycle time.
10. The production support apparatus according to claim 8, wherein the result data includes cases in which the component type pair cannot be swapped, and the classification criterion is a criterion for classifying the first learning data according to whether or not the component type pair can be swapped.
11. The production support device according to claim 2, further comprising a second learning data acquisition unit that acquires second learning data including inferred component type pair data representing the component type pair inferred by the inference unit, wherein the trained model storage unit stores the trained model generated by performing either the first machine learning using the first learning data or second machine learning using the second learning data, the second machine learning relating to inferred component type pairs for which the mounting process is improved and a reward is obtained when the mounting process is tried with the component type pairs swapped.
12. The production support apparatus according to claim 11, further comprising a trained model generation unit that generates the trained model by repeatedly performing either the first machine learning or the second machine learning.
13. The production support apparatus according to claim 12, wherein the trained model generation unit generates the trained model by selecting and performing one of the first machine learning and the second machine learning according to a search rate that determines the search ratio, in accordance with the epsilon-greedy method.
14. The production support device according to claim 1 or 2, wherein the reward is given when, in a simulation after swapping the component type pair, the cycle time required for mounting the components is shortened.
15. The production support device according to claim 14, wherein the reward increases as the reduction in the cycle time after swapping the component type pair, relative to the cycle time before swapping the component type pair, increases.
16. The production support apparatus according to claim 1 or 2, wherein the reward is given when, in a simulation after swapping the component type pair, the components are mounted on the circuit board in order from smallest to largest.
17. The production support apparatus according to claim 16, wherein the reward is given when, in a simulation after swapping the component type pair, the components are mounted on the circuit board in order from lowest to highest height from the surface of the circuit board.
18. The production support apparatus according to claim 2, wherein each of the components is housed in a carrier tape wound around a reel, and each reel is loaded into a feeder that supplies the components housed in the carrier tape to the component mounting machine.
19. The production support device according to claim 18, wherein the inference unit infers and outputs, as the component type pair, a feeder pair representing the feeders that supply the components to the component mounting machines.
20. The production support apparatus according to claim 2, which performs a first step of setting, based on the arrangement data, a component mounting machine pair representing component mounting machines among the plurality of component mounting machines, and a second step of inferring and outputting the component type pair in the component mounting machine pair.
21. The production support device according to claim 20, wherein the inference unit uses the production information and the trained model to infer and output the component mounting machine pair in the first step.