Automated Defect Detection Based on Machine Vision
An automated defect detection system based on machine vision and neural networks addresses the unreliability and low efficiency of human visual inspection, achieving efficient and accurate defect detection of manufactured parts and supporting real-time defect identification and handling on the assembly line.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- TIDI KAIYANGAN INTELLIGENT TECHNOLOGY PTE LTD
- Filing Date
- 2020-12-02
- Publication Date
- 2026-06-30
AI Technical Summary
In existing technologies, visual inspection of manufactured parts is unreliable and inefficient, and it is difficult to cope with the diversity and uncertainty of defect location.
An automated defect detection system based on machine vision is adopted. It utilizes neural network training and preprocessing techniques to generate defect scores through high-resolution imaging, image patch extraction, and computer vision model analysis, thereby replacing human inspection.
It improves the accuracy and efficiency of defect detection, reduces inspection time, and enables the real-time identification and handling of defective parts on the assembly line, thereby reducing production costs.
Figure CN114902279B_ABST
Abstract
Description
[0001] Cross-reference to related applications
[0002] This application claims the benefit of U.S. Provisional Application No. 62/950,440, filed December 19, 2019, entitled “AUTOMATED MACHINE VISION-BASED DEFECT DETECTION,” which is incorporated herein by reference in its entirety for all purposes.
Technical Field
[0003] This disclosure generally relates to the inspection of manufactured parts, and more specifically to automated defect detection based on machine vision.
[0004] Background
[0005] Defect identification is a crucial component of many manufacturing processes. Existing quality inspection systems include visual confirmation to ensure parts are in the correct location, have the correct shape, color, or texture, and are free of imperfections such as scratches, pinholes, and contaminant particles. However, human visual inspection can be unreliable due to the limitations of human vision and human error. Furthermore, the volume of inspections, product diversity, and the possibility that defects can appear anywhere on a product and at any size place a heavy burden on inspectors. Therefore, effective systems and methods are needed to replace human visual inspection of machine-manufactured parts.
[0006] Overview
[0007] The following is a brief overview of this disclosure to provide a basic understanding of specific embodiments thereof. This overview is not a comprehensive summary of this disclosure, nor does it identify key or critical elements of this disclosure or delineate its scope. Its sole purpose is to present some of the concepts disclosed herein in a simplified form as an introduction to the more detailed description that follows.
[0008] Generally, specific embodiments of this disclosure describe systems and methods for automated defect detection based on machine vision. The method includes operation in a training mode and an inference mode. The method includes training a neural network to detect defects. Training the neural network includes receiving a plurality of historical datasets comprising a plurality of training images corresponding to one or more known defects, converting each training image into a corresponding matrix representation, and inputting each corresponding matrix representation into the neural network to adjust weighting parameters based on one or more known defects. The weighting parameters correspond to the dimensions of the matrix representations. The method also includes obtaining a test image of the object. The test image is not part of the historical datasets.
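The training step in [0008] can be illustrated with a deliberately small stand-in. The sketch below is not the disclosed neural network: it is a single-layer logistic model, assumed here for illustration, whose weight matrix has the same dimensions as each image's matrix representation and is adjusted by gradient descent on labeled images (1 = known defect, 0 = defect-free). The names `to_matrix`, `train_weights`, and `predict`, the learning rate, and the epoch count are all hypothetical.

```python
import numpy as np

def to_matrix(image):
    """Convert a grayscale image (values 0..255) to a float matrix scaled to [0, 1]."""
    return np.asarray(image, dtype=np.float64) / 255.0

def train_weights(images, labels, epochs=200, lr=0.5):
    """Fit a weight matrix W with the same shape as each image matrix, plus a bias b,
    by full-batch gradient descent on binary cross-entropy.
    labels: 1 = image shows a known defect, 0 = defect-free."""
    mats = np.stack([to_matrix(im) for im in images])            # shape (n, H, W)
    y = np.asarray(labels, dtype=np.float64)
    W = np.zeros(mats.shape[1:])                                  # one weight per matrix element
    b = 0.0
    for _ in range(epochs):
        logits = np.tensordot(mats, W, axes=([1, 2], [0, 1])) + b  # shape (n,)
        p = 1.0 / (1.0 + np.exp(-logits))                          # predicted defect probability
        err = p - y                                                # d(loss)/d(logit)
        W -= lr * np.tensordot(err, mats, axes=(0, 0)) / len(y)
        b -= lr * err.mean()
    return W, b

def predict(W, b, image):
    """Probability that `image` shows a defect under the fitted weights."""
    logit = float(np.sum(to_matrix(image) * W) + b)
    return 1.0 / (1.0 + np.exp(-logit))

# Usage on a tiny synthetic set: blank 8x8 images vs. images with one bright pixel.
clean = [np.zeros((8, 8), dtype=np.uint8) for _ in range(5)]
defective = []
for _ in range(5):
    im = np.zeros((8, 8), dtype=np.uint8)
    im[2, 2] = 255                                                 # stand-in for a scratch
    defective.append(im)
W, b = train_weights(clean + defective, [0] * 5 + [1] * 5)
print(W.shape, predict(W, b, defective[0]), predict(W, b, clean[0]))
```

A real system would replace the linear model with the convolutional network described below; what carries over is that each weight is tied to a position in the input matrix and is nudged by the labeled examples.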
[0009] The method also includes extracting portions of the test image as multiple input image patches for input into a neural network, each input image patch corresponding to an extracted portion of the test image. The method further includes inputting each input image patch as a corresponding matrix representation into the neural network to automatically generate a probability score for each input image patch using the weighting parameters. The probability score for each input image patch indicates the probability that the input image patch contains a predicted defect, and a defect score for the test image is generated based on the probability scores of the input image patches. The defect score indicates the condition of the object.
[0010] Input image patches can have a uniform height and width. Input image patches can include overlapping portions of the test image. Input image patches can be aligned such that each input image patch is adjacent to one or more other input image patches.
[0011] The neural network can include one or more of the following: convolutional layers, max-pooling layers, flattening layers, and fully connected layers. Using the weighting parameters, the neural network can be trained to accurately output a probability score for an input image patch having unknown defects. The method can also include generating a heat map of the input image patches based on the probability scores. Before the test image is passed into the neural network, it can be preprocessed to remove the background and to represent the image only by its luminance (Y) component in YCbCr format.
[0012] Other implementations of this disclosure include corresponding devices, systems, and computer programs configured to perform the described methods. These other implementations may optionally include one or more of the following features. For example, a server system is provided that includes an interface configured to receive a plurality of historical datasets and test images of objects, the plurality of historical datasets including multiple images corresponding to one or more levels of known defects. The test images are not part of the historical datasets. The system also includes a memory configured to store the historical datasets and the test images.
[0013] The system also includes a processor associated with the neural network. This processor is configured to train the neural network to detect defects. Training the neural network involves converting each training image into a corresponding matrix representation and inputting each corresponding matrix representation into the neural network to adjust weighting parameters based on one or more known defects. The weighting parameters correspond to the dimensions of the matrix representation.
[0014] The processor is also configured to extract portions of the test image as multiple input image patches for input into the neural network, each input image patch corresponding to an extracted portion of the test image. The processor is further configured to input each input image patch as a corresponding matrix representation into the neural network to automatically generate a probability score for each input image patch using the weighting parameters. The probability score for each input image patch indicates the probability that the input image patch contains a predicted defect, and a defect score for the test image is generated based on the probability scores of the input image patches. The defect score indicates the condition of the object.
[0015] One or more non-transitory computer-readable media are also provided, on which one or more programs are stored for execution by a computer to perform the described methods and systems. These and other embodiments are further described below with reference to the accompanying drawings.
Brief description of the drawings
[0017] This disclosure can be best understood by referring to the following description taken in conjunction with the accompanying drawings, which illustrate specific embodiments of this disclosure.
[0018] Figure 1A illustrates an example network architecture for implementing various systems and methods of this disclosure, according to one or more embodiments.
[0019] Figure 1B shows an example imaging and processing system for automatically inspecting manufactured parts, according to one or more embodiments.
[0020] Figure 2 shows a process flow diagram for automated defect detection based on machine vision, according to one or more embodiments.
[0021] Figures 3A and 3B show images captured for part inspection, according to one or more embodiments.
[0022] Figures 4A and 4B show example output images generated by automated inspection, according to one or more embodiments.
[0023] Figure 5 shows an example user interface for displaying processed and inspected images, according to one or more embodiments.
[0024] Figure 6 shows an example neural network architecture for automatically detecting defects, implemented according to one or more embodiments.
[0025] Figures 7A, 7B, and 7C show an example method for automated defect detection based on machine vision, according to one or more embodiments.
[0026] Figure 8 shows specific examples of computer systems that can be used with various embodiments of this disclosure.
[0027] Description of specific embodiments
[0028] Reference will now be made in detail to some specific examples of this disclosure. Examples of these specific embodiments are illustrated in the accompanying drawings. Although this disclosure has been described in conjunction with these specific embodiments, it will be understood that it is not intended to limit this disclosure to the described embodiments. Rather, it is intended to cover alternatives, modifications, and equivalents that may be included within the spirit and scope of this disclosure as defined by the appended claims.
[0029] In the following description, numerous specific details are set forth in order to provide a thorough understanding of this disclosure. Specific exemplary embodiments of this disclosure may be implemented without some or all of these specific details. In other instances, well-known process operations have not been described in detail so as not to unnecessarily obscure this disclosure.
[0030] For clarity, various techniques and mechanisms of this disclosure will sometimes be described in the singular. However, it should be noted that, unless otherwise stated, some embodiments include multiple iterations of the technique or multiple instances of the mechanism. Furthermore, the techniques and mechanisms of this disclosure will sometimes describe connections between two entities. It should be noted that a connection between two entities does not necessarily mean a direct, unobstructed connection, as various other entities may exist between the two entities. Therefore, unless otherwise stated, a connection does not necessarily mean a direct, unobstructed connection.
[0031] Overview
[0032] The general purpose of this disclosure (which will be described in more detail below) is to provide a system and method for automating computer vision solutions to replace human visual inspection of machine-manufactured parts. Human visual inspection of parts typically takes 30 seconds to 1 minute and always includes the possibility of human error. The described system and associated method can significantly reduce inspection time and provide improved accuracy in identifying defective parts.
[0033] The described system includes a light source and a high-resolution imaging device for capturing high-resolution images of mechanically manufactured parts. The images are processed to remove background and other noise, align the images, and perform other image enhancements. Finally, the images are segmented into input image patches for analysis using a computer vision-based model or neural network.
[0034] The neural network may include various computational layers, including at least a series of convolutional and max-pooling layers, a flattening layer, and one or more fully connected layers. The neural network is trained to accurately output a probability score for each input image patch, corresponding to the likelihood that the input image patch contains an image of a defect. The defect could be a scratch, a dent, or any other condition that does not meet part quality standards.
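A minimal NumPy sketch of the kind of layer stack named in [0034] (convolution, max pooling, flattening, fully connected, then a sigmoid that yields a score between 0 and 1) follows. The filter count, 3×3 kernel size, hidden width, and random initialization are assumptions, the weights are untrained, and `forward` expects a 2-D (H, W) patch, so a 64×64×1 matrix would first be squeezed to 64×64. A production system would more likely use a deep-learning framework; the point is only to show how a patch flows through these layers to a single probability score.

```python
import numpy as np

def conv2d_valid(x, kernels):
    """Valid 2-D cross-correlation of a single-channel image x (H, W) with K kernels (K, kh, kw).
    Returns feature maps of shape (K, H - kh + 1, W - kw + 1)."""
    k, kh, kw = kernels.shape
    h, w = x.shape
    out = np.empty((k, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            out[:, i, j] = np.tensordot(kernels, x[i:i + kh, j:j + kw], axes=([1, 2], [0, 1]))
    return out

def max_pool2(x):
    """2x2 max pooling with stride 2 over (K, H, W) feature maps (an odd last row/column is dropped)."""
    k, h, w = x.shape
    h2, w2 = h // 2, w // 2
    return x[:, :h2 * 2, :w2 * 2].reshape(k, h2, 2, w2, 2).max(axis=(2, 4))

def init_params(patch=64, n_filters=8, ksize=3, hidden=32, seed=0):
    """Random (untrained) weights; every size here is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    pooled = (patch - ksize + 1) // 2
    return {
        "conv": rng.normal(0.0, 0.1, (n_filters, ksize, ksize)),
        "conv_b": np.zeros(n_filters),
        "fc1": rng.normal(0.0, 0.01, (hidden, n_filters * pooled * pooled)),
        "fc1_b": np.zeros(hidden),
        "fc2": rng.normal(0.0, 0.1, hidden),
        "fc2_b": 0.0,
    }

def forward(patch, params):
    """Map one 64x64 grayscale patch (0..255) to a probability score in (0, 1)."""
    x = np.asarray(patch, dtype=np.float64) / 255.0
    x = conv2d_valid(x, params["conv"]) + params["conv_b"][:, None, None]
    x = np.maximum(x, 0.0)                                     # ReLU
    x = max_pool2(x)                                           # max-pooling layer
    x = x.reshape(-1)                                          # flattening layer
    h = np.maximum(params["fc1"] @ x + params["fc1_b"], 0.0)   # fully connected + ReLU
    logit = params["fc2"] @ h + params["fc2_b"]                # fully connected output
    return float(1.0 / (1.0 + np.exp(-logit)))                 # sigmoid -> probability

params = init_params()
rng = np.random.default_rng(1)
score = forward(rng.integers(0, 256, (64, 64)), params)
print(f"untrained score: {score:.3f}")
```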
[0035] An overall defect score can then be generated for the entire image of the part based on the probability score of each input image patch. If the overall defect score is below a predetermined threshold, the part corresponding to the image can be classified as satisfactory. However, if the overall defect score is above the predetermined threshold, the part can be classified as defective. Defective parts can be removed from the assembly line. In some embodiments, defective parts can be discarded or repaired to meet quality standards.
[0036] Various output images can be generated and displayed on the user interface. For example, heatmaps can be generated to indicate the probability score of each input image patch. As another example, the outline of a region with detected defects can be overlaid on a captured image to locate the defects.
[0037] Compared to human visual inspection, this imaging technology can provide more accurate and precise part analysis. Surface features can be enhanced for visualization through image preprocessing. The described technique also allows more parts to be inspected in a given time without compromising inspection quality.
[0038] The defect detection process can be implemented at different points along the assembly line to reduce production costs or to identify faulty components along the line. For example, to avoid unnecessary production costs, defective parts can be identified by the described system and discarded before further processing or handling. As another example, if a high percentage of similar defects are found after a specific point in the assembly line, the described technology can help identify problems related to the handling or manufacturing of the components.
[0039] Other objectives and advantages of this apparatus, system, and method will become apparent to the reader, and are intended to be within the scope of this invention.
[0040] To achieve the above and related objectives, the disclosed apparatus, systems and methods may be implemented in the form shown in the accompanying drawings; however, it should be noted that the drawings are merely illustrative and may be modified in the specific construction shown.
[0041] Detailed Implementation Examples
[0042] Turning now descriptively to the accompanying figures, in which similar reference numerals denote similar elements in multiple views, the figures illustrate systems and methods for automated defect detection based on machine vision.
[0043] Figure 1A illustrates an example network architecture 100 for implementing various systems and methods of this disclosure, according to one or more embodiments. Network architecture 100 includes a plurality of client devices (or “user equipment”) 102-108 capable of communicatively connecting to one or more server systems 112 and 114 via network 110. In some implementations, network 110 may be a public communications network (e.g., the Internet, a cellular data network, a dial-up modem on a telephone network) or a private communications network (e.g., a private LAN, a leased line).
[0044] In some embodiments, server systems 112 and 114 include one or more processors and memory. The processors of server systems 112 and 114 execute computer instructions (e.g., network computer program code) stored in memory to receive and process data received from various client devices. In some embodiments, server system 112 is a content server configured to receive and store historical datasets, parameters, and other training information for neural networks. In some embodiments, server system 114 is a dispatch server configured to transmit and / or route network data packets including network messages. In some embodiments, content server 112 and dispatch server 114 are configured as a single server system that performs the operations of two servers.
[0045] In some embodiments, network architecture 100 may further include a database 116 that is communicatively connected to client devices 102-108 and server systems 112 and 114 via network 110. In some embodiments, network data or other information (such as computer instructions, historical datasets, parameters, and other training information of neural networks) may be stored in and / or retrieved from database 116.
[0046] Users access server system 112 via client devices 102-108 to participate in network data exchange services. For example, client devices 102-108 may execute a web browser application that can be used to access network data exchange services. In another example, client devices 102-108 may execute a network-specific software application (e.g., a networked data exchange "app" running on a device such as a computer or smartphone).
[0047] Users interacting with client devices 102-108 can participate in the network data exchange service provided by server system 112 by distributing and retrieving digital content, such as text annotations (e.g., updates, announcements, replies), digital images, videos, online orders, payment information, event updates, location information, computer code and software, or other suitable electronic information. In some embodiments, network architecture 100 may be a distributed, open information technology (IT) architecture configured for edge computing.
[0048] In some implementations, client devices 102-108 may be computing devices, such as laptops or desktop computers, smartphones, personal digital assistants, portable media players, tablets, cameras, or other suitable computing devices that can be used to communicate over a network. In some implementations, server systems 112 or 114 may include one or more computing devices, such as computer servers. In some implementations, server systems 112 or 114 may represent more than one computing device working together to perform the actions of a server computer (e.g., cloud computing). In some implementations, network 110 may be a public communications network (e.g., the Internet, cellular data networks, dial-up modems on telephone networks) or a private communications network (e.g., a private LAN, a leased line).
[0049] In various embodiments, the client device and/or server can be implemented as an imaging and image processing system. Figure 1B shows an example imaging and processing system 150 for automated inspection of manufactured parts, according to one or more embodiments. In various embodiments, system 150 includes a platform 152 having one or more light sources 160 positioned around the platform. An object 310 may be placed on the surface of the platform. In some embodiments, the platform may be configured to hold object 310 in a desired location or orientation. The object holding mechanism may include fasteners, clamps, vacuum-based retainers, etc. Although four light sources 160 are shown, each positioned at a corner of platform 152, various embodiments may include more or fewer light sources positioned in various other locations to provide the desired illumination of object 310. In some embodiments, the light sources 160 may be configured to move to a desired position during operation to provide the desired illumination on the object. Any suitable motion mechanism (e.g., a motor) can be implemented to position the light sources.
[0050] System 150 may also include a camera 154. In various embodiments, camera 154 is a high-resolution camera configured to capture high-resolution still images of objects on the platform. The captured images can then be transmitted to processing device 156, which can apply image processing algorithms and implement the computer vision-based model described herein to automatically detect defects on the object. As used herein, the computer vision-based model may include a neural network.
[0051] In various embodiments, processing device 156 may be an edge computing device configured to locally process images captured from camera 154 using the computer vision model described herein. In some embodiments, processing device 156 is an embedded device in a client device (e.g., camera 154) that performs the image processing described herein. In some embodiments, the embedded device is a microcontroller unit (MCU) or other embedded processor or chip. In some embodiments, client devices 102-108 may be used as processing device 156 to perform image processing. In some embodiments, processing device 156 may be servers 112 and / or 114, implemented as a local computer or a server on a dedicated LAN to process the captured images. In some embodiments, servers 112 and / or 114 may be implemented as a centralized data center to provide updates and parameters for the neural network implemented by the processing device. Such edge computing configurations allow for efficient data processing because large amounts of data can be processed near the source, thereby reducing Internet bandwidth usage. This reduces costs and ensures that applications can be used effectively in remote locations. Furthermore, the ability to process data without placing it in a public cloud adds a useful layer of security for sensitive data.
[0052] Figure 2 shows a process flow diagram for automated defect detection based on machine vision, according to one or more embodiments. In operation 202, an object is obtained for imaging. In a particular embodiment, object 310 is a machine-manufactured part. For example, object 310 may be a trim piece for an automobile, such as a molded trim.
[0053] In operation 204, the object is positioned in the desired orientation. For example, a component may be positioned and secured on platform 152. In some embodiments, such components may be processed through various automated processes and placed directly on the platform. In some embodiments, the platform may be integrated into an assembly line, allowing components to be inspected at different times during the manufacturing process. For example, an automotive trim component may have one (or more) scratches, which does not meet predetermined quality standards. These defective components may then be discarded or further processed to resolve the defects. Components without any indication of scratches or defects are acceptable and can be further processed according to quality standards.
[0054] Once positioned on the platform in the desired orientation, the object is exposed to sufficient lighting and a still image is captured by camera 154. In operation 206, camera 154 can obtain a high-resolution image of the object. For example, the captured image may be approximately 8 megabytes in size, have a resolution higher than approximately 1800 × 1200 pixels, or have an effective resolution higher than approximately 300 pixels per inch. Referring to Figure 3A, a high-resolution image 300 of object 310 is shown. As shown, image 300 includes object 310, background 312, and shadow 314.
[0055] In operation 208, the high-resolution image is preprocessed to prepare it for input into the described neural network. In some embodiments, the image may be preprocessed to sharpen it, thereby enhancing the fine details of the imaged object. In some embodiments, other preprocessing stages may include automatic object alignment, background removal, color removal, contrast enhancement, and other image quality enhancements.
[0056] Referring to Figure 3B, an example of a preprocessed or enhanced image 320 of object 310 according to one or more embodiments is shown. Image 320 has been preprocessed to remove the background and improve contrast. Furthermore, image 320 is represented by a single channel only, specifically the Y (luminance) component in YCbCr format. Removing color in this way can make existing surface defects more visible.
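The luminance-only representation described above can be sketched as follows. The Y formula is the standard full-range ITU-R BT.601 luma used in JPEG-style YCbCr. The fixed-threshold background removal (which assumes a dark, uniform background) and the linear contrast stretch are simple stand-ins chosen for illustration, since the disclosure does not specify these algorithms; sharpening and alignment are omitted, and all function names are hypothetical.

```python
import numpy as np

def rgb_to_y(rgb):
    """Luma of full-range YCbCr (BT.601 weights): Y = 0.299 R + 0.587 G + 0.114 B."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def remove_background(y, threshold=40.0):
    """Set pixels darker than `threshold` to 0 (assumes a dark, uniform background)."""
    return np.where(y >= threshold, y, 0.0)

def stretch_contrast(y):
    """Linearly map foreground values onto 1..255, keeping background pixels at 0."""
    fg = y[y > 0]
    if fg.size == 0:
        return np.zeros(y.shape, dtype=np.uint8)
    lo, hi = fg.min(), fg.max()
    scale = 254.0 / (hi - lo) if hi > lo else 0.0
    out = np.where(y > 0, 1.0 + (y - lo) * scale, 0.0)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

def preprocess(rgb, threshold=40.0):
    """RGB image (H, W, 3) -> single-channel, background-free, contrast-stretched uint8 image."""
    return stretch_contrast(remove_background(rgb_to_y(rgb), threshold))

img = np.zeros((4, 4, 3))
img[1:3, 1:3] = 100      # dim part surface
img[1, 1] = 200          # brighter spot on the part
print(preprocess(img))
```

Mapping the foreground to 1..255 rather than 0..255 keeps the darkest part pixels distinguishable from the zeroed background, which later steps can rely on when deciding whether a patch contains any of the object.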
[0057] In operation 210, portions of the enhanced image are extracted as input image patches. In various embodiments, the system extracts uniform portions of the preprocessed image that have the same pixel dimensions. For example, each input image patch may be 64×64 pixels, although other patch sizes may be set by the system configuration. The input image patches may be extracted as two-dimensional segments of the image corresponding to the Y component. However, in some embodiments, if a color component or channel is retained in the preprocessed image, the image patches may include a third dimension.
[0058] Figure 3B shows several examples of input image patches. In some embodiments, the input image patches include overlapping portions of the enhanced image. For example, image patches 322, 324, and 326 include overlapping portions of image 320. For illustrative purposes, input image patch 322 is shown outlined with different line patterns. In such embodiments, each image patch may overlap adjacent image patches by the same predetermined amount. Using overlapping input image patches allows portions of an object to be analyzed by the model more than once, which improves the accuracy of the final defect score. However, more overlapping input image patches are needed to feed the entire enhanced image through the neural network, which requires additional processing time and resources.
[0059] As another example, the input image patch may be exactly adjacent to its neighboring image patches. This allows the entire image to be fed into the neural network while minimizing the number of image patches required, thereby reducing the processing time and resources needed. For example, image patches 330, 331, and 332 are adjacent to each other, such that pixels at the edges of adjacent image patches are located adjacent to each other in image 320.
[0060] In other embodiments, the extracted image patches can be separated by multiple pixels, thereby further reducing processing requirements, but some accuracy is sacrificed because not all parts of the object or enhanced image will be input into the neural network. For example, image patches 340, 341, and 342 are separated from each other by a set distance.
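The three patch layouts in [0058] to [0060] differ only in the step (stride) between patch origins relative to the patch size. A sketch under that assumption; `extract_patches` is a hypothetical helper, and windows that would run past the image edge are simply skipped rather than padded.

```python
import numpy as np

def extract_patches(image, size=64, stride=64):
    """Return (row, col, patch) tuples of size x size windows taken every `stride` pixels.
    stride < size  -> overlapping patches (like 322, 324, 326)
    stride == size -> exactly adjacent patches (like 330, 331, 332)
    stride > size  -> patches separated by a gap (like 340, 341, 342)"""
    h, w = image.shape[:2]
    patches = []
    for r in range(0, h - size + 1, stride):
        for c in range(0, w - size + 1, stride):
            patches.append((r, c, image[r:r + size, c:c + size]))
    return patches

img = np.zeros((128, 256), dtype=np.uint8)
for stride in (32, 64, 96):
    print(stride, len(extract_patches(img, 64, stride)))
```

For a 128×256 image and 64-pixel patches, a stride of 32 yields 21 overlapping patches, 64 yields 8 adjacent ones, and 96 yields 3 separated ones, which makes the coverage-versus-processing trade-off described above concrete.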
[0061] In operation 212, the input image patch is passed to the described computer vision-based model or neural network. In various embodiments, the input image patch is input as a pixel matrix. For example, the system can transform each image patch into a matrix with dimensions equal to the pixel dimension of the image patch. Each pixel can be represented by a matrix element and assigned a value based on the pixel's shading. For example, each matrix element can correspond to an integer in the set {0, 1, 2, ..., 255}, where 0 corresponds to black and 255 corresponds to white. In the specific example described, each input image patch is 64×64 pixels. Such an input image patch will produce a 64×64×1 matrix.
[0062] Then, depending on the system architecture, the input image patches can be fed into the neural network sequentially or in parallel. As previously mentioned, the system architecture may include a processing device implemented as an embedded target designed for specific control functions within a larger system, typically with real-time computational constraints. Such an embedded target can be embedded as part of a complete device that typically includes hardware and mechanical components. For example, the embedded target could be an embedded microcontroller unit (MCU) or embedded processor within a camera that implements the neural network. In various embodiments, the neural network is stored in flash memory or other storage devices corresponding to the embedded target, or in other accessible memory of the camera. In other embodiments, the processing device can be implemented as a local or cloud-based server. In edge computing configurations, large amounts of data can be processed near the source, reducing internet bandwidth usage and allowing parallel image input. However, in cases where the processing device is implemented as a centralized cloud-based server, additional processing time and power may be required to transfer images to the server for processing, which necessitates sequential image input.
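The matrix representation from [0061] and the two feeding strategies from [0062] can be sketched together. `patch_to_matrix`, `run_sequential`, `run_batched`, and especially `toy_model` (a placeholder that returns mean brightness, not a defect detector) are hypothetical; the sketch only shows that stacking N matrices into one N×64×64×1 array lets a model score them in a single call, while the sequential path makes one call per patch and produces the same scores.

```python
import numpy as np

def patch_to_matrix(patch):
    """Grayscale patch -> H x W x 1 uint8 matrix (0 = black, 255 = white)."""
    m = np.rint(np.asarray(patch, dtype=np.float64))
    if m.ndim == 2:
        m = m[:, :, np.newaxis]                      # e.g. 64x64 -> 64x64x1
    return np.clip(m, 0, 255).astype(np.uint8)

def run_sequential(model, patches):
    """Score one matrix per call (e.g. each patch sent separately to a remote server)."""
    return [float(model(patch_to_matrix(p)[np.newaxis])[0]) for p in patches]

def run_batched(model, patches):
    """Stack all matrices into an N x H x W x 1 array and score them in one call."""
    batch = np.stack([patch_to_matrix(p) for p in patches])
    return [float(s) for s in model(batch)]

def toy_model(batch):
    """Hypothetical placeholder for the neural network: mean brightness / 255 per patch."""
    return batch.reshape(len(batch), -1).mean(axis=1) / 255.0

rng = np.random.default_rng(0)
patches = [rng.integers(0, 256, (64, 64)) for _ in range(4)]
print(run_sequential(toy_model, patches))
print(run_batched(toy_model, patches))
```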
[0063] In some embodiments, only input image patches containing portions of the object are fed into the neural network. Various object recognition techniques can be implemented to recognize input image patches that do not include any portions of the object, such as image patches 340 and 341. This reduces overall processing requirements by preventing the analysis of input image patches that do not include any portions of the imaged object.
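The disclosure leaves the object-recognition technique open. One simple heuristic, assumed here rather than taken from the source, relies on the background having been zeroed during preprocessing and keeps a patch only if enough of its pixels are non-zero; `contains_object`, `object_patches`, and the `min_fraction` parameter are hypothetical.

```python
import numpy as np

def contains_object(patch, min_fraction=0.01):
    """True if at least `min_fraction` of the patch's pixels are non-zero.
    Assumes background pixels were already set to 0 during preprocessing."""
    return np.count_nonzero(patch) >= min_fraction * patch.size

def object_patches(image, size=64, stride=64, min_fraction=0.01):
    """Extract size x size patches and keep only those that overlap the part."""
    kept = []
    h, w = image.shape
    for r in range(0, h - size + 1, stride):
        for c in range(0, w - size + 1, stride):
            patch = image[r:r + size, c:c + size]
            if contains_object(patch, min_fraction):
                kept.append((r, c, patch))
    return kept

image = np.zeros((128, 128), dtype=np.uint8)
image[10:20, 10:20] = 200          # a small "part" in the top-left corner
kept = object_patches(image)
print(len(kept), [(r, c) for r, c, _ in kept])
```

Only one of the four adjacent patches reaches the network here; the other three, like patches 340 and 341, contain no part pixels and are skipped.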
[0064] In operation 214, the computer vision-based model outputs a probability score for each input image patch passed to the model. For example, a probability score between 0 and 1 can be determined for each input image patch, indicating the likelihood that the image in the input image patch contains a defect. Therefore, a score of 0 would indicate that no defect was detected, and a score of 1 would indicate that a defect was detected. In other words, a probability score of 1 means that the model has 100% confidence that a defect is shown in the input image patch, while an output probability score of 0.87 means that the model has 87% confidence in the presence of a defect.
[0065] In various embodiments, the model is trained to determine probability scores based on several factors. For example, the size and depth of a scratch on the component shown in an input image patch may affect the probability score. In various embodiments, the probability scores may be visualized for the user. Referring to Figure 4A, an example heatmap 410 is shown that reflects the probability scores determined for the input image patches. The coordinate axes of heatmap 410 indicate that the image is approximately 3840 × 880 pixels.
[0066] The scale 412 included in Figure 4A indicates the probability score using shading from black (a score of 0.00) to white (a score of 1.00). In various embodiments, the region of image 410 corresponding to each input image patch is shaded based on the predicted presence of a defect within that patch. The shaded image patches therefore indicate the location and severity of estimated defects on the component. The shaded image patches in Figure 4A overlap, reflecting the overlapping portions of the extracted input image patches.
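The heatmap of [0065] and [0066] can be sketched as painting each patch's score over the pixels it covers and then mapping scores to gray levels as scale 412 does. How overlapping patches are combined is not stated in the source; taking the per-pixel maximum is an assumption made here so that a high-scoring patch is never diluted by a lower-scoring neighbor. The function names are hypothetical.

```python
import numpy as np

def build_heatmap(shape, scored_patches, size=64):
    """scored_patches: iterable of (row, col, probability).
    Each pixel takes the highest score among the patches covering it; uncovered pixels stay 0."""
    heat = np.zeros(shape, dtype=np.float64)
    for r, c, p in scored_patches:
        region = heat[r:r + size, c:c + size]      # view into `heat`
        np.maximum(region, p, out=region)         # in-place update of the view
    return heat

def heatmap_to_gray(heat):
    """Scale 412 convention: 0.00 -> black (0), 1.00 -> white (255)."""
    return np.rint(np.clip(heat, 0.0, 1.0) * 255.0).astype(np.uint8)

heat = build_heatmap((64, 96), [(0, 0, 0.20), (0, 32, 0.93)])
print(heat[0, 0], heat[0, 40], heat[0, 95])
```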
[0067] In operation 216, the overall defect score of the object is determined based on the probability scores of the input image patches. In some embodiments, the overall defect score is the maximum of the probability scores. For example, if p(s1) represents the defect probability of the first image patch, p(s2) the defect probability of the second image patch, and so on up to p(sN) for the Nth image patch, the overall defect score can be determined as max{p(s1), p(s2), ..., p(sN)}. In other embodiments, the overall defect score can be determined by other methods, for example as the average of the probability scores.
[0068] In some embodiments, a component is determined to have unacceptable defects if the overall defect score is higher than a predetermined threshold. For example, a component with an overall defect score greater than 0.90 may be considered to have unacceptable defects. Returning to the example of Figure 4A, the maximum probability score is 0.93, so the overall defect score is 0.93, which exceeds the 0.90 threshold.
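The aggregation and threshold rules in [0067] and [0068] reduce to a few lines. The per-patch scores below are made up for illustration, except that their maximum matches the 0.93 of the Figure 4A example; 0.90 is the example threshold from the text, and the function names are hypothetical.

```python
def overall_defect_score(scores, method="max"):
    """Combine per-patch probability scores p(s1)..p(sN) into one score for the part."""
    scores = list(scores)
    if not scores:
        raise ValueError("at least one patch score is required")
    if method == "max":
        return max(scores)
    if method == "mean":
        return sum(scores) / len(scores)
    raise ValueError(f"unknown method: {method!r}")

def classify_part(scores, threshold=0.90, method="max"):
    """'defective' if the overall score exceeds the threshold, else 'acceptable'."""
    return "defective" if overall_defect_score(scores, method) > threshold else "acceptable"

patch_scores = [0.02, 0.15, 0.93, 0.40]
print(overall_defect_score(patch_scores), classify_part(patch_scores))
print(classify_part(patch_scores, method="mean"))
```

With these values the maximum rule flags the part, while the mean rule (0.375) would pass it, which illustrates why taking the maximum suits defects such as a single scratch that occupy only a few patches.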
[0069] Referring to Figure 4B, image 420 is shown as an example image generated by the system according to one or more embodiments. Image 420 depicts a component with outlined regions corresponding to defects detected by the model. In some embodiments, the outlined regions may correspond to the portions of the image included in input image patches having a probability score higher than a predetermined threshold. For example, the outlined regions may correspond to input image patches assigned a probability score greater than 0.90.
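The outlined regions of [0069] can be derived from the same per-patch scores: form the union of patches scoring above the threshold, then keep only its boundary pixels for drawing over the captured image. This is one possible construction, assumed for illustration; `defect_mask`, `outline`, and `overlay_outline` are hypothetical helpers.

```python
import numpy as np

def defect_mask(shape, scored_patches, size=64, threshold=0.90):
    """Union of all patches whose probability score exceeds `threshold`."""
    mask = np.zeros(shape, dtype=bool)
    for r, c, p in scored_patches:
        if p > threshold:
            mask[r:r + size, c:c + size] = True
    return mask

def outline(mask):
    """Boundary pixels of the mask: inside it, with at least one 4-neighbour outside it."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~interior

def overlay_outline(gray, mask, value=255):
    """Copy of a grayscale image with the defect-region outline drawn in `value`."""
    out = np.array(gray, dtype=np.uint8, copy=True)
    out[outline(mask)] = value
    return out

mask = defect_mask((10, 10), [(2, 2, 0.95), (6, 6, 0.50)], size=4)
print(mask.sum(), outline(mask).sum())
```

Only the patch scoring 0.95 exceeds the threshold, so the mask is a single 4×4 square whose outline is its 12 border pixels.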
[0070] One or more of the images described above can be displayed on a user interface. Referring to Figure 5, an example user interface 500 displaying processed and inspected images is shown according to one or more embodiments. Images 300, 320, and 420 are displayed on user interface 500, allowing the user of the system to visually review the analysis performed by the model. In some embodiments, a quality control status 510 indicating whether a component is acceptable may be displayed. In some embodiments, the overall defect score may also be displayed.
[0071] In operation 218, the object can be further processed based on the determined defect score. In some embodiments, the described defect detection method is performed after processing is complete, to analyze the final output part. Parts found to be acceptable (such as parts with a defect score of 0.90 or below) can be transferred for packaging or shipment. However, the described model can also be implemented at other points on the assembly line, or at multiple points on the assembly line.
[0072] In some embodiments, parts can be repaired to correct defects. For example, a part may be automatically moved to another area of the assembly line where the discovered defects are corrected. As another example, defective parts can be discarded. In some embodiments, defective parts can be reprocessed or recycled to form new parts. Implementing the computer vision-based model at different points can identify defective parts before further manufacturing, saving resources, materials, and costs. The rapid, automated defect detection provided by the model can also be used at different points in the manufacturing process to monitor the performance of specific components of the assembly line and identify potential problems. For example, if a high percentage of parts are found to be defective after point B in the assembly line, but the same parts are acceptable after the preceding point A, this may indicate a problem with the machining tooling at point B.
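The point A versus point B comparison above can be sketched as a per-station defect-rate check. This is a hypothetical illustration: the station names, the `(station, is_defective)` log format, and the `max_increase` margin are assumptions, not part of the disclosure.

```python
def defect_rates(inspections):
    """inspections: iterable of (station, is_defective) pairs."""
    totals, defects = {}, {}
    for station, bad in inspections:
        totals[station] = totals.get(station, 0) + 1
        defects[station] = defects.get(station, 0) + int(bad)
    return {s: defects[s] / totals[s] for s in totals}

def flag_stations(rates, order, max_increase=0.05):
    """Flag stations whose defect rate exceeds that of the preceding
    station (in line order) by more than max_increase."""
    flagged = []
    for prev, cur in zip(order, order[1:]):
        if rates[cur] - rates[prev] > max_increase:
            flagged.append(cur)
    return flagged

# Illustrative log: 2% defective after A, 20% defective after B.
log = ([("A", False)] * 98 + [("A", True)] * 2
       + [("B", False)] * 80 + [("B", True)] * 20)
rates = defect_rates(log)
print(rates)                             # {'A': 0.02, 'B': 0.2}
print(flag_stations(rates, ["A", "B"]))  # ['B']
```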
[0073] The computer vision-based model can be a neural network with various computational layers. Figure 6 illustrates an example neural network architecture 600 implemented to automatically detect defects, according to one or more embodiments. As shown, neural network 600 includes a convolutional layer 612, a max-pooling layer 614, a flattening layer 616, a fully connected layer 618, and a fully connected layer 620.
[0074] Input image patch 602 can be fed into convolutional layer 612. In various embodiments, input image patch 602 can be an extracted portion of an image, such as input image patch 330. In some embodiments, input image patch 602 can be a portion of an image with an unknown defect state. In some embodiments, input image patch 602 can be a training image with known corresponding defects. For example, each training image can have a corresponding probability score of 0 (indicating no defect) or 1 (indicating a defect).
[0075] In various embodiments, convolutional layer 612 applies a filter K of a specific dimension to the pixel matrix of the input image patch. For example, the filter may have dimensions of 3x3x1. In some embodiments, the filter is applied with a stride of 8. The convolution operation extracts high-level features from the input image patch, and the convolutional layer outputs a convolved matrix. The convolutional layer may apply either same padding or valid padding to the matrix when producing the convolved matrix.
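The filtering step can be sketched for a single 3x3 filter on a single-channel matrix. This is a minimal illustration, not the disclosed implementation: it assumes square inputs and kernels, and it follows the TensorFlow-style output-size rules for "same" and "valid" padding. Like deep-learning frameworks, it computes cross-correlation (the kernel is not flipped).

```python
import math
import numpy as np

def conv_output_size(n, k, stride, padding):
    """Output length along one axis for input n, kernel k (TensorFlow-style rules)."""
    if padding == "same":
        return math.ceil(n / stride)
    if padding == "valid":
        return (n - k) // stride + 1
    raise ValueError(f"unknown padding: {padding}")

def conv2d_single(x, kernel, stride=1, padding="valid"):
    """Apply one square k x k filter to a square single-channel matrix."""
    n, k = x.shape[0], kernel.shape[0]
    out_n = conv_output_size(n, k, stride, padding)
    if padding == "same":
        # Zero-pad so out_n windows fit; an odd extra pixel goes at the end.
        total = max((out_n - 1) * stride + k - n, 0)
        before = total // 2
        x = np.pad(x, (before, total - before))
    out = np.empty((out_n, out_n))
    for i in range(out_n):
        for j in range(out_n):
            window = x[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(window * kernel)  # elementwise product, then sum
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(conv2d_single(x, np.ones((3, 3))))  # values [[45, 54], [81, 90]]
print(conv2d_single(np.zeros((64, 64)), np.ones((3, 3)), stride=8, padding="same").shape)  # (8, 8)
```

In practice a framework convolution layer would apply many such filters at once; the explicit loops here only make the window arithmetic visible.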
[0076] The convolved matrix is then fed into max-pooling layer 614. In various embodiments, the max-pooling layer performs max-pooling on the convolved matrix by returning the maximum value in the portion of the convolved matrix covered by the pooling kernel. For example, the pooling size could be 2x2x1. In some embodiments, the neural network applies average pooling instead of max-pooling, returning the average of all values in the portion of the convolved matrix covered by the pooling kernel. In one example, the output of the max-pooling layer could be a 64-unit matrix (a 64x64 matrix).
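The difference between max-pooling and average pooling with a 2x2 kernel can be shown on a small matrix. This sketch assumes non-overlapping windows (stride equal to the pooling size), which is a common default but is not stated in the text; `pool2d` is a hypothetical helper.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling (stride == size) over a 2-D array.

    Trailing rows/columns that do not fill a whole window are dropped.
    """
    h, w = x.shape[0] // size, x.shape[1] // size
    # Reshape so axes 1 and 3 index positions inside each window.
    windows = x[:h * size, :w * size].reshape(h, size, w, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    if mode == "average":
        return windows.mean(axis=(1, 3))
    raise ValueError(f"unknown mode: {mode}")

x = np.array([[1, 3, 2, 0],
              [5, 4, 1, 1],
              [0, 0, 7, 8],
              [2, 2, 6, 5]], dtype=float)
print(pool2d(x, mode="max"))      # values [[5, 2], [2, 8]]
print(pool2d(x, mode="average"))  # values [[3.25, 1], [1, 6.5]]
```

Either mode halves each spatial dimension here, which is the dimensionality reduction described in the next paragraph.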
[0077] Pooling layers therefore reduce the spatial dimensions of the convolved features, which lowers the computational power required to process the data and helps extract dominant features so that the model can continue to be trained effectively. In some embodiments, the neural network may include a series of consecutive convolutional and max-pooling layers. For example, neural network 600 may include three consecutive convolutional-pooling pairs 615, where the output of each max-pooling layer is fed as input into the convolutional layer of the next convolutional-pooling pair. The convolutional and max-pooling layers may use a truncated normal distribution for initialization and a rectified linear activation function. In this example, each convolutional-pooling pair 615 can take a 64-unit matrix as input and output a 64-unit matrix.
[0078] The neural network can include any number of consecutive convolutional-pooling pairs, depending on available processing resources and desired performance. Implementing three consecutive convolutional-pooling pairs can minimize image processing latency while maintaining the desired prediction accuracy. For example, using three convolutional-pooling pairs allows every input image patch of a test image to be analyzed, and the defect score of the object determined, in approximately 5 seconds. Using a stride of 8 can further optimize the accuracy and latency (runtime) of image processing based on the number of filters applied to each input image patch. As a result, the inference process can be optimized to run on mobile devices or resource-constrained embedded devices.
[0079] The output of the final max-pooling layer is then fed into flattening layer 616, which flattens it into a column vector. The column vector is then fed into fully connected layers 618 and 620. In various embodiments, the fully connected layers may form a multilayer perceptron (a feedforward neural network). In some embodiments, the first fully connected layer 618 uses a rectified linear unit (ReLU) as its activation function. In an example embodiment, the first fully connected layer 618 includes 128 neurons, although more or fewer neurons may be used in other embodiments. In some embodiments, the second fully connected layer 620 uses a sigmoid activation function. In some embodiments, the fully connected layers may be initialized from a truncated normal distribution.
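The flatten, ReLU, and sigmoid stages can be sketched as a NumPy forward pass. This is an illustration under stated assumptions: the 0.1 standard deviation, the truncation at two standard deviations, the zero bias initialization, and the 8x8x4 feature-map shape in the usage lines are all hypothetical choices; only the 128-neuron ReLU layer and the sigmoid output come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_normal(shape, std=0.1):
    """Normal samples with |value| <= 2*std; outliers are redrawn."""
    w = rng.normal(0.0, std, size=shape)
    bad = np.abs(w) > 2 * std
    while bad.any():
        w[bad] = rng.normal(0.0, std, size=bad.sum())
        bad = np.abs(w) > 2 * std
    return w

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DenseHead:
    """Flatten -> Dense(128, ReLU) -> Dense(1, sigmoid), mirroring layers 616-620."""

    def __init__(self, in_features, hidden=128):
        self.w1 = truncated_normal((in_features, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = truncated_normal((hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, feature_maps):
        x = feature_maps.reshape(feature_maps.shape[0], -1)  # flatten each sample
        h = relu(x @ self.w1 + self.b1)
        return sigmoid(h @ self.w2 + self.b2)[:, 0]  # one probability per patch

head = DenseHead(in_features=8 * 8 * 4)
feats = rng.random((5, 8, 8, 4))  # stand-in for pooled features of 5 patches
probs = head(feats)
print(probs.shape)  # (5,)
```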
[0080] During training, neural network 600 can be configured to generate the probability that a particular input image patch contains a defect. In various embodiments, output 630 is set to a probability score of 1 if the training image contains a known defect, or to 0 if the training image contains no defects. Using these known probability scores, the weights (parameters) in the fully connected layers can be updated through backpropagation. For example, the parameters can be updated using stochastic gradient descent with Adam optimization. In some embodiments, a softmax function is used to convert the activation values of the output-layer neurons into probabilities.
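A minimal sketch of the Adam update applied to a single sigmoid output unit, trained with mini-batches on 0/1 labels. Several pieces are assumptions not stated in the text: the binary cross-entropy loss, the learning rate, and the synthetic features standing in for flattened patch features. For two classes, a single sigmoid output is equivalent to a two-way softmax.

```python
import numpy as np

class Adam:
    """Adam update rule applied in place to a list of parameter arrays."""

    def __init__(self, params, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params, self.lr, self.b1, self.b2, self.eps = params, lr, beta1, beta2, eps
        self.m = [np.zeros_like(p) for p in params]  # first-moment estimates
        self.v = [np.zeros_like(p) for p in params]  # second-moment estimates
        self.t = 0

    def step(self, grads):
        self.t += 1
        for p, g, m, v in zip(self.params, grads, self.m, self.v):
            m[...] = self.b1 * m + (1 - self.b1) * g
            v[...] = self.b2 * v + (1 - self.b2) * g * g
            m_hat = m / (1 - self.b1 ** self.t)  # bias correction
            v_hat = v / (1 - self.b2 ** self.t)
            p -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

def bce(p, y):
    """Binary cross-entropy between predicted probabilities p and 0/1 labels y."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))        # hypothetical flattened patch features
labels = (features[:, 0] > 0).astype(float)  # toy 0/1 "defect" labels
w, b = np.zeros(16), np.zeros(1)
opt = Adam([w, b], lr=0.05)

for _ in range(200):
    batch = rng.choice(200, size=32, replace=False)  # stochastic mini-batch
    x, y = features[batch], labels[batch]
    p = 1 / (1 + np.exp(-(x @ w + b)))
    err = p - y  # gradient of BCE with respect to the sigmoid's input
    opt.step([x.T @ err / len(y), np.array([err.mean()])])

final_p = 1 / (1 + np.exp(-(features @ w + b)))
print(bce(final_p, labels))  # well below ln(2), the loss of always predicting 0.5
```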
[0081] In some embodiments, the neural network is trained at a centralized server system in a global or cloud network. In some embodiments, training data, such as weights, parameters, and training images, is stored at the centralized server system. Updated weights can then be transferred from the centralized server system to a local edge computing device for more efficient image processing. As previously mentioned, the local edge computing device can be an embedded target of a client device (such as camera 154), for example an MCU or embedded processor. In some embodiments, the parameters of the neural network are periodically updated at the centralized server based on new training data. However, in some embodiments, the neural network is trained on the local edge computing device.
[0082] In some embodiments, the neural network is considered fully trained once a predetermined number of training images have been input into the model. In some embodiments, the neural network is considered fully trained once it can generate predictions with the desired accuracy.
[0083] Once fully trained, the neural network can operate in inference mode, taking as input 602 an input image patch with unknown defect characteristics. The neural network passes the input through the layers described above and, using the updated weights, generates an output 630 between 0 and 1 that indicates the probability that the input image patch contains a defect.
[0084] Referring to Figures 7A, 7B, and 7C, an example method 700 for training and operating a neural network for computer vision-based defect detection is shown. The neural network may be neural network 600 and may include one or more computational layers 702, which, as discussed earlier, may include convolutional layers, max-pooling layers, flattening layers, and fully connected layers. According to one or more embodiments, Figure 7B shows an example of the neural network operating in training mode 710, and Figure 7C shows an example of the neural network operating in inference mode 730.
[0085] In training mode, the neural network is trained to detect defects using a dataset of training images. When operating in training mode 710, multiple historical datasets are received at operation 711. The historical datasets may include multiple training images 717 corresponding to one or more known defects. In some embodiments, the training images may represent or correspond to input image patches extracted from images of one or more objects. In some embodiments, each training image has a corresponding value indicating whether it shows a defect on the corresponding portion of an object. For example, a training image that shows a relevant defect may be associated with a probability score of 1, and a training image that does not may be associated with a probability score of 0. These values may be stored in the image files of the training images, for example in their metadata.
[0086] In operation 713, each training image is converted into a corresponding matrix representation. As mentioned earlier, the matrix representation can correspond to the pixel dimensions of the training image. For example, a training image can be 64x64 pixels represented in a single color channel (luminance), so the corresponding matrix has dimensions of 64x64x1.
[0087] In operation 715, each matrix representation is input into the neural network to adjust the weighting parameters 719 in the various layers of the neural network based on the one or more known defects. In some embodiments, the weighting parameters 719 correspond to the dimensions of the matrix representation. The known probability scores can be input into the neural network along with the matrix representations to generate and update the parameters in the fully connected layers. The neural network can thus be trained (721) to use weighting parameters 719 to accurately output the probability scores of input image patches with unknown defects.
[0088] In some embodiments, the neural network can be determined to be sufficiently trained once a desired error rate is achieved. For example, the desired error rate could be 0.00001% (that is, an accuracy of 99.99999%). In other embodiments, the model can be determined to be sufficiently trained after a set number of epochs or iterations, such as after a predetermined number of training images have been fed into the model. For example, the model can be considered sufficiently trained once 1000 training images, along with their known probability scores, have been fed into the neural network. Once sufficiently trained, the neural network can be used to detect defects in new images in inference mode 730.
[0089] When operating in inference mode 730, a test image 743 of an object (such as object 310) is obtained at operation 731. Test image 743 is not part of the historical datasets and may include portions with unknown potential defects. For example, test image 743 of the part may be obtained at one of various points on the assembly line during the manufacturing process. Then, at operation 733, the test image is preprocessed before being input into the neural network for analysis. In some embodiments, the test image is preprocessed to remove the background from the image of the part. In some embodiments, the test image is preprocessed so that it is represented only by the luminance (Y) component of the YCbCr format. Various other image preprocessing techniques, as previously discussed with reference to operation 208, can also be applied to the test image.
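The luminance-only representation can be sketched as an RGB-to-Y conversion that keeps an explicit single channel, producing the HxWx1 matrix shape described in [0086]. This assumes 8-bit RGB input and the ITU-R BT.601 luma weights used for the Y channel of JPEG-style YCbCr; background removal is not shown because the text does not specify a method.

```python
import numpy as np

# ITU-R BT.601 luma weights for R, G, B (the Y of JPEG-style YCbCr).
BT601 = np.array([0.299, 0.587, 0.114])

def to_luminance_matrix(rgb):
    """Convert an HxWx3 uint8 RGB image to an HxWx1 float matrix of Y in [0, 1]."""
    rgb = np.asarray(rgb, dtype=np.float64) / 255.0
    y = rgb @ BT601             # weighted sum over the color axis -> HxW
    return y[..., np.newaxis]   # keep a single explicit channel -> HxWx1

image = np.zeros((64, 64, 3), dtype=np.uint8)
image[..., 1] = 255             # pure green test image
m = to_luminance_matrix(image)
print(m.shape, round(float(m[0, 0, 0]), 3))  # (64, 64, 1) 0.587
```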
[0090] In operation 735, portions of the test image are extracted as multiple input image patches 745 for input into the neural network. For example, input image patch 745 can be any of the input image patches described with reference to Figure 3B. Each input image patch 745 corresponds to an extracted portion of the test image. The pixel dimensions of the input image patches can be the same as those of the training images.
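Patch extraction can be sketched as a sliding window of uniform size. Setting the stride equal to the patch size yields adjacent, non-overlapping patches; a smaller stride yields overlapping patches. The 64-pixel patch size matches the example training images, while the stride values and the choice to skip incomplete edge windows are assumptions of this sketch.

```python
import numpy as np

def extract_patches(image, size=64, stride=64):
    """Slide a size x size window over an HxWxC image.

    stride == size gives adjacent, non-overlapping patches;
    stride < size gives overlapping patches. Edge pixels that do not
    fill a full window are skipped in this sketch.
    Returns (patches, origins) with patches shaped (N, size, size, C).
    """
    h, w = image.shape[:2]
    origins = [(r, c)
               for r in range(0, h - size + 1, stride)
               for c in range(0, w - size + 1, stride)]
    patches = np.stack([image[r:r + size, c:c + size] for r, c in origins])
    return patches, origins

img = np.zeros((256, 192, 1), dtype=np.float32)
adjacent, _ = extract_patches(img, stride=64)
overlapping, _ = extract_patches(img, stride=32)
print(adjacent.shape, overlapping.shape)  # (12, 64, 64, 1) (35, 64, 64, 1)
```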
[0091] In operation 737, each input image patch 745 is fed into the neural network, which automatically generates a probability score 749 for each patch using weighting parameters 719. Each input image patch 745 can be input into the neural network as a corresponding matrix representation 747 (similar to training images 717). As described above, the input image patches can be input into the neural network serially or in parallel. The probability score 749 of each input image patch indicates the probability that the patch contains a predicted defect.
[0092] Once the probability scores of the input image patches covering each part of the test image are determined, in operation 739 a defect score 751 is generated for the test image based on the probability score of each input image patch. Defect score 751 can indicate the condition of the object. In some embodiments, the defect score is the maximum of the determined probability scores 749; for example, an object with a defect score above a predetermined threshold can be determined to be unsuitable for sale or use. As another example, the defect score can be the average of the probability scores.
[0093] Components with defect scores exceeding a predetermined threshold can be discarded as unusable. In some embodiments, defective components may be further processed to repair or remove the identified defects. The analysis of the images can be visualized for the system user. For example, in operation 741, a heatmap of the input image patches, such as heatmap 410, can be generated based on the probability scores. Other output images, such as image 420, can also be generated. These output images can be displayed on a user interface, such as interface 500, allowing the system user to view the detected defects and locate them for removal or repair.
[0094] In some embodiments, at operation 743 the predicted defects within the test image or the corresponding input image patches can be confirmed and then used to further train and fine-tune the neural network. For example, the probability score can be confirmed by a user at a user interface that displays the input image patch and its corresponding probability score. The user can then confirm whether the image or a specific image patch shows a defect. If the user confirms that a defect is present, the probability score of the input image patch can be set to 1. If the user confirms that no defect is present, the probability score of the input image patch can be changed to 0.
[0095] The input image patches selected for confirmation at operation 743 can be randomly selected from one or more different test images obtained during inference mode. However, in some embodiments, input image patches with probability scores within a predetermined range are selected for confirmation. For example, input image patches with probability scores between 0.4 and 0.6 can be selected. These patches may correspond to instances where the neural network cannot identify defects with sufficient certainty.
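Both selection strategies in [0095] can be sketched in a few lines. Treating the 0.4 and 0.6 bounds as inclusive is an assumption, since the text only says "between"; the function names are hypothetical.

```python
import random

def select_uncertain(patch_scores, low=0.4, high=0.6):
    """Indices of patches whose score lies in [low, high] (inclusive bounds assumed)."""
    return [i for i, s in enumerate(patch_scores) if low <= s <= high]

def select_random(patch_ids, k, seed=None):
    """Alternative strategy: sample k patches uniformly without replacement."""
    return random.Random(seed).sample(patch_ids, k)

scores = [0.02, 0.45, 0.97, 0.60, 0.39]  # illustrative values only
print(select_uncertain(scores))  # [1, 3]
```

Selecting by score range concentrates the user's review effort on the patches the network is least sure about, which is where a confirmed label is likely to change the weights the most.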
[0096] Once confirmed, the input image patches can be fed into the neural network during training to refine its weighting parameters. For example, the method can return to operation 713 or 715 to convert and input the confirmed input image patches as training images with confirmed probability scores. In some embodiments, the confirmed input image patches are used to retrain the neural network in batches of a regular size, each containing a predetermined number of confirmed input image patches, such as 100. For example, a batch of confirmed input image patches may include the historical dataset received at operation 711. This improves network performance over time as the network sees more examples.
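The confirm-and-accumulate loop of [0094] to [0096] can be sketched as a buffer that relabels each confirmed patch (1.0 for a defect, 0.0 otherwise) and triggers retraining each time a batch of 100 is reached. The `retrain` callback is a hypothetical hook standing in for operations 713 and 715, and this sketch keeps both positive and negative confirmations as the description does.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Sample = Tuple[object, float]  # (patch, confirmed probability score)

@dataclass
class ConfirmationBuffer:
    """Accumulate user-confirmed patches and retrain in fixed-size batches."""
    batch_size: int
    retrain: Callable[[List[Sample]], None]  # hypothetical retraining hook
    pending: List[Sample] = field(default_factory=list)

    def confirm(self, patch, has_defect: bool) -> None:
        # The user's answer replaces the model's score: 1.0 if defective, else 0.0.
        self.pending.append((patch, 1.0 if has_defect else 0.0))
        if len(self.pending) >= self.batch_size:
            batch = self.pending[:self.batch_size]
            self.pending = self.pending[self.batch_size:]
            self.retrain(batch)

batches = []
buf = ConfirmationBuffer(batch_size=100, retrain=batches.append)
for i in range(250):
    buf.confirm(patch=f"patch-{i}", has_defect=(i % 3 == 0))
print(len(batches), len(buf.pending))  # 2 50
```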
[0097] Figure 8 illustrates a particular example of a computer system that can be used to implement particular embodiments of the present disclosure. For example, according to the various embodiments described above, computer system 800 can represent a client device, server, or other edge computing device. According to particular example embodiments, system 800, which is suitable for implementing particular embodiments of the present disclosure, includes processor 801, memory 803, accelerator 805, interface 811, and bus 815 (e.g., a PCI bus or other interconnect). When operating under the control of appropriate software or firmware, processor 801 is responsible for training and implementing the described computer model and neural network. The processor may also be responsible for controlling the operation of the camera and for transmitting data over a network between the client device and the server system. Various specially configured devices may also be used in place of, or in addition to, processor 801. The complete implementation may also be done in custom hardware.
[0098] Interface 811 may include separate input and output interfaces, or it may be a unified interface supporting both operations. Interface 811 is typically configured to send and receive data packets or data segments over a network. Specific examples of interfaces supported by the device include Ethernet interfaces, Frame Relay interfaces, cable interfaces, DSL interfaces, token ring interfaces, and the like. Typically, these interfaces may include ports suitable for communication with the appropriate media. In some cases, they may also include a separate processor and, in some cases, volatile RAM. The separate processor can control communication-intensive tasks such as packet switching, media control, and management.
[0099] In addition, a variety of very high-speed interfaces can be provided, such as Fast Ethernet, Gigabit Ethernet, ATM, HSSI, POS, and FDDI interfaces. Typically, these interfaces may include ports suitable for communication with appropriate media. In some cases, they may also include a dedicated processor, and in others, volatile RAM. The dedicated processor can control communication-intensive tasks such as packet switching, media control, and management.
[0100] According to a particular example embodiment, system 800 uses memory 803 to store data and program instructions and to maintain a local cache. For example, the program instructions may control the operation of an operating system and/or one or more applications. The memory or memories may also be configured to store received metadata and batch-request metadata.
[0101] In some embodiments, system 800 further includes a graphics processing unit (GPU) 809. As described above, GPU 809 may be implemented to process each pixel on a separate thread. In some embodiments, system 800 also includes accelerator 805. In various embodiments, accelerator 805 is a rendering accelerator chip that can be separate from the GPU. Accelerator 805 may be configured to accelerate processing for the entire system 800 by processing pixels in parallel, preventing system 800 from being overloaded. For example, in some cases ultra-high-definition images with very many pixels, such as DCI 4K or UHD-1 resolution images, may be processed. In such cases, the number of pixels may exceed what a standard GPU processor (e.g., GPU 809) can handle. In some embodiments, accelerator 805 is used only when a high system load is anticipated or detected.
[0102] In some embodiments, accelerator 805 may be a hardware accelerator in a unit separate from the CPU (e.g., processor 801). Accelerator 805 may provide automatic parallelization to use multiple processors simultaneously in a shared-memory multiprocessor machine. The core of the accelerator 805 architecture may be a hybrid design that uses fixed-function units where operations are well defined and programmable units where flexibility is required. In various embodiments, accelerator 805 may be configured to support the higher performance and extensions of APIs such as OpenGL 2 and DX9.
[0103] Because such information and program instructions can be used to implement the systems/methods described herein, this disclosure relates to tangible machine-readable media that include program instructions, state information, and the like for performing the various operations described herein. Examples of machine-readable media include hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as optical discs; and hardware devices specially configured to store and execute program instructions, such as read-only memory devices (ROM) and programmable read-only memory devices (PROM). Examples of program instructions include machine code (e.g., produced by a compiler) and files containing higher-level code that can be executed by a computer using an interpreter.
[0104] Although many components and processes have been described above in the singular for convenience, those skilled in the art will understand that the techniques disclosed herein can also be practiced using multiple components and repetitive processes.
[0105] Although the invention has been specifically shown and described with reference to particular embodiments thereof, those skilled in the art will understand that changes in form and detail of the disclosed embodiments may be made without departing from the spirit or scope of the invention. Therefore, this disclosure is intended to be construed as including all variations and equivalents falling within the true spirit and scope of this disclosure.
Claims
1. A method comprising: training a neural network to detect defects, wherein training the neural network includes: receiving multiple historical datasets including multiple training images corresponding to one or more known defects; converting each training image into a corresponding matrix representation; and inputting each corresponding matrix representation into the neural network to adjust weighting parameters based on the one or more known defects, wherein the weighting parameters correspond to the dimensions of the matrix representation, and wherein the dimensions of the matrix representation correspond to the pixel dimensions of the training image; obtaining a test image of an object, wherein the test image is not part of the historical datasets; extracting portions of the test image as multiple input image patches for input into the neural network, each input image patch corresponding to an extracted portion of the test image; inputting each input image patch into the neural network as a corresponding matrix representation to automatically generate a probability score for each input image patch using the weighting parameters, wherein the probability score of each input image patch indicates the probability that the input image patch includes a predicted defect, wherein a defect score of the test image is generated based on the probability score of each input image patch, and wherein the defect score indicates the condition of the object; selecting, from the plurality of input image patches, input image patches with probability scores within a predetermined range for confirmation; confirming, via a user interface, whether each selected input image patch shows a defect, wherein a selected input image patch confirmed as showing a defect is a confirmed input image patch; accumulating confirmed input image patches until a predetermined number of confirmed input image patches is reached; retraining the neural network using a batch of the accumulated confirmed input image patches to refine the weighting parameters and create refined weighting parameters; and distributing the refined weighting parameters to edge computing devices for subsequent defect detection.
2. The method according to claim 1, wherein each input image patch includes a 64×64×1 matrix corresponding only to the luminance component, and the neural network includes three consecutive pairs of convolutional and max-pooling layers, each pair applying a filter with a stride of 8, thereby optimizing the accuracy and latency of the image processing.
3. The method according to claim 1, wherein the input image patches have a uniform height and a uniform width.
4. The method according to claim 1, wherein the input image patches include overlapping portions of the test image.
5. The method according to claim 1, wherein the input image patches are aligned such that each input image patch is adjacent to one or more other input image patches.
6. The method according to claim 1, wherein the predetermined range includes probability scores between 0.4 and 0.6, and the predetermined number of confirmed input image patches is 100.
7. The method of claim 1, further comprising generating a heatmap of the input image patch based on the probability score.
8. The method according to claim 1, wherein, before the test image is passed to the neural network, the test image is preprocessed to remove the background and to represent the image only in the luminance component of the YCbCr format.
9. A server system, comprising: an interface configured to receive: multiple historical datasets including multiple training images corresponding to one or more known defects of one or more levels, and a test image of an object, wherein the test image is not part of the historical datasets; memory configured to store the historical datasets and the test image; and a processor associated with a neural network, wherein the processor is configured to: train the neural network to detect defects by converting each training image into a corresponding matrix representation and inputting each corresponding matrix representation into the neural network to adjust weighting parameters based on the one or more known defects, wherein the weighting parameters correspond to the dimensions of the matrix representation, and wherein the dimensions of the matrix representation correspond to the pixel dimensions of the training image; extract portions of the test image as multiple input image patches for input into the neural network, each input image patch corresponding to an extracted portion of the test image; input each input image patch into the neural network as a corresponding matrix representation to automatically generate a probability score for each input image patch using the weighting parameters, wherein the probability score of each input image patch indicates the probability that the input image patch includes a predicted defect, wherein a defect score of the test image is generated based on the probability score of each input image patch, and wherein the defect score indicates the condition of the object; select, from the plurality of input image patches, input image patches with probability scores within a predetermined range for confirmation; confirm, via a user interface, whether each selected input image patch shows a defect, wherein a selected input image patch confirmed as showing a defect is a confirmed input image patch; accumulate confirmed input image patches until a predetermined number of confirmed input image patches is reached; retrain the neural network using a batch of the accumulated confirmed input image patches to refine the weighting parameters and create refined weighting parameters; and distribute the refined weighting parameters to edge computing devices for subsequent defect detection.
10. The server system according to claim 9, wherein each input image patch includes a 64×64×1 matrix corresponding only to the luminance component, and the neural network includes three consecutive pairs of convolutional and max-pooling layers, each pair applying a filter with a stride of 8, thereby optimizing the accuracy and latency of the image processing.
11. The server system according to claim 9, wherein the input image patches have a uniform height and a uniform width.
12. The server system according to claim 9, wherein the input image patches include overlapping portions of the test image.
13. The server system according to claim 9, wherein the input image patches are aligned such that each input image patch is adjacent to one or more other input image patches.
14. The server system according to claim 9, wherein the predetermined range includes probability scores between 0.4 and 0.6, and the predetermined number of confirmed input image patches is 100.
15. The server system of claim 9, further comprising generating a heatmap of the input image patch based on the probability score.
16. The server system according to claim 9, wherein, before the test image is passed to the neural network, the test image is preprocessed to remove the background and to represent the image only in the luminance component of the YCbCr format.
17. A non-transitory computer-readable medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for: training a neural network to detect defects by: receiving multiple historical datasets including multiple training images corresponding to one or more known defects; converting each training image into a corresponding matrix representation; and inputting each corresponding matrix representation into the neural network to adjust weighting parameters based on the one or more known defects, wherein the weighting parameters correspond to the dimensions of the matrix representation, and wherein the dimensions of the matrix representation correspond to the pixel dimensions of the training image; obtaining a test image of an object, wherein the test image is not part of the historical datasets; extracting portions of the test image as multiple input image patches for input into the neural network, each input image patch corresponding to an extracted portion of the test image; inputting each input image patch into the neural network as a corresponding matrix representation to automatically generate a probability score for each input image patch using the weighting parameters, wherein the probability score of each input image patch indicates the probability that the input image patch includes a predicted defect, wherein a defect score of the test image is generated based on the probability score of each input image patch, and wherein the defect score indicates the condition of the object; selecting, from the plurality of input image patches, input image patches with probability scores within a predetermined range for confirmation; confirming, via a user interface, whether each selected input image patch shows a defect, wherein a selected input image patch confirmed as showing a defect is a confirmed input image patch; accumulating confirmed input image patches until a predetermined number of confirmed input image patches is reached; retraining the neural network using a batch of the accumulated confirmed input image patches to refine the weighting parameters and create refined weighting parameters; and distributing the refined weighting parameters to edge computing devices for subsequent defect detection.
18. The non-transitory computer-readable medium according to claim 17, wherein each input image patch includes a 64×64×1 matrix corresponding only to the luminance component, and the neural network includes three consecutive pairs of convolutional and max-pooling layers, each pair applying a filter with a stride of 8, thereby optimizing the accuracy and latency of the image processing.
19. The non-transitory computer-readable medium according to claim 17, wherein the input image patches have a uniform height and a uniform width.
20. The non-transitory computer-readable medium according to claim 17, wherein the predetermined range includes probability scores between 0.4 and 0.6, and the predetermined number of confirmed input image patches is 100.