An image classification method and apparatus

By processing part images through image pose correction and a hierarchical classification network, the problem of high cost of part hierarchical classification in existing technologies is solved, and unified classification and efficient hierarchical recognition of part images from any angle are achieved.

CN116071585B · Active · Publication Date: 2026-06-16 · CHINA TELECOM CLOUD TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHINA TELECOM CLOUD TECH CO LTD
Filing Date
2022-12-30
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing part classification methods have poor universality and require individual design for each part, resulting in high costs.

Method used

An image pose correction network is used to correct the pose of part images, extract salient features, and perform grading through a grading network, which is applicable to part images at any angle.

Benefits of technology

It has achieved a unified classification of various types of parts, reducing costs and improving the universality and efficiency of the classification.



Abstract

The application provides an image classification method and device, after obtaining a to-be-processed image containing a part, inputting the to-be-processed image into a part posture correction network, correcting the posture of the part in the to-be-processed image, and outputting a standard image after posture correction of the part; according to the obtained standard image, obtaining a plurality of significant features corresponding to the part, inputting the plurality of significant features after fusion into a preset level classification network, and outputting a level corresponding to the part. By designing the posture correction network, the part image of the input image can be an image of any angle, without being limited to a specific angle. The part image of any angle can be corrected into a standard image, and then subsequent processing is performed, the plurality of significant features corresponding to the part are obtained through identification of the standard image, the level corresponding to the part is output through analysis after fusion of the significant features of the part, so that the scheme of the application can be applied to classification of a plurality of types of parts, and cost is saved.

Description

Technical Field

[0001] The embodiments of the present invention relate to the field of artificial intelligence, and in particular to an image classification method, apparatus, electronic device and readable storage medium.

Background Technology

[0002] Parts grading refers to classifying manufactured parts into different grades based on their quality, and is a necessary step in parts sales and production optimization.

[0003] Existing part classification methods are mostly designed for specific parts, and the design process also requires hardware design, such as the design of sampling stages, lighting angles, and camera angles. The accuracy of the hardware directly affects the part classification results.

[0004] The above-mentioned part classification methods have poor universality, and designing a separate classification method for each type of part results in high costs.

Summary of the Invention

[0005] In view of the above problems, embodiments of the present invention are proposed to provide an image classification method, apparatus, electronic device and readable storage medium that overcome or at least partially solve the above problems.

[0006] In a first aspect, embodiments of this application disclose an image classification method, the method comprising:

[0007] Acquire an image to be processed, wherein the image to be processed is a captured image of the part;

[0008] The image to be processed is input into a preset part posture correction network to perform posture correction on the parts in the image to be processed and output a standard image of the parts after posture correction.

[0009] Based on the standard image, several salient features corresponding to the part are obtained;

[0010] After fusing the multiple salient features, the data is input into a preset classification network to output the corresponding level of the part.

[0011] In a second aspect, embodiments of this application disclose an image classification device, the device comprising:

[0012] The first acquisition module is used to acquire an image to be processed, wherein the image to be processed is a photographed image of a part;

[0013] The image correction module is used to input the image to be processed into a preset part posture correction network, perform posture correction on the part in the image to be processed, and output a standard image of the part after posture correction.

[0014] The second acquisition module is used to obtain multiple salient features corresponding to the part based on the standard image;

[0015] The grading module is used to fuse the multiple salient features, input them into a preset grading network, and output the grade corresponding to the part.

[0016] In a third aspect, embodiments of this application also disclose an electronic device, including a processor and a memory, wherein the memory stores a program or instructions that can run on the processor, and the program or instructions, when executed by the processor, implement the steps of the image classification method as described in the first aspect.

[0017] In a fourth aspect, embodiments of this application also disclose a readable storage medium storing a program or instructions that, when executed by a processor, implement the steps of the image classification method as described in the first aspect.

[0018] In this embodiment, after acquiring an image containing a part, the image is input into a part pose correction network to correct the pose of the part and output a standard image of the corrected part. Based on the standard image, multiple salient features corresponding to the part are obtained. These salient features are then fused and input into a preset classification network to output the level of the part. By designing the pose correction network, the part image can be an image from any angle, without being limited to a specific angle. A part image at any angle can be corrected to a standard image by the pose correction network before further processing. By recognizing the standard image, multiple salient features corresponding to the part are obtained. These salient features represent the detailed features of the part. By fusing and analyzing the salient features, the level of the part is output, making the solution applicable to various types of part classification and saving costs.

Description of the Drawings

[0019] Figure 1 is a flowchart of the steps of an image classification method provided in an embodiment of the present invention;

[0020] Figure 2 is a structural diagram of a pose correction network provided in an embodiment of the present invention;

[0021] Figure 3 is a flowchart of obtaining salient features provided by an embodiment of the present invention;

[0022] Figure 4 is a flowchart of another image classification method provided in an embodiment of the present invention;

[0023] Figure 5 is a diagram of an image classification device provided in an embodiment of the present invention;

[0024] Figure 6 is a block diagram of an electronic device provided in an embodiment of the present invention;

[0025] Figure 7 is a block diagram of another electronic device according to another embodiment of the present invention.

Detailed Implementation

[0026] Exemplary embodiments of the invention will now be described in more detail with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this invention will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

[0027] Referring to Figure 1, an image classification method provided by an embodiment of this application is illustrated, the method comprising:

[0028] Step 101: Obtain the image to be processed, which is a photographed image of the part.

[0029] In this embodiment of the invention, the part can be a component obtained after disassembling the entire machine. The image to be processed can be an image that includes one part. The format of the image to be processed can be JPEG (Joint Photographic Experts Group), TIFF (Tag Image File Format), PNG (Portable Network Graphics), etc. The image to be processed can be an image of any size, and the image of the part in the image to be processed can be an image from any angle.

[0030] Step 102: Input the image to be processed into a preset part posture correction network, perform posture correction on the parts in the image to be processed, and output a standard image of the parts after posture correction.

[0031] In this embodiment of the invention, the part pose correction network can be a pre-trained network for correcting the pose of parts in the image to be processed, and the corrected standard image can be the image most suitable for part level classification.

[0032] Specifically, refer to Figure 2, which shows the structure of the pose correction network. An image of a part taken from any angle can be transformed into a standard image with a standard pose after processing by the pose correction network. First, the image of the part taken from any angle is input into the localization network. The localization network extracts image information and outputs a 6-dimensional vector θ, which is the vector representation of the image to be processed. The six elements of θ correspond to the transformation matrix $A_\theta$ applied to the input image to be processed. Each pixel in the input image to be processed has a pair of coordinates, and the corresponding point after pose correction has a pair of transformed coordinates. After the transformed coordinates are obtained, the coordinate points are normalized so that the corrected coordinates stay within a bounded range. Normalizing the transformed coordinate points prevents their values from becoming too large or too small, which could affect the subsequent classification calculations. The transformation between coordinate points is as follows:

[0033]

[0034] Here, $A_\theta$ denotes the transformation matrix, and 1 represents a preset parameter. The pose correction network can be embedded as a functional module into the overall classification network structure. Compared to the classification network, the pose correction network has a simpler structure, introduces less additional computation, and thus consumes fewer computational resources. Furthermore, when training the pose correction network, there is no need to manually specify a standard pose; the network can adaptively learn the most suitable standard part pose for hierarchical classification from a batch of sample part images.
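The formula of paragraph [0033] is not reproduced in this text. As a minimal sketch of the idea described above, the following assumes the common affine form in which the 6 elements of θ fill a 2×3 matrix that multiplies the homogeneous vector (x, y, 1), and assumes coordinates are normalized to [-1, 1]. The function names, the row-major layout of θ, and the normalization range are illustrative assumptions, not taken from the patent.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def theta_to_matrix(theta: Sequence[float]) -> List[List[float]]:
    """Reshape a 6-element vector into a 2x3 affine matrix (assumed row-major layout)."""
    if len(theta) != 6:
        raise ValueError("theta must have exactly 6 elements")
    return [list(theta[0:3]), list(theta[3:6])]


def normalize(px: float, py: float, width: int, height: int) -> Point:
    """Map pixel coordinates to the range [-1, 1] (assumed normalization range)."""
    return (2.0 * px / (width - 1) - 1.0, 2.0 * py / (height - 1) - 1.0)


def apply_affine(matrix: List[List[float]], point: Point) -> Point:
    """Compute the matrix product of the 2x3 matrix with [x, y, 1] for one point."""
    x, y = point
    (a, b, c), (d, e, f) = matrix
    return (a * x + b * y + c, d * x + e * y + f)


# The identity parameters leave every point where it was.
identity = theta_to_matrix([1, 0, 0, 0, 1, 0])
corner = normalize(0, 0, width=224, height=224)  # top-left pixel
moved_corner = apply_affine(identity, corner)

# A quarter-turn rotation sends the point (1, 0) to (0, 1).
quarter_turn = theta_to_matrix([0, -1, 0, 1, 0, 0])
rotated = apply_affine(quarter_turn, (1.0, 0.0))
```

In a trained network the six values would come from the localization network rather than being written by hand.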

[0035] Step 103: Based on the standard image, obtain multiple salient features corresponding to the part.

[0036] In an embodiment of the present invention, refer to Figure 3, which is a flowchart for obtaining salient features. The standard image obtained after pose correction is processed by a feature extraction network to obtain general features. These features are the result of feature abstraction from the input image to be processed. Because the information in the original image is too redundant to be used directly, the feature extraction network extracts features that can be used for further processing. The standard image is also input into K salient region modeling networks. Each salient region modeling network focuses on a local region of the part that contains key information about the part. The salient region map produced by each salient region modeling network is fused with the general features to obtain the corresponding salient feature.

[0037] Specifically, modeling each salient region can focus on a local area of the part, saving computational overhead on the entire image. This modeling method is adaptively trained and does not rely on manual designation of key areas of the part, thus reducing labor costs.
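The text does not say how a salient region map is fused with the general features. One simple possibility, shown here only as a sketch, is to treat the salient region map as per-pixel weights and take a weighted average of each channel of the general features. The function name `salient_feature` and the weighted-average fusion are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def salient_feature(general: List[Matrix], saliency: Matrix) -> List[float]:
    """Weight each channel of the general features by one saliency map and pool.

    general:  C feature maps, each H x W
    saliency: one H x W map of non-negative weights
    returns:  a C-dimensional vector (saliency-weighted mean per channel)
    """
    total = sum(sum(row) for row in saliency)
    if total <= 0:
        raise ValueError("saliency map must contain positive weight")
    feature = []
    for channel in general:
        weighted = sum(
            f * s
            for f_row, s_row in zip(channel, saliency)
            for f, s in zip(f_row, s_row)
        )
        feature.append(weighted / total)
    return feature


# Two channels of general features on a 2 x 2 grid.
general = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [8.0, 0.0]],
]
# Two salient region maps (K = 2), each focused on a different pixel.
top_left = [[1.0, 0.0], [0.0, 0.0]]
bottom_left = [[0.0, 0.0], [1.0, 0.0]]

features = [salient_feature(general, m) for m in (top_left, bottom_left)]
```

Each map yields its own feature vector, so K maps yield K salient features that still carry information from the general features.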

[0038] Step 104: After fusing the multiple salient features, input them into a preset grade classification network and output the grade corresponding to the part.

[0039] In this embodiment of the invention, after obtaining the corresponding salient features, the K salient features are fused. The salient feature fusion method can be by directly adding the K salient features or concatenating them. The fused features are then input into a grading classification network for grading classification, and the grading of the parts can be output. The grading classification network can be a pre-trained network used to classify parts in the image to be processed. For different parts, different sample datasets can be used for training, enabling the grading classification network to classify different parts according to their grading. The sample dataset for training the grading classification network can be part images that have already been labeled with gradings, and the part images can be images from any angle.
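The two fusion options named above, direct addition and concatenation, can be sketched as follows for feature vectors held in plain Python lists. The function names are illustrative; the patent does not prescribe an implementation.

```python
from typing import List


def fuse_by_addition(features: List[List[float]]) -> List[float]:
    """Element-wise sum of K feature vectors of equal length."""
    if not features:
        raise ValueError("need at least one feature vector")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("addition requires equal-length vectors")
    return [sum(values) for values in zip(*features)]


def fuse_by_concatenation(features: List[List[float]]) -> List[float]:
    """Join K feature vectors end to end."""
    return [value for f in features for value in f]


salient = [[1.0, 0.0], [3.0, 8.0]]
added = fuse_by_addition(salient)
joined = fuse_by_concatenation(salient)
```

Addition keeps the input size of the classification network independent of K, while concatenation preserves which salient region each value came from at the cost of a longer input.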

[0040] In summary, in this embodiment, after obtaining the image to be processed containing the part, the image is input into a part pose correction network to correct the pose of the part in the image and output a standard image of the part after pose correction. Based on the obtained standard image, multiple salient features corresponding to the part are obtained. After the multiple salient features are fused, the fused features are input into a preset classification network to output the level corresponding to the part. By designing the pose correction network, the input part image can be an image from any angle, without being limited to a specific angle. Part images from any angle can be corrected into standard images by the pose correction network before subsequent processing. By recognizing the standard image, multiple salient features corresponding to the part are obtained. These salient features can represent the detailed features of the part. By fusing and analyzing the salient features of the part, the level corresponding to the part is output. This makes the solution of this application applicable to various types of part classification and saves costs.

[0041] Referring to Figure 4, a flowchart of another image classification method provided in an embodiment of this application is illustrated, the method including:

[0042] Step 201: Obtain the image to be processed, which is a photographed image of the part.

[0043] This step is the same as step 101, and will not be repeated here.

[0044] Step 202: Input the image to be processed into a preset part posture correction network, perform posture correction on the parts in the image to be processed, and output a standard image of the parts after posture correction.

[0045] This step is the same as step 102, and will not be repeated here.

[0046] Optionally, step 202 specifically includes:

[0047] Sub-step 2021: Obtain the coordinates of the four corner points of the image to be processed, and the four coordinates located on the four sides of the image to be processed as the original coordinates.

[0048] In this embodiment of the invention, the standard image is obtained by transforming the image to be processed by selecting the coordinates of the four corner points of the image to be processed, and arbitrarily choosing one coordinate from each edge of the image to be processed as reference coordinates. By transforming the reference coordinates, the pose of the parts in the transformed image is made to be the pose most suitable for hierarchical classification.

[0049] Sub-step 2022: Determine the standard image after pose correction of the image to be processed based on the following first loss function:

[0050]

[0051] Here, the two variables in the formula are, respectively, the abscissa and the ordinate of each original coordinate after pose correction.

[0052] In this embodiment of the invention, the pose correction network corrects the pose of the part: a part that was tilted or located in a corner of the original image is placed in the most suitable position in the corrected image. In the pose correction module, the matrix $A_\theta$ holds the parameters that specify how the part's position is transformed, and $A_\theta$ can be learned by the model. Because the value of $A_\theta$ is arbitrary, portions of the part may exceed the effective area of the image after transformation, leaving too little effective information in the transformed image, with most of it being newly introduced background, which degrades the part classification result. Therefore, this application designs a first loss function to ensure that the standard image obtained by transforming the image to be processed completely includes the part information in the image to be processed. The first loss function drives all coordinate points on the boundary of the transformed standard image to exceed the area of the original image to be processed as far as possible. That is, the first loss function enlarges the standard image so that it completely covers the area of the image to be processed, ensuring that the part remains complete in the standard image after pose adjustment.
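The formula of the first loss function is not reproduced in this text, so the following is only one possible penalty that is consistent with the description, not the patent's formula. It assumes coordinates normalized to [-1, 1], takes the four corners plus one point on each side as the original coordinates (the midpoints are an arbitrary choice), and applies a hinge penalty whenever a transformed point ends up inside the original image area along the border it started on. The names `REFERENCE_POINTS` and `coverage_penalty` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Four corners plus one point on each side, in assumed [-1, 1] coordinates.
REFERENCE_POINTS: List[Point] = [
    (-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0),  # corners
    (0.0, -1.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0),    # one point per side
]


def transform(theta: Sequence[float], point: Point) -> Point:
    """Apply the affine parameters (a, b, c, d, e, f) to one point."""
    a, b, c, d, e, f = theta
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


def coverage_penalty(theta: Sequence[float]) -> float:
    """Hinge penalty (assumed form) for boundary points pulled inside the image."""
    penalty = 0.0
    for x, y in REFERENCE_POINTS:
        tx, ty = transform(theta, (x, y))
        if abs(x) == 1.0:  # point started on a left or right border
            penalty += max(0.0, 1.0 - abs(tx))
        if abs(y) == 1.0:  # point started on a top or bottom border
            penalty += max(0.0, 1.0 - abs(ty))
    return penalty


identity_penalty = coverage_penalty([1, 0, 0, 0, 1, 0])
shrunk_penalty = coverage_penalty([0.5, 0, 0, 0, 0.5, 0])
enlarged_penalty = coverage_penalty([2, 0, 0, 0, 2, 0])
```

Under this assumed form the penalty is zero whenever every reference point stays on or beyond the original border, and it grows as the points are pulled inward.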

[0053] Step 203: Input the standard image into the feature extraction network to obtain the general features corresponding to the part.

[0054] In this embodiment of the invention, a feature extraction network is used to extract general features from a standard image. These general features are the result of abstracting the features of the image to be processed, removing redundant features from the image. The obtained general features can be used for further calculations.

[0055] Step 204: Input the standard image into multiple salient region modeling networks to obtain multiple salient regions corresponding to the part.

[0056] In this embodiment of the invention, one salient region modeling network corresponds to a local area of a part, and multiple salient region modeling networks are used to model multiple salient regions of the part. A salient region of the part can be a region containing detailed features of the part. To avoid different salient region modeling networks corresponding to the same local area of the part, this application designs a second loss function to ensure that each salient region modeling network corresponds to a different local area of the part.

[0057] Optionally, step 204 specifically includes:

[0058] Sub-step 2041, according to the following second loss function, restricts the plurality of salient regions to not overlap:

[0059]

[0060] Here, $c_i$ and $c_j$ denote the centers of the i-th and j-th salient regions, respectively, K denotes the number of salient regions, and $\sigma^2$ is a preset parameter.

[0061] In this embodiment of the invention, the loss reaches its maximum when $c_i$ and $c_j$ coincide and gradually decreases as the distance between $c_i$ and $c_j$ increases, so the second loss function constrains different salient regions to lie far apart from each other. $\sigma^2$ can be obtained through experiments and is used to adjust the output of the loss function.
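The formula of the second loss function is not reproduced in this text. The described behavior (maximum when two centers coincide, decay with distance, scaled by σ²) matches a Gaussian of the distance between centers, so the sketch below assumes that form, summed over all pairs of distinct centers. The exact expression, including whether σ² is multiplied by 2 in the denominator, is an assumption.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def separation_loss(centers: List[Point], sigma_sq: float) -> float:
    """Sum of Gaussian similarities over all pairs of distinct centers (assumed form)."""
    loss = 0.0
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            dx = centers[i][0] - centers[j][0]
            dy = centers[i][1] - centers[j][1]
            loss += math.exp(-(dx * dx + dy * dy) / sigma_sq)
    return loss


coincident = separation_loss([(0.0, 0.0), (0.0, 0.0)], sigma_sq=1.0)
near = separation_loss([(0.0, 0.0), (1.0, 0.0)], sigma_sq=1.0)
far = separation_loss([(0.0, 0.0), (3.0, 0.0)], sigma_sq=1.0)
```

A larger σ² makes the penalty decay more slowly, which pushes the centers further apart before the loss becomes negligible.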

[0062] Optionally, step 204 specifically includes:

[0063] Sub-step 2042, using the following third loss function, ensures that the salient region corresponds to a local part of the part:

[0064]

[0065] Here, s(i) denotes the saliency value of point i in the salient region, c denotes the center of the salient region, and $p_i$ denotes the position of point i.

[0066] In this embodiment of the invention, H×W is used to represent the salient region. When point i is far from c and s(i) is large, the loss value is large; when point i is close to c and s(i) is small, the loss value is small. This loss can constrain the salient region of the part to be as close as possible to the center of the salient region, thereby ensuring the compactness of the salient region, so that the salient region corresponds to the specific location of the part, rather than to the entire part image.
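The formula of the third loss function is not reproduced in this text. A form consistent with the description sums, over every point of the H×W map, the saliency value s(i) multiplied by the distance between $p_i$ and the center c. The sketch below assumes squared Euclidean distance and grid indices as positions; both are illustrative choices.

```python
from typing import List, Tuple


def compactness_loss(saliency: List[List[float]], center: Tuple[float, float]) -> float:
    """Sum of s(i) times the squared distance from p_i to the center (assumed form)."""
    cx, cy = center
    loss = 0.0
    for row_index, row in enumerate(saliency):
        for col_index, s in enumerate(row):
            loss += s * ((col_index - cx) ** 2 + (row_index - cy) ** 2)
    return loss


# All saliency sits on the center: no penalty.
compact = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]
# Saliency sits in two opposite corners, far from the center.
spread = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]

compact_loss = compactness_loss(compact, center=(1.0, 1.0))
spread_loss = compactness_loss(spread, center=(1.0, 1.0))
```

Minimizing this quantity pulls high saliency values toward the center, which keeps each salient region local instead of covering the whole part image.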

[0067] Step 205: The multiple salient regions are fused with the general features respectively to obtain the corresponding multiple salient features.

[0068] In this embodiment of the invention, general features are fused with salient regions to obtain corresponding salient features, so that the salient features include information of general features, thus ensuring the integrity of the salient features.

[0069] Step 206: After fusing the multiple salient features, input them into a preset grade classification network and output the grade corresponding to the part.

[0070] This step can be referred to in step 104, and will not be repeated here.

[0071] Optionally, before step 202, the method further includes:

[0072] Step 207: Obtain the first sample dataset, which includes part pose images from different angles.

[0073] In this embodiment of the invention, in order for the part attitude correction network to perform attitude correction on parts at any angle, the part attitude correction network needs to be trained so that it learns the attitude of the part most suitable for part classification.

[0074] Step 208: Input the first sample dataset into the part pose correction network to train the part pose correction network.

[0075] In this embodiment of the invention, the first sample dataset is input into the part pose correction network, and the model is trained using the first loss function described above, so that the part pose correction network can accurately output the standard pose of the part.

[0076] Optionally, the method further includes:

[0077] Step 209: Obtain the second sample dataset, which includes part images with grade labels.

[0078] In this embodiment of the invention, the second sample dataset includes: part images with grade labels. The part images can be labeled with grades in advance so that after being input into the grade classification network, the grade classification network can learn the part grades corresponding to different part images.

[0079] Step 210: Input the second sample dataset into the hierarchical classification network and train the hierarchical classification network.

[0080] In this embodiment of the invention, the second sample dataset is input into the hierarchical classification network, and the model is trained using the fourth loss function below, so that the hierarchical classification network can accurately classify the parts.

[0081] Optionally, step 210 specifically includes:

[0082] Sub-step 2101: Train the ranking classification network according to the following fourth loss function:

[0083] $L_{total} = L_{cls} + \alpha L_{overlap} + \beta L_{outer} + \gamma L_{inner}$

[0084] Here, $L_{cls}$ is the cross-entropy loss function, $L_{overlap}$ is the first loss function, $L_{outer}$ is the second loss function, $L_{inner}$ is the third loss function, and α, β, and γ are weight coefficients of the loss function, used to fine-tune the model and optimize its performance.

[0085] In this embodiment of the invention, a classification network is trained using a comprehensive loss function that includes a first loss function, a second loss function, and a third loss function. Here, α, β, and γ are weight coefficients of the loss function. These weight coefficients can be obtained experimentally and can be set according to actual needs. This embodiment of the invention does not impose any limitations on these weight coefficients.

[0086] Furthermore, $L_{cls}$ is the cross-entropy loss function, which measures the difference between two probability distributions, specifically the difference between the distribution learned by the model and the true distribution.
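As a small worked example of the fourth loss function, the sketch below computes the cross-entropy for a single sample with a one-hot grade label and then forms the weighted sum. The numeric loss values and weight coefficients are made up for illustration; the patent leaves the coefficients to experiment.

```python
import math
from typing import Sequence


def cross_entropy(probabilities: Sequence[float], true_index: int) -> float:
    """Cross-entropy between a one-hot label and the predicted grade probabilities."""
    return -math.log(probabilities[true_index])


def total_loss(
    l_cls: float,
    l_overlap: float,
    l_outer: float,
    l_inner: float,
    alpha: float,
    beta: float,
    gamma: float,
) -> float:
    """Weighted sum of the classification loss and the three auxiliary losses."""
    return l_cls + alpha * l_overlap + beta * l_outer + gamma * l_inner


# Three grades; the true grade is index 1 and the model assigns it probability 0.5.
l_cls = cross_entropy([0.25, 0.5, 0.25], true_index=1)

# Hypothetical auxiliary loss values and weights.
combined = total_loss(l_cls, l_overlap=2.0, l_outer=3.0, l_inner=4.0,
                      alpha=0.5, beta=0.1, gamma=0.25)
```

Here the auxiliary terms contribute 0.5·2.0 + 0.1·3.0 + 0.25·4.0 = 2.3 on top of the classification loss of ln 2.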

[0087] In summary, in this embodiment, after obtaining the image to be processed containing the part, the image is input into a part pose correction network to correct the pose of the part in the image and output a standard image of the part after pose correction. Based on the obtained standard image, multiple salient features corresponding to the part are obtained. After the multiple salient features are fused, the fused features are input into a preset classification network to output the level corresponding to the part. By designing the pose correction network, the input part image can be an image from any angle, without being limited to a specific angle. Part images from any angle can be corrected into standard images by the pose correction network before subsequent processing. By recognizing the standard image, multiple salient features corresponding to the part are obtained. These salient features can represent the detailed features of the part. By fusing and analyzing the salient features of the part, the level corresponding to the part is output. This makes the solution of this application applicable to various types of part classification and saves costs.

[0088] Referring to Figure 5, an image classification apparatus provided in an embodiment of this application is illustrated, comprising:

[0089] The first acquisition module 301 is used to acquire an image to be processed, wherein the image to be processed is a photographed image of a part;

[0090] Image correction module 302 is used to input the image to be processed into a preset part posture correction network, perform posture correction on the part in the image to be processed, and output a standard image after part posture correction.

[0091] The second acquisition module 303 is used to obtain multiple salient features corresponding to the part based on the standard image;

[0092] The grade classification module 304 is used to fuse the multiple salient features, input them into a preset grade classification network, and output the grade corresponding to the part.
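The four modules above form a simple chain, which can be sketched as a small class whose stages are plain callables. The class name, field names, and the stub stages in the usage lines are hypothetical; in practice each stage would wrap a trained network.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class PartGradingPipeline:
    """Chains the stages of modules 302 to 304 (hypothetical interface)."""
    correct_pose: Callable[[Any], Any]                    # image correction module 302
    extract_salient: Callable[[Any], List[List[float]]]   # second acquisition module 303
    fuse: Callable[[List[List[float]]], List[float]]      # fusion step of module 304
    classify: Callable[[List[float]], str]                # classification step of module 304

    def grade(self, image: Any) -> str:
        """Run one image (as acquired by module 301) through every stage."""
        standard = self.correct_pose(image)
        features = self.extract_salient(standard)
        return self.classify(self.fuse(features))


# Stub stages that stand in for trained networks.
pipeline = PartGradingPipeline(
    correct_pose=lambda image: image,
    extract_salient=lambda image: [[1.0, 0.0], [3.0, 8.0]],
    fuse=lambda feats: [sum(values) for values in zip(*feats)],
    classify=lambda fused: "A" if sum(fused) > 10 else "B",
)
result = pipeline.grade("raw-image-placeholder")
```

Keeping each stage behind a callable makes it possible to retrain or replace the classification stage for a new part type without touching the pose correction stage.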

[0093] Optionally, the second acquisition module includes:

[0094] The general feature extraction submodule is used to input the standard image into the feature extraction network to obtain the general features corresponding to the part.

[0095] The salient region extraction submodule is used to input the standard image into multiple salient region modeling networks to obtain multiple salient regions corresponding to the part;

[0096] The fusion submodule is used to fuse the multiple salient regions with the general features respectively to obtain the corresponding multiple salient features.

[0097] Optionally, the device further includes:

[0098] The first sample acquisition module is used to acquire a first sample dataset, which includes part pose images from different angles.

[0099] The first training module is used to input the first sample dataset into the part posture correction network and train the part posture correction network.

[0100] Optionally, the image correction module includes:

[0101] The coordinate acquisition submodule is used to acquire the coordinates of the four corner points of the image to be processed, as well as the four coordinates located on the four sides of the image to be processed, as the original coordinates.

[0102] The first calculation submodule is used to determine the standard image after pose correction of the image to be processed, based on the following first loss function:

[0103]

[0104] Here, the two variables in the formula are, respectively, the abscissa and the ordinate of each original coordinate after pose correction.

[0105] Optionally, the salient region extraction submodule includes:

[0106] The second calculation submodule is used to restrict the plurality of salient regions from overlapping based on the following second loss function:

[0107]

[0108] Here, $c_i$ and $c_j$ denote the centers of the i-th and j-th salient regions, respectively, K denotes the number of salient regions, and $\sigma^2$ is a preset parameter.

[0109] Optionally, the salient region extraction submodule includes:

[0110] The third calculation submodule is used to ensure that the salient region corresponds to a local part of the part using the following third loss function:

[0111]

[0112] Here, s(i) denotes the saliency value of point i in the salient region, c denotes the center of the salient region, and $p_i$ denotes the position of point i.

[0113] Optionally, the device further includes:

[0114] The second sample acquisition module is used to acquire a second sample dataset, which includes: part images with grade labels;

[0115] The second training module inputs the second sample dataset into the hierarchical classification network and trains the hierarchical classification network.

[0116] Optionally, the second training module includes:

[0117] The second training submodule is used to train the hierarchical classification network according to the following fourth loss function:

[0118] $L_{total} = L_{cls} + \alpha L_{overlap} + \beta L_{outer} + \gamma L_{inner}$

[0119] Here, $L_{cls}$ is the cross-entropy loss function, $L_{overlap}$ is the first loss function, $L_{outer}$ is the second loss function, $L_{inner}$ is the third loss function, and α, β, and γ are the weight coefficients of the loss function.

[0120] In summary, in this embodiment, after obtaining the image to be processed containing the part, the image is input into a part pose correction network to correct the pose of the part in the image and output a standard image of the part after pose correction. Based on the obtained standard image, multiple salient features corresponding to the part are obtained. After the multiple salient features are fused, the fused features are input into a preset classification network to output the level corresponding to the part. By designing the pose correction network, the part image in the input image can be an image from any angle, without being limited to a specific angle. This allows part images from any angle to be corrected into standard images by the pose correction network before subsequent processing. By recognizing the standard image, multiple salient features corresponding to the part are obtained. These salient features can represent the detailed features of the part. By fusing and analyzing the salient features of the part, the level corresponding to the part is output. This makes the solution of this application applicable to various types of part classification and saves costs.

[0121] Figure 6 shows a block diagram of an electronic device 600 according to an exemplary embodiment. For example, the electronic device 600 may be a mobile phone, computer, digital broadcasting terminal, messaging device, game console, tablet device, medical device, fitness equipment, personal digital assistant, etc.

[0122] Referring to Figure 6, the electronic device 600 may include one or more of the following components: processing component 602, memory 604, power supply component 606, multimedia component 608, audio component 610, input/output (I/O) interface 612, sensor component 614, and communication component 616.

[0123] Processing component 602 typically controls the overall operation of electronic device 600, such as operations associated with display, telephone calls, data communication, camera operation, and recording operations. Processing component 602 may include one or more processors 620 to execute instructions to perform all or part of the steps of the methods described above. Furthermore, processing component 602 may include one or more modules to facilitate interaction between processing component 602 and other components. For example, processing component 602 may include a multimedia module to facilitate interaction between multimedia component 608 and processing component 602.

[0124] Memory 604 is used to store various types of data to support the operation of electronic device 600. Examples of such data include instructions for any application or method operating on electronic device 600, contact data, phonebook data, messages, pictures, multimedia, etc. Memory 604 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.

[0125] Power supply component 606 provides power to various components of electronic device 600. Power supply component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 600.

[0126] Multimedia component 608 includes a screen that provides an output interface between the electronic device 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touchscreen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundaries of touch or swipe actions but also the duration and pressure associated with the touch or swipe operation. In some embodiments, multimedia component 608 includes a front-facing camera and/or a rear-facing camera. When the electronic device 600 is in an operating mode, such as a shooting mode or a multimedia mode, the front-facing camera and/or the rear-facing camera may receive external multimedia data. Each front-facing camera and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

[0127] Audio component 610 is used to output and/or input audio signals. For example, audio component 610 includes a microphone (MIC) used to receive external audio signals when electronic device 600 is in an operating mode, such as call mode, recording mode, or voice recognition mode. The received audio signals may be further stored in memory 604 or transmitted via communication component 616. In some embodiments, audio component 610 also includes a speaker for outputting audio signals.

[0128] I/O interface 612 provides an interface between processing component 602 and peripheral interface modules, such as keyboards, click wheels, and buttons. These buttons may include, but are not limited to, home buttons, volume buttons, power buttons, and lock buttons.

[0129] Sensor component 614 includes one or more sensors for providing state assessments of various aspects of electronic device 600. For example, sensor component 614 can detect the on/off state of electronic device 600, the relative positioning of components such as the display and keypad of electronic device 600, changes in position of electronic device 600 or a component of electronic device 600, the presence or absence of user contact with electronic device 600, orientation or acceleration/deceleration of electronic device 600, and temperature changes of electronic device 600. Sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. Sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, sensor component 614 may also include an accelerometer, gyroscope, magnetometer, pressure sensor, or temperature sensor.

[0130] Communication component 616 facilitates wired or wireless communication between electronic device 600 and other devices. Electronic device 600 can access wireless networks based on communication standards, such as WiFi, carrier networks (such as 2G, 3G, 4G, or 5G), or combinations thereof. In one exemplary embodiment, communication component 616 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, communication component 616 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

[0131] In an exemplary embodiment, the electronic device 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to implement the image classification method provided in the embodiments of this application.

[0132] In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory 604 including instructions, which can be executed by a processor 620 of an electronic device 600 to perform the above-described method. For example, the non-transitory storage medium may be a ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, or optical data storage device.

[0133] Figure 7 is a block diagram of an electronic device 700 according to an exemplary embodiment. For example, the electronic device 700 may be provided as a server. Referring to Figure 7, the electronic device 700 includes a processing component 722, which further includes one or more processors, and memory resources represented by a memory 732 for storing instructions, such as application programs, that can be executed by the processing component 722. The application programs stored in the memory 732 may include one or more modules, each corresponding to a set of instructions. Furthermore, the processing component 722 is configured to execute instructions to perform the image classification method provided in the embodiments of this application.

[0134] Electronic device 700 may also include a power supply component 726 configured to perform power management of electronic device 700, a wired or wireless network interface 750 configured to connect electronic device 700 to a network, and an input/output (I/O) interface 758. Electronic device 700 may operate on an operating system stored in memory 732, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or similar.

[0135] Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow the general principles of this application and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this application are indicated by the following claims.

[0136] It should be understood that this application is not limited to the precise structure described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.

Claims

1. An image classification method, characterized in that the method comprises: acquiring an image to be processed, wherein the image to be processed is a captured image of a part; inputting the image to be processed into a preset part pose correction network to perform pose correction on the part in the image to be processed, and outputting a standard image of the part after pose correction; obtaining multiple salient features corresponding to the part based on the standard image; and fusing the multiple salient features, inputting the fused features into a preset hierarchical classification network, and outputting the level corresponding to the part; wherein the step of inputting the image to be processed into a preset part pose correction network to perform pose correction on the part in the image to be processed includes: obtaining the coordinates of the four corner points of the image to be processed, and the coordinates of four points located respectively on the four sides of the image to be processed, as the original coordinates; and determining, based on the following first loss function, the standard image after pose correction of the image to be processed: wherein, is the horizontal coordinate corresponding to the original coordinate after the pose correction, is the vertical coordinate corresponding to the original coordinate after the pose correction.
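The eight original coordinates named in claim 1 can be illustrated with a short sketch. The first loss function is not reproduced in this text, so the sketch does not attempt it. Instead it shows one conventional way in which eight point correspondences determine a geometric correction: a least-squares homography fitted by the direct linear transform. The choice of side midpoints as the four side points, the homography model, and all function names are assumptions made for illustration, not the method of this application.

```python
import numpy as np


def control_points(width: float, height: float) -> np.ndarray:
    """Four corners plus one point on each side (midpoints, by assumption)."""
    w, h = width, height
    return np.array(
        [
            [0, 0], [w, 0], [w, h], [0, h],                  # four corner points
            [w / 2, 0], [w, h / 2], [w / 2, h], [0, h / 2],  # one point per side
        ],
        dtype=float,
    )


def fit_homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares 3x3 homography mapping src points onto dst points."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]  # normalise so the bottom-right entry is 1


def apply_homography(h: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map (N, 2) points through the homography h."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]
```

In a learned setting, the corrected coordinates would come from the pose correction network rather than from a known transform; the fitted mapping could then be used to warp the image to be processed into the standard image.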

2. The method of claim 1, wherein the step of obtaining multiple salient features corresponding to the part based on the standard image includes: inputting the standard image into a feature extraction network to obtain the general features corresponding to the part; inputting the standard image into a multiple salient region modeling network to obtain multiple salient regions corresponding to the part; and fusing the multiple salient regions with the general features respectively to obtain the corresponding multiple salient features.
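One way to picture the fusion of salient regions with general features in claim 2 is weighted pooling: each region map acts as a set of spatial weights over the general feature map and yields one feature vector per region. The claim does not specify the fusion operator, so the weighted-pooling form, the array shapes, and the function name below are assumptions.

```python
import numpy as np


def salient_features(general: np.ndarray, regions: np.ndarray) -> np.ndarray:
    """Pool a general feature map under each salient region map.

    general: (C, H, W) feature map from the feature extraction network.
    regions: (K, H, W) non-negative maps, one per salient region.
    Returns a (K, C) array: one salient feature vector per region.
    """
    # Normalise each region map so that its weights sum to (almost exactly) one.
    weights = regions / (regions.sum(axis=(1, 2), keepdims=True) + 1e-8)
    # Weighted average of every channel under every region map.
    return np.einsum("khw,chw->kc", weights, general)


general = np.arange(8, dtype=float).reshape(2, 2, 2)  # C=2, H=2, W=2
regions = np.array(
    [
        [[1.0, 0.0], [0.0, 0.0]],  # region 0: top-left pixel only
        [[0.0, 0.0], [1.0, 1.0]],  # region 1: bottom row
    ]
)
per_region = salient_features(general, regions)  # shape (2, 2)
fused = per_region.reshape(-1)                   # concatenated vector, shape (4,)
```

The concatenated vector is what a downstream classification network would receive under this assumption.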

3. The method of claim 1, wherein, before inputting the image to be processed into a preset part pose correction network to perform pose correction on the part in the image to be processed, the method further includes: obtaining a first sample dataset, which includes part pose images from different angles; and inputting the first sample dataset into the part pose correction network to train the part pose correction network.

4. The method of claim 2, wherein the step of inputting the standard image into a multiple salient region modeling network to obtain multiple salient regions corresponding to the part includes: constraining, according to the following second loss function, the plurality of salient regions not to overlap one another: wherein, and Ci and Cj represent the centers of the i-th and j-th salient regions, respectively, and K represents the number of salient regions, is a preset parameter.
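The second loss function is not reproduced in this text. As a hedged illustration of how a loss over region centers can discourage overlap, the sketch below uses a common hinge form: every pair of centers closer than a margin is penalised by the shortfall. The hinge form, the reading of the preset parameter as a distance margin, and the function name are assumptions, not the formula of claim 4.

```python
import numpy as np


def separation_loss(centers: np.ndarray, margin: float) -> float:
    """Penalise pairs of region centers that lie closer than `margin`.

    centers: (K, 2) array of region centers C_1..C_K.
    margin:  minimum desired distance between centers (assumed meaning
             of the preset parameter).
    """
    k = len(centers)
    diffs = centers[:, None, :] - centers[None, :, :]  # (K, K, 2) pairwise offsets
    dists = np.linalg.norm(diffs, axis=-1)             # (K, K) pairwise distances
    i, j = np.triu_indices(k, k=1)                     # each unordered pair once
    return float(np.maximum(0.0, margin - dists[i, j]).sum())


centers = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
loss = separation_loss(centers, margin=2.0)
```

In this example only the first and third centers are closer than the margin (distance 1), so they alone contribute to the loss.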

5. The method of claim 2, wherein the step of inputting the standard image into a multiple salient region modeling network to obtain multiple salient regions corresponding to the part includes: constraining, by the following third loss function, each salient region to correspond to a local part of the part: wherein, c denotes the center of the salient region, denotes the position of the point.
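The third loss function is likewise not reproduced in this text. A common way to keep a region local is a concentration penalty: the weighted mean squared distance of every point from the region center. The sketch below shows that form. Taking the center c as the weighted centroid of the region map, the squared-distance penalty, and the function name are assumptions, and the map is assumed to have a non-zero sum.

```python
import numpy as np


def concentration_loss(region: np.ndarray) -> float:
    """Weighted mean squared distance of points from the region center.

    region: (H, W) non-negative map with a non-zero sum.
    A compact region gives a small value; a spread-out region a large one.
    """
    h, w = region.shape
    ys, xs = np.mgrid[0:h, 0:w]          # row and column index of every point
    weights = region / region.sum()      # normalise the map to sum to one
    cy = (weights * ys).sum()            # center row (weighted centroid)
    cx = (weights * xs).sum()            # center column
    sq_dist = (ys - cy) ** 2 + (xs - cx) ** 2
    return float((weights * sq_dist).sum())


single = np.zeros((3, 3))
single[1, 1] = 1.0                       # all mass at one point
spread = np.zeros((1, 3))
spread[0, 0] = spread[0, 2] = 1.0        # mass split between two points

compact_value = concentration_loss(single)
spread_value = concentration_loss(spread)
```

Minimising such a term pulls each region toward a single compact area, which matches the stated aim that a salient region correspond to a local part of the part.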

6. The method of claim 1, wherein the method further includes: obtaining a second sample dataset, which includes part images with grade labels; and inputting the second sample dataset into the hierarchical classification network to train the hierarchical classification network.

7. The method of claim 6, wherein the step of inputting the second sample dataset into the hierarchical classification network to train the hierarchical classification network includes: training the hierarchical classification network according to the following fourth loss function: wherein, is a cross-entropy loss function, is a first loss function, is a second loss function, is a third loss function, and α, β and γ are weight coefficients of the loss functions.
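The fourth loss function is not reproduced in this text, but claim 7 names its ingredients: a cross-entropy term, the first, second and third loss functions, and weight coefficients α, β and γ. A weighted sum is the natural reading, and the sketch below shows that form. The exact formula, including which coefficient multiplies which term, is an assumption, and the three auxiliary loss values are passed in as plain numbers.

```python
import numpy as np


def cross_entropy(logits: np.ndarray, label: int) -> float:
    """Cross-entropy of one sample from unnormalised class scores."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])


def total_loss(
    logits: np.ndarray,
    label: int,
    loss_1: float,
    loss_2: float,
    loss_3: float,
    alpha: float,
    beta: float,
    gamma: float,
) -> float:
    """Assumed form: L = L_ce + alpha * L_1 + beta * L_2 + gamma * L_3."""
    return (
        cross_entropy(logits, label)
        + alpha * loss_1
        + beta * loss_2
        + gamma * loss_3
    )


# Uniform scores over four levels give a cross-entropy of log(4).
value = total_loss(np.zeros(4), 0, 1.0, 2.0, 3.0, alpha=0.5, beta=0.25, gamma=0.1)
```

Under this reading, the coefficients control how strongly the pose correction and salient region constraints influence training relative to the grading objective.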

8. An image classification apparatus, characterized in that the apparatus comprises: a first acquisition module, configured to acquire an image to be processed, wherein the image to be processed is a captured image of a part; an image correction module, configured to input the image to be processed into a preset part pose correction network, perform pose correction on the part in the image to be processed, and output a standard image of the part after pose correction; a second acquisition module, configured to obtain multiple salient features corresponding to the part based on the standard image; and a grading module, configured to fuse the multiple salient features, input the fused features into a preset hierarchical classification network, and output the level corresponding to the part; wherein the image correction module comprises: a coordinate acquisition submodule, configured to acquire the coordinates of the four corner points of the image to be processed and the coordinates of four points located respectively on the four sides of the image to be processed as original coordinates; and a first calculation submodule, configured to determine the standard image after pose correction of the image to be processed according to the following first loss function: wherein, is the horizontal coordinate corresponding to the original coordinate after the pose correction, is the vertical coordinate corresponding to the original coordinate after the pose correction.

9. The apparatus of claim 8, wherein the second acquisition module comprises: a general feature extraction submodule, configured to input the standard image into a feature extraction network to obtain general features corresponding to the part; a salient region extraction submodule, configured to input the standard image into a multiple salient region modeling network to obtain multiple salient regions corresponding to the part; and a fusion submodule, configured to fuse the multiple salient regions with the general features respectively to obtain the corresponding multiple salient features.

10. The apparatus of claim 9, wherein the salient region extraction submodule comprises: a second calculation submodule, configured to constrain, according to the following second loss function, the plurality of salient regions not to overlap one another: wherein, and Ci and Cj represent the centers of the i-th and j-th salient regions, respectively, and K represents the number of salient regions, is a preset parameter.

11. The apparatus of claim 9, wherein the salient region extraction submodule comprises: a third calculation submodule, configured to constrain, by the following third loss function, each salient region to correspond to a local part of the part: wherein, , c denotes the center of the salient region, denotes the position of the point.

12. An electronic device, characterized in that the device comprises a processor and a memory, wherein the memory stores programs or instructions executable on the processor, and the programs or instructions, when executed by the processor, implement the steps of the image classification method of any one of claims 1 to 7.

13. A readable storage medium, characterized in that the readable storage medium stores programs or instructions, and the programs or instructions, when executed by a processor, implement the steps of the image classification method of any one of claims 1 to 7.