Method for searching goods based on image and electronic device
By performing subject recognition and cropping on the client side, the method mitigates the loss of search accuracy caused by image compression, saves server-side resources, and improves result accuracy.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG TMALL TECH CO LTD
- Filing Date
- 2023-05-29
- Publication Date
- 2026-06-16
AI Technical Summary
In existing technologies, image-based product search loses accuracy because uploaded images are compressed, and server-side subject recognition consumes excessive computing resources.
The client performs subject recognition and cropping and uploads only the subject image content to the server for searching. This reduces the need for compression, and dynamically distributed algorithm models are used to improve recognition accuracy.
It saves server-side computing resources, reduces image information loss, and improves the accuracy of product search results.

Figure CN116756168B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of image-based information search technology, and in particular to a method and electronic device for image-based product search.
Background Technology
[0002] Product information service systems typically provide users with a search function. Traditionally, this is done by entering keywords; users describe the products they want to search for using keywords, and the system then returns search results that match their criteria.
[0003] To meet users' needs in more search scenarios, some systems also offer a "search for products by image" function. With this function, users can photograph a product they want to buy, or obtain a relevant image by other means, and use it as the search criterion to initiate a product search. The system then matches the user's input image against product images and returns the matching product search results to the user.
[0004] In the aforementioned image-based product search method, the client in existing technologies typically performs subject recognition on images submitted or captured by the user. Once a target subject is identified, the whole image or the current image frame is uploaded. To conserve transmission resources and reduce user waiting time, the image is usually compressed before uploading. Upon receiving the image, the server performs subject recognition again and then provides product search results based on the identified subject. However, the image received by the server has been degraded by compression, which affects the accuracy of the search results. This is especially true when the subject occupies only a small part of the image: compression blurs that part further, making accurate search results even harder to guarantee. Furthermore, since the server provides data services to numerous clients, performing subject recognition and cropping on the server side consumes significant computing resources.
Summary of the Invention
[0005] This application provides a method and electronic device for image-based product search, which can save server-side computing resources and improve the accuracy of product search results.
[0006] This application provides the following solution:
[0007] A method for image-based product search, the method being applied to a client, includes:
[0008] In response to a request to perform a product search based on a target image, subject recognition is performed on the target image;
[0009] The subject image content of the target subject is extracted from the target image;
[0010] Based on the extracted subject image content, a product search request is submitted to the server, so that the server can provide product search results based on the subject image content.
[0011] The step of submitting a product search request to the server based on the extracted subject image content includes:
[0012] After the subject image content is compressed, it is submitted to the server so that the server can provide product search results based on the compressed subject image content; wherein the compression rate applied to the subject image content is less than the compression rate applied to the target image when the entire target image is uploaded to the server.
[0013] This also includes:
[0014] If multiple subjects are identified from the target image, the confidence level corresponding to each of the multiple subjects is determined, and the confidence level is used to characterize the probability that the corresponding subject matches the user's search intent;
[0015] Among the multiple subjects, the subject whose confidence level meets the criteria is identified as the target subject.
[0016] This also includes:
[0017] During the process of displaying the product search results corresponding to the target subject, a tag option is provided for switching to other subjects among the multiple subjects for searching;
[0018] In response to a switching operation initiated via the tag option, the subject image content corresponding to the other subject extracted from the target image is submitted to the server so that the server can provide product search results based on the subject image content corresponding to the other subject.
[0019] The step of performing subject recognition on the target image includes:
[0020] Subject recognition is performed on the target image using a target algorithm model that is compatible with the data processing capabilities of the client's terminal device and meets the accuracy requirements for subject recognition.
[0021] The target algorithm model is stored and updated by the server; the client supports dynamic distribution of the algorithm model.
[0022] The subject recognition of the target image includes:
[0023] The target algorithm model is retrieved from the server, and subject recognition is performed on the target image.
[0024] The method further includes a step of bucketing tests on multiple versions of the algorithm model, wherein the bucketing test process includes:
[0025] Upon receiving a user's request to search for products based on a target image, the target bucket identifier matching the user's identifier is determined;
[0026] The algorithm model corresponding to the target bucket identifier is retrieved from the server to the local terminal device, so that subject recognition can be performed on the target image using this version of the algorithm model. After the subject image content is extracted, a product search request is sent to the server, so that the server can select the target algorithm model by comparing the accuracy of the subject recognition results of multiple different versions of the algorithm model.
[0027] A method for image-based product search, the method being applied to a client, includes:
[0028] In response to a request to perform a product search based on a target image, an image stream is acquired through the camera component of the client-associated terminal device;
[0029] Subject identification is performed from the image stream acquired by the camera component;
[0030] If a target subject is identified, the subject image content of the target subject is extracted from the current image frame of the image stream;
[0031] Based on the extracted subject image content, a product search request is submitted to the server so that the server can provide product search results based on the subject image content;
[0032] The product search results provided by the server are displayed.
[0033] This also includes:
[0034] If a target subject is identified, a bounding box marker for the target subject is added to the current image frame, along with an operation option for confirming the subject identification result;
[0035] Extracting the main image content of the target subject from the acquired current image frame includes:
[0036] If the user confirms the subject recognition result through the operation option, the subject image content of the target subject is extracted from the current image frame that has been acquired.
[0037] A method for image-based product search, the method being applied on a server, includes:
[0038] A request submitted by a client to perform an image-based product search is received. The request includes subject image content, which the client obtains by performing subject recognition on the target image input by the user and cropping the subject from the target image;
[0039] Product search results are generated based on the subject image content and returned to the client.
[0040] An algorithm model processing method, comprising:
[0041] Multiple versions of the algorithm model are provided, along with a test scheme for bucket testing of the multiple versions of the algorithm model. The algorithm model is adapted to the data processing capabilities of the client's terminal device and is used for subject recognition in images.
[0042] In the process of providing image-based product search services, in response to the client's request to retrieve the algorithm model, the corresponding bucket identifier is determined, and the algorithm model corresponding to the bucket identifier is provided to the client, so that the client can use the corresponding version of the algorithm model to perform subject recognition on the target image input by the user, and after extracting the subject image content from the target image, initiate a product search request to the server.
[0043] In response to the product search request submitted by the client, product search results are provided based on the subject image content, and the corresponding bucket identifier is added to the product search results;
[0044] Acquire and record user behavior data generated by users in response to the product search results;
[0045] Using bucket identifiers as units, user behavior data for product search results corresponding to each bucket identifier are statistically analyzed. Based on the statistical results, the subject identification accuracy of the algorithm model version corresponding to each bucket identifier is evaluated, and the algorithm model versions are selected based on the evaluation results.
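The per-bucket evaluation in [0045] can be sketched as follows. This is a minimal illustration only: the application does not specify which user behavior metric is analyzed, so the sketch assumes click-through rate on the returned results as a proxy for subject recognition accuracy. The names `BehaviorEvent`, `click_through_by_bucket`, and `select_model_version`, and the event schema, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class BehaviorEvent:
    """One logged interaction with a page of search results (hypothetical schema)."""
    bucket_id: str    # bucket identifier attached to the product search results
    impressions: int  # number of result items shown
    clicks: int       # number of result items the user clicked


def click_through_by_bucket(events: Iterable[BehaviorEvent]) -> Dict[str, float]:
    """Aggregate behavior data per bucket identifier and return click-through rates."""
    shown: Dict[str, int] = defaultdict(int)
    clicked: Dict[str, int] = defaultdict(int)
    for event in events:
        shown[event.bucket_id] += event.impressions
        clicked[event.bucket_id] += event.clicks
    return {b: clicked[b] / shown[b] for b in shown if shown[b] > 0}


def select_model_version(events: Iterable[BehaviorEvent],
                         bucket_to_version: Dict[str, str]) -> str:
    """Choose the model version whose bucket has the highest click-through rate."""
    rates = click_through_by_bucket(events)
    best_bucket = max(rates, key=rates.get)
    return bucket_to_version[best_bucket]


events = [
    BehaviorEvent("A", impressions=100, clicks=10),
    BehaviorEvent("B", impressions=100, clicks=25),
    BehaviorEvent("A", impressions=100, clicks=20),
]
print(select_model_version(events, {"A": "model-v1", "B": "model-v2"}))  # model-v2
```

A real evaluation would also consider the sample size of each bucket before choosing a version; the sketch omits this.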
[0046] An image-based product search device, the device being used on a client side, comprising:
[0047] The subject recognition unit is used to perform subject recognition on the target image in response to a request for product search based on the target image;
[0048] The subject image cropping unit is used to crop the subject image content of the target subject from the target image;
[0049] The request submission unit is used to submit a product search request to the server based on the extracted subject image content, so that the server can provide product search results based on the subject image content.
[0050] An image-based product search device, the device being used on a client side, comprising:
[0051] An image acquisition unit is used to acquire an image stream through the camera component of the client-associated terminal device in response to a request for product search based on a target image.
[0052] The subject recognition unit is used to perform subject recognition from the image stream acquired by the camera component;
[0053] The subject image cropping unit is used to crop the subject image content of the target subject from the current image frame of the image stream if the target subject is identified;
[0054] The request submission unit is used to submit a product search request to the server based on the extracted subject image content, so that the server can provide product search results based on the subject image content;
[0055] The product search results display unit is used to display the product search results provided by the server.
[0056] An image-based product search device, the device being used on a server, comprising:
[0057] The request receiving unit is used to receive a request submitted by a client for image-based product search. The request includes subject image content, which the client obtains by performing subject recognition on the target image input by the user and cropping the subject from the target image;
[0058] The product search result providing unit is used to generate product search results based on the subject image content and return them to the client.
[0059] A computer-readable storage medium having a computer program stored thereon that, when executed by a processor, implements the steps of any of the preceding methods.
[0060] An electronic device, comprising:
[0061] One or more processors; and
[0062] A memory associated with the one or more processors, the memory being used to store program instructions that, when read and executed by the one or more processors, perform the steps of the method described in any of the preceding descriptions.
[0063] According to the specific embodiments provided in this application, the following technical effects are disclosed:
[0064] Through the embodiments of this application, when a user needs to search for products based on a target image, subject recognition can be performed on the target image on the client side. The subject image content can then be extracted from the target image, and a product search request initiated to the server based on this content. The server thus avoids repeating subject recognition and extraction and instead directly uses the subject image content uploaded by the client to provide product search results, saving server computing resources. Furthermore, since the image uploaded from the client to the server only needs to contain the subject image content, the amount of data transmitted is reduced. The subject image content can be uploaded without compression or at a low compression rate, reducing the information lost to image compression and improving the accuracy of product search results.
[0065] Regarding the subject recognition algorithm used on the client side, to achieve rapid iterative deployment, the original algorithm model used on the server side can be simplified to obtain an algorithm model that is both compatible with the data processing capabilities of the client's terminal device and meets the accuracy requirements for subject recognition. Furthermore, dynamic algorithm model distribution technology can be used to dynamically distribute multiple versions of the algorithm model to the client without updating the client version. This allows for online bucket testing of multiple different algorithm model versions, facilitating the selection of the superior algorithm model version.
[0066] Of course, any product implementing this application does not necessarily need to achieve all of the advantages described above at the same time.
Attached Figure Description
[0067] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0068] Figure 1 is a schematic diagram of the system architecture provided in the embodiments of this application;
[0069] Figure 2 is a flowchart of the first method on the client side provided in the embodiments of this application;
[0070] Figure 3 is a schematic diagram of the interface provided in an embodiment of this application;
[0071] Figure 4 is a flowchart of the second method on the client side provided in the embodiments of this application;
[0072] Figure 5 is a flowchart of the server-side method provided in the embodiments of this application;
[0073] Figure 6 is a flowchart of a method for selecting a subject recognition algorithm on the server side, provided in an embodiment of this application;
[0074] Figure 7 is a schematic diagram of the first device on the client side provided in the embodiments of this application;
[0075] Figure 8 is a schematic diagram of the second device on the client side provided in the embodiments of this application;
[0076] Figure 9 is a schematic diagram of the server-side device provided in an embodiment of this application;
[0077] Figure 10 is a schematic diagram of the electronic device provided in the embodiments of this application.
Detailed Implementation
[0078] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of this application are within the scope of protection of this application.
[0079] This embodiment provides a solution that improves the accuracy of image search while saving server computing resources. Since the client is also capable of performing subject recognition on images, the server can directly trust the client's subject recognition results. That is, after performing subject recognition on the user-input image, the client no longer merely triggers the search process; it also extracts the subject image content from the target image based on the recognition results and submits this subject image content to the server to initiate a server-side search. Thus, upon receiving a search request, the server can directly perform a product search based on the subject image content without performing subject recognition and subject image extraction again, thereby saving server computing resources. Furthermore, since the client uploads only the subject image content instead of the entire image, fewer transmission resources are required, so the content can be submitted to the server without compression or at a lower compression rate, avoiding the image damage caused by high compression rates. This allows the server to perform product searches based on clearer subject image content, improving the accuracy of product search results.
[0080] When multiple subjects are identified in the target image, the client can determine a confidence score for each subject, characterizing the probability that the subject matches the user's search intent. The subject whose confidence score meets certain criteria (e.g., the highest confidence score) can then be selected as the target subject, and its image content extracted and submitted to the server for product search. Specifically, the confidence score of each subject can be computed quantitatively from factors such as whether the subject is centered in the target image and the proportion of the image area that the subject occupies, and the computed value is used as that subject's confidence score.
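The confidence calculation described in [0080] can be sketched as a weighted sum of the two factors it names. This is an assumption-laden illustration: the application does not give a formula, so the equal weights, the normalization of centeredness by the center-to-corner distance, and the names `Subject`, `confidence`, and `pick_target` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Subject:
    label: str
    x0: float  # bounding box corners, in pixels
    y0: float
    x1: float
    y1: float


def confidence(s: Subject, img_w: float, img_h: float,
               w_area: float = 0.5, w_center: float = 0.5) -> float:
    """Weighted sum of area proportion and centeredness, both in [0, 1]."""
    area_ratio = (s.x1 - s.x0) * (s.y1 - s.y0) / (img_w * img_h)
    cx, cy = (s.x0 + s.x1) / 2, (s.y0 + s.y1) / 2
    # Distance of the box center from the image center, normalized by the
    # largest possible distance (image center to a corner).
    dist = math.hypot(cx - img_w / 2, cy - img_h / 2)
    centeredness = 1 - dist / math.hypot(img_w / 2, img_h / 2)
    return w_area * area_ratio + w_center * centeredness


def pick_target(subjects: List[Subject], img_w: float,
                img_h: float) -> Tuple[Subject, List[Subject]]:
    """Return the highest-confidence subject and the rest in descending order."""
    ranked = sorted(subjects, key=lambda s: confidence(s, img_w, img_h), reverse=True)
    return ranked[0], ranked[1:]


jacket = Subject("down jacket", 20, 10, 80, 90)
shoes = Subject("shoes", 60, 85, 80, 100)
target, others = pick_target([shoes, jacket], 100, 100)
print(target.label, [o.label for o in others])  # down jacket ['shoes']
```

The remaining subjects are kept in ranked order because they are the candidates later offered for switching.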
[0081] Of course, while displaying the product search results for the target subject, a tag option can also be provided to switch to other subjects for searching, allowing users to initiate product searches for other subjects. At this time, the client can upload the subject image content corresponding to other subjects to the server for product search, and so on.
[0082] In practical implementation, the algorithm model on the client side can be improved to enhance the accuracy of subject recognition while adapting to the data processing capabilities of user terminal devices. This results in more reliable subject recognition results on the client side, allowing the server to provide more accurate product search results based on these more reliable results. This will be discussed in detail later.
[0083] From a system architecture perspective, as shown in Figure 1, the embodiments of this application may involve the client and server of a product information service system. Through the client, a user can initiate an image-based product search request by inputting a target image (selected from a local photo album or captured with a camera component). In this embodiment, the client then performs subject recognition on the target image, extracts the subject image content, and initiates a product search request to the server based on this content. The server can then provide product search results directly based on the received subject image content, eliminating server-side subject recognition, localization, and extraction.
[0084] The specific implementation schemes provided in the embodiments of this application will be described in detail below.
[0085] Example 1
[0086] First, this first embodiment provides, from the client's perspective, a method for image-based product search. The method can be applied to the application client of a product information service system. Referring to Figure 2, the method may include:
[0087] S201: In response to a request to perform a product search based on a target image, perform subject recognition on the target image.
[0088] In practice, an entry point for image-based product search can usually be provided on the client's homepage or other pages. After clicking through this entry point, users can select an image (photo, etc.) from their local album or start the camera of their terminal device to capture an image stream. Accordingly, the client can receive the user's request to search for products based on the target image.
[0089] Upon receiving the aforementioned request, subject recognition can be performed on the client side. The subject, typically, is the main object of interest in the target image, a major component of the composition, the visual center of the observer's gaze, or the primary element representing the content of the image. It is also usually the entry point for the observer to understand the image's content. Generally, the subject can be a single object or a group of objects; it can be a person, an object, or even an abstract object. For example, a user sees someone carrying a backpack while shopping and wants to search for the same model. They can take a photo of the backpack. The photo will include the backpack itself, and will inevitably capture some background content, including images of people and street scenes. However, since the focus is primarily on the backpack, the image of the "backpack" will usually become the main element of the generated image; that is, the "backpack" is the subject of the image.
[0090] Specifically, when performing subject recognition from a target image, it can be implemented using a relevant subject recognition algorithm model. Regarding the subject recognition algorithm, in this embodiment, since it needs to be executed on the client side, and the server needs to provide product search results based on the subject recognition results of the algorithm model, it can specifically be an algorithm model that is compatible with the data processing capabilities of the client's terminal device and meets the accuracy requirements for subject recognition. The implementation of the specific algorithm model will be described in detail later.
[0091] It should be noted that if the target image is a photo selected by the user from their local album, subject recognition can be performed directly on that photo. If the target image is an image stream captured in real time by the camera component of the terminal device—that is, after initiating an image-based search, the user is directed to the camera page—then a client-side subject recognition algorithm can be used to detect the subject in each frame of image data captured by the camera component. If a subject is identified in a certain image frame, the detection can be stopped, and subsequent cropping of the subject image content can be performed based on that image frame.
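The per-frame detection described in [0091] amounts to scanning the stream lazily and stopping at the first hit. In this minimal sketch, `detect` is a hypothetical placeholder for the client-side subject recognition model, and `frames` stands in for the camera image stream; neither is an API from the application.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Frame = TypeVar("Frame")
BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def first_frame_with_subject(
    frames: Iterable[Frame],
    detect: Callable[[Frame], Optional[BBox]],
) -> Optional[Tuple[int, Frame, BBox]]:
    """Run detection frame by frame and stop at the first frame containing a subject.

    Returns (frame index, frame, bounding box), or None if the stream ends first.
    """
    for index, frame in enumerate(frames):
        box = detect(frame)
        if box is not None:
            # Later frames are never pulled from the stream once a subject is found.
            return index, frame, box
    return None
```

The returned frame is the one from which the subject image content is subsequently cropped.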
[0092] S202: Extract the subject image content of the target subject from the target image.
[0093] After the client completes subject recognition, the location of the target subject in the target image can be determined. Then, based on the identified location of the target subject, the subject image content can be extracted from the target image. For example, in a specific implementation, the coordinates of a prominent subject in the image can be detected, and the subject image content can be extracted based on these coordinates. Specifically, the extracted subject image can be a rectangular area, or it can be an area within the outline of the subject image, and so on. In summary, in this embodiment, not only can the search process be triggered by subject recognition, but the client can also perform "image cutout" processing on the subject image so that the server can provide product search results based on the extracted subject image content.
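The rectangular case of the "image cutout" in [0093] can be sketched as follows. For self-containment, the image here is a plain list of pixel rows (a real client would operate on a bitmap type), and `crop_rect` is a hypothetical name. Contour-based extraction is not shown.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values


def crop_rect(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Cut out the rectangle [x0, x1) x [y0, y1), clamped to the image bounds."""
    height = len(image)
    width = len(image[0]) if height else 0
    x0, y0, x1, y1 = box
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    if x0 >= x1 or y0 >= y1:
        raise ValueError("bounding box does not overlap the image")
    return [row[x0:x1] for row in image[y0:y1]]


image = [[r * 10 + c for c in range(5)] for r in range(4)]
print(crop_rect(image, (1, 1, 3, 3)))  # [[11, 12], [21, 22]]
```

Clamping guards against detector coordinates that fall slightly outside the frame.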
[0094] S203: Submit a product search request to the server based on the extracted subject image content, so that the server can provide product search results based on the subject image content.
[0095] After the subject image content is extracted, a product search request can be submitted to the server based on it. In this way, the server can directly provide product search results based on the subject image content without performing subject recognition, cropping, or other processing on the server side.
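The server-side step in [0095] can be illustrated as a nearest-neighbor lookup. The application does not specify the matching technique, so this sketch assumes that the uploaded subject image and each product image have already been converted into feature vectors by some feature extractor (not shown), and ranks products by cosine similarity. `search_products` and the catalog layout are hypothetical. No subject recognition or cropping happens here; the query vector is computed from the client's crop as-is.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_products(query: Sequence[float],
                    catalog: Dict[str, Sequence[float]],
                    top_k: int = 10) -> List[str]:
    """Rank catalog products by similarity to the subject-image feature vector."""
    scored = sorted(catalog.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [product_id for product_id, _ in scored[:top_k]]


catalog = {
    "backpack-1": [1.0, 0.0, 0.0],
    "backpack-2": [0.9, 0.1, 0.0],
    "shoe-1": [0.0, 1.0, 0.0],
}
print(search_products([1.0, 0.0, 0.0], catalog, top_k=2))  # ['backpack-1', 'backpack-2']
```

A production system would use an approximate nearest-neighbor index rather than a linear scan; the linear scan keeps the sketch self-contained.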
[0096] In specific implementation, since the image to be uploaded to the server is only a portion of the original target image, its data size is usually much smaller than that of the original target image. Compression may therefore be unnecessary, and the subject image content can be uploaded directly to the server to avoid image information loss. Alternatively, the extracted subject image content can be compressed before being submitted to the server, and the server then provides product search results based on the compressed subject image content. In this case, the compression rate applied to the subject image content can be lower than the compression rate that would be applied to the entire target image when uploading it to the server. In other words, compared with the existing technology of compressing the entire image before uploading it, compressing only the subject image portion at a lower compression rate reduces the loss of image information and improves the accuracy of the product search results provided by the server.
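A rough worked example shows why the cropped subject needs less compression. It assumes a hypothetical upload budget of 500,000 bytes and uncompressed 8-bit RGB pixels. Under those assumptions, the compression ratio needed to fit the budget is the raw size divided by the budget, and a value of 1.0 means the content can be sent uncompressed. The image dimensions are illustrative, not measurements.

```python
def raw_size_bytes(width: int, height: int, channels: int = 3) -> int:
    """Uncompressed size of an 8-bit-per-channel image."""
    return width * height * channels


def required_compression_ratio(raw_bytes: int, budget_bytes: int) -> float:
    """Smallest raw/compressed ratio that fits the budget; 1.0 means no compression."""
    return max(1.0, raw_bytes / budget_bytes)


BUDGET = 500_000  # hypothetical upload budget in bytes

full_image = required_compression_ratio(raw_size_bytes(4000, 3000), BUDGET)
subject_crop = required_compression_ratio(raw_size_bytes(800, 600), BUDGET)
small_crop = required_compression_ratio(raw_size_bytes(200, 200), BUDGET)
print(full_image, subject_crop, small_crop)  # 72.0 2.88 1.0
```

Under these assumptions, the full frame must be reduced about 72:1, while an 800x600 subject crop needs less than 3:1 and a small crop needs no compression at all.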
[0097] It should be noted that in practical implementation, multiple subjects may be identified from the same target image. For example, a photo might show a person wearing a down jacket; subject recognition on this photo may identify the down jacket and also a pair of shoes worn by the person. In this case, the client can determine a confidence level for each of the subjects, characterizing the probability that the subject matches the user's search intent, that is, how likely it is that the user wants to search for products related to that subject. The subject whose confidence level meets the criteria is then identified as the target subject. In the example above, suppose the down jacket occupies a larger area in the photo and is more prominent, while the shoes, although present, occupy a relatively small proportion of the image. The user is then more likely to be searching for products related to the down jacket, so the subject "down jacket" has a higher confidence level. This subject can be identified as the target subject, and its image content uploaded to the server for product search.
[0098] Of course, in a specific implementation, while the product search results for the target subject are displayed, tag options can also be provided for switching to other subjects among the multiple subjects. For example, as shown at 31 in Figure 3, a tag option for switching subjects can be provided on the page displaying the product search results for the current target subject, and users can click it to view product search results for other subjects. Specifically, if a user initiates a switching operation through the tag option, the subject image content corresponding to the other subject, extracted from the target image, can be submitted to the server so that the server can provide product search results based on it. In the aforementioned example, a down jacket and a pair of shoes were identified in the target image. Since the client algorithm determines that the down jacket has the higher confidence level, product search results related to "down jacket" are shown by default, while a tag option for switching to "shoes" is provided at the top of the product search results page. Users can click this option to view product search results for "shoes".
[0099] In summary, in this embodiment, image subject recognition is performed on the client side, and the server directly trusts the client's subject recognition results when providing product search results, so the server no longer needs to repeat subject recognition and related processing. This embodiment is therefore an "edge intelligence" product search solution, where the "edge" refers to the client side; that is, a higher level of "intelligence" is required of subject recognition on the client side.
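The tag-switching behavior in [0098] can be sketched as a small client-side session. In this sketch, `SubjectSearchSession` and `submit` (standing in for the request sent to the server) are hypothetical. Caching the results of each subject so that switching back does not re-upload is an added assumption and is not stated in the application.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SubjectSearchSession:
    crops: Dict[str, bytes]               # subject label -> cropped subject image content
    ranked_labels: List[str]              # labels in descending confidence order
    submit: Callable[[bytes], List[str]]  # sends a search request, returns product IDs
    results: Dict[str, List[str]] = field(default_factory=dict)

    def default_results(self) -> List[str]:
        """Search for the highest-confidence subject first."""
        return self.switch_to(self.ranked_labels[0])

    def tag_options(self, current: str) -> List[str]:
        """Labels to offer as switch tags on the results page."""
        return [label for label in self.ranked_labels if label != current]

    def switch_to(self, label: str) -> List[str]:
        """Submit the crop for `label`, reusing results already fetched for it."""
        if label not in self.results:
            self.results[label] = self.submit(self.crops[label])
        return self.results[label]


sent: List[bytes] = []


def fake_submit(crop: bytes) -> List[str]:
    sent.append(crop)  # record what would be uploaded
    return [crop.decode() + "-result"]


session = SubjectSearchSession(
    crops={"down jacket": b"jacket", "shoes": b"shoes"},
    ranked_labels=["down jacket", "shoes"],
    submit=fake_submit,
)
print(session.default_results())           # ['jacket-result']
print(session.tag_options("down jacket"))  # ['shoes']
print(session.switch_to("shoes"))          # ['shoes-result']
```

Because all crops are extracted once from the same target image, switching subjects requires only a new upload, not a new recognition pass.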
[0100] Regarding the implementation of "edge intelligence" mentioned above, it's important to note that in existing solutions, both the client and server need to perform subject recognition. That is, both the client and server can deploy subject recognition algorithms, enabling both ends to recognize subjects. However, because subject recognition algorithms involve image processing, their complexity can be quite high. The algorithm's operation depends on the data processing capabilities of the hardware devices, and the data processing capabilities of user-side terminal devices are relatively weak compared to servers. Therefore, the algorithm models deployed on the client side are usually relatively small-scale. On the other hand, in existing technologies, the purpose of client-side subject recognition is primarily to trigger the search process. Afterward, the entire image is compressed and uploaded to the server, where the server re-recognizes the target image, extracts the relevant data, and then provides the corresponding product search results. Therefore, the accuracy requirements for client-side subject recognition are relatively low, which is another reason why clients typically deploy relatively small-scale algorithm models.
[0101] However, in this embodiment, the server directly trusts the client's subject recognition and localization results and provides product search results based on them. Compared with using client-side recognition merely to trigger the search process, relying directly on these results places higher demands on the accuracy of client-side subject recognition and localization.
[0102] To achieve the above objectives, if the existing algorithm model deployed on the client side cannot meet the requirements for subject recognition accuracy, the algorithm model used on the client side can be improved during implementation. However, for image processing algorithm models, iterative training can be a very time-consuming and labor-intensive process. Therefore, to achieve rapid iteration and deployment of the algorithm, this embodiment can simplify the existing algorithm model on the server side. That is, since the existing algorithm model on the server side is usually a model with a relatively large parameter scale, although such a model can guarantee the accuracy of subject recognition, it typically requires high computing power from hardware devices. If directly deployed to the client side, the computing power of users' mobile devices such as smartphones may be insufficient. Therefore, the existing algorithm model on the server side can be simplified and then applied to the subject recognition process on the client side, thereby achieving a balance between the requirements for hardware computing power and the accuracy of subject recognition.
[0103] There are, of course, various ways to simplify the algorithm model, such as reducing some of its parameters or, for a neural network model, reducing the number of layers. It still has to be verified which simplification method yields better results, and this embodiment can do so by means of bucket testing. That is, multiple simplified versions of the algorithm model can be produced using different simplification methods. To further support rapid iteration, the accuracy of each version can then be tested with online data: when different users actually initiate image-based product searches, their clients use different versions of the algorithm model for subject recognition, and the server provides product search results based on each client's subject recognition results. By comparing the subject recognition accuracy achieved with the different versions, the better-performing version can be selected as the final algorithm model.
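As an illustration of the two simplification methods named above (fewer parameters, fewer layers), the following minimal sketch derives candidate client-side variants from a server-side model. It assumes, purely for illustration, that the model is a plain stack of fully connected layers described by its layer widths; the names `ModelSpec`, `drop_layers`, `shrink_width`, the width multipliers, and the parameter budget are all hypothetical. A real subject recognition model would more likely be convolutional, and which variant is better would still be decided by the bucket test rather than by parameter count.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class ModelSpec:
    name: str
    widths: Tuple[int, ...]  # input size, then the output width of each layer

def param_count(spec: ModelSpec) -> int:
    """Count weights plus biases of a plain fully connected stack."""
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(spec.widths, spec.widths[1:]))

def drop_layers(spec: ModelSpec, n: int) -> ModelSpec:
    """Fewer layers: remove the last n hidden layers, keeping input and output sizes."""
    hidden = spec.widths[1:-1]
    kept = hidden[: max(len(hidden) - n, 0)]
    return ModelSpec(f"{spec.name}-minus{n}layers",
                     (spec.widths[0], *kept, spec.widths[-1]))

def shrink_width(spec: ModelSpec, factor: float) -> ModelSpec:
    """Fewer parameters: scale every hidden layer width by factor (minimum 1 unit)."""
    hidden = tuple(max(1, int(w * factor)) for w in spec.widths[1:-1])
    return ModelSpec(f"{spec.name}-x{factor}",
                     (spec.widths[0], *hidden, spec.widths[-1]))

def client_candidates(server: ModelSpec, budget: int) -> List[ModelSpec]:
    """Derive simplified variants and keep those that fit the device's parameter budget."""
    variants = [drop_layers(server, 1), drop_layers(server, 2),
                shrink_width(server, 0.5), shrink_width(server, 0.25)]
    return [v for v in variants if param_count(v) <= budget]

server = ModelSpec("server", (1024, 2048, 2048, 1024, 4))  # hypothetical server model
print(param_count(server))  # 8397828
print([v.name for v in client_candidates(server, 3_000_000)])
# ['server-minus2layers', 'server-x0.5', 'server-x0.25']
```

Each surviving variant would then become one version in the bucket test.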
[0104] In the bucket testing process, different search requests need to use different algorithm models for subject recognition. Traditionally, an algorithm model is deployed directly as part of the client, so different versions of the model require different client versions; that is, updating the model means releasing a new client version. This approach is clearly unsuitable for online bucket testing. Therefore, this embodiment can use dynamic algorithm model distribution, which allows the client to retrieve different versions of the algorithm model from the server without releasing a new client version. Specifically, to perform online bucket testing of multiple versions of the algorithm model, these versions can be stored on the server. After a user initiates a search request, a hash is computed from the user's ID or another user identifier, and the hash value determines which bucket is matched. The client then retrieves the corresponding version of the algorithm model from the server, uses it to perform subject recognition on the target image input by the user, extracts the subject image content, and sends a product search request to the server, which provides product search results based on the subject image content. Because different requests may use different versions of the algorithm for subject recognition, the server can compare the subject recognition accuracy of the multiple versions, select the better-performing version as the target algorithm model, and finally use it as the client's subject recognition algorithm model.
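The hash-based bucket assignment described above can be sketched as follows. The text only states that a hash of the user identifier determines the bucket; the choice of salted SHA-256, the salt string, the bucket count, and the version names in `MODEL_VERSIONS` are assumptions made for illustration. A stable digest is used instead of Python's built-in `hash()`, which is randomized per process for strings and would therefore move users between buckets across restarts.

```python
import hashlib

# Hypothetical registry on the server: bucket identifier -> stored model version.
MODEL_VERSIONS = {
    0: "subject-det-v1",
    1: "subject-det-v2a",
    2: "subject-det-v2b",
}

def bucket_for_user(user_id: str, num_buckets: int, salt: str = "subject-model-test") -> int:
    """Map a user identifier to a stable bucket using a salted SHA-256 digest."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce modulo the bucket count.
    return int.from_bytes(digest[:8], "big") % num_buckets

def model_version_for_user(user_id: str) -> str:
    """Return the model version the client should pull from the server for this user."""
    return MODEL_VERSIONS[bucket_for_user(user_id, len(MODEL_VERSIONS))]

version = model_version_for_user("user-42")  # same user always gets the same version
```

Changing the salt reshuffles all users into new buckets, which is one way to start a fresh test round without correlating it with the previous one.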
[0105] The server can compare the subject recognition accuracy of the different versions of the algorithm model in several ways. For example, the server provides product search results based on the subject image content submitted by the client and then collects statistics on metrics such as the click-through rate and purchase conversion rate of those product search results. If the product search results obtained from a given version's subject recognition results generally achieve a high click-through rate or purchase conversion rate, it can be inferred that this version's subject recognition accuracy is relatively high. Other metrics can be used in a similar way.
[0106] In this way, the better-performing algorithm model can be selected as the target algorithm model and stored on the server. The server can also keep updating this model by continuing to train it offline. Correspondingly, because the client supports dynamic distribution of the algorithm model, it can pull the target algorithm model from the server and use it to perform subject recognition on the target image input by a given user.
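The metric comparison above can be sketched as a weighted score per model version. The weighting between click-through rate and conversion, the minimum-traffic threshold, and measuring conversion per impression (rather than per click) are all assumptions for illustration; the text does not specify how the metrics are combined.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class BucketStats:
    impressions: int  # product search results shown to users in this bucket
    clicks: int       # results that were clicked
    purchases: int    # results that led to a purchase

    @property
    def ctr(self) -> float:
        return self.clicks / self.impressions if self.impressions else 0.0

    @property
    def conversion(self) -> float:
        # Assumption: conversion measured per impression (per click is equally plausible).
        return self.purchases / self.impressions if self.impressions else 0.0

def pick_target_version(stats: Dict[str, BucketStats], min_impressions: int = 1000,
                        ctr_weight: float = 0.5) -> Optional[str]:
    """Return the version with the best weighted CTR/conversion score, or None."""
    # Skip versions with too little traffic so that noise does not pick the winner.
    eligible = {v: s for v, s in stats.items() if s.impressions >= min_impressions}
    if not eligible:
        return None
    def score(s: BucketStats) -> float:
        return ctr_weight * s.ctr + (1 - ctr_weight) * s.conversion
    return max(eligible, key=lambda v: score(eligible[v]))

stats = {
    "v2a": BucketStats(5000, 600, 50),  # CTR 0.12, conversion 0.010
    "v2b": BucketStats(5000, 450, 40),  # CTR 0.09, conversion 0.008
    "v3": BucketStats(200, 60, 10),     # too little traffic, ignored
}
print(pick_target_version(stats))  # v2a
```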
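Dynamic distribution on the client side can be sketched as a small cache that asks the server which model version is current and downloads the weights only when that version changes. The two callbacks stand in for whatever transport the client actually uses and are hypothetical, as are the class and field names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ModelPackage:
    version: str
    weights: bytes

class ModelCache:
    """Client-side cache that pulls the target model only when its version changes."""

    def __init__(self, query_version: Callable[[], str],
                 download: Callable[[str], bytes]) -> None:
        self._query_version = query_version  # hypothetical: asks the server for the current version
        self._download = download            # hypothetical: fetches the weights of a given version
        self._cached: Optional[ModelPackage] = None

    def get_model(self) -> ModelPackage:
        latest = self._query_version()
        if self._cached is None or self._cached.version != latest:
            # New or updated model on the server: pull it without a client release.
            self._cached = ModelPackage(latest, self._download(latest))
        return self._cached
```

Because the version check is separate from the download, an unchanged model costs only one lightweight query per search session.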
[0107] In summary, with the embodiments of this application, when a user needs to search for products based on a target image, subject recognition can be performed on the target image on the client side. The subject image content can then be extracted from the target image, and a product search request can be initiated to the server based on this content. The server therefore does not need to repeat subject recognition and extraction; it can directly use the subject image content uploaded by the client to provide product search results, which saves server computing resources. Furthermore, because the image uploaded from the client to the server only needs to contain the subject image content, the amount of data transmitted is reduced. The subject image content can be uploaded without compression or with only a low compression rate, which reduces information loss from compression and improves the accuracy of product search results.
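A short worked calculation shows why uploading only the subject permits a lower compression rate. Assuming, hypothetically, a fixed upload budget, the bits available per pixel scale inversely with the number of pixels sent, so a crop covering one tenth of the pixels can use about ten times as many bits per pixel. Bits per pixel is only a rough proxy for the quality setting of a real codec; the budget and image sizes below are illustrative.

```python
from typing import Tuple

def bits_per_pixel(budget_bytes: int, size: Tuple[int, int]) -> float:
    """Average bits each pixel may use if the encoded image must fit the budget."""
    width, height = size
    return budget_bytes * 8 / (width * height)

def crop_headroom(budget_bytes: int, full: Tuple[int, int], crop: Tuple[int, int]) -> float:
    """How many times more bits per pixel the cropped subject gets than the full image."""
    return bits_per_pixel(budget_bytes, crop) / bits_per_pixel(budget_bytes, full)

budget = 200_000          # hypothetical upload budget of about 200 KB
full_image = (4000, 3000) # hypothetical 12-megapixel photo
subject = (1200, 1000)    # hypothetical subject crop
print(round(bits_per_pixel(budget, full_image), 3))          # 0.133
print(round(bits_per_pixel(budget, subject), 3))             # 1.333
print(round(crop_headroom(budget, full_image, subject), 1))  # 10.0
```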
[0108] Regarding the subject recognition algorithm used on the client side, to achieve rapid iterative deployment, the original algorithm model used on the server side can be simplified to obtain an algorithm model that is both compatible with the data processing capabilities of the client's terminal device and meets the accuracy requirements for subject recognition. Furthermore, dynamic algorithm model distribution technology can be used to dynamically distribute multiple versions of the algorithm model to the client without updating the client version. This allows for online bucket testing of multiple different algorithm model versions, facilitating the selection of the superior algorithm model version.
[0109] Embodiment 2
[0110] The second embodiment, also described from the client's perspective, covers a scheme in which an image stream is captured in real time on the terminal device and subject recognition is performed on the image stream; the server then provides product search results based on the client's subject recognition results. Referring to Figure 4, Embodiment 2 of this application provides a method for image-based product search. The method is applied to a client and may specifically include:
[0111] S401: In response to a request to perform a product search based on a target image, an image stream is acquired through the camera component of the client-associated terminal device.
[0112] S402: Perform subject recognition from the image stream acquired by the camera component.
[0113] S403: If a target subject is identified, the subject image content of the target subject is extracted from the current image frame of the image stream.
[0114] Specifically, the client can perform subject recognition on each frame of the image stream. A subject is considered identified when it meets certain criteria in a particular image frame. These criteria might include the subject occupying a sufficiently large portion of the image, the image being sufficiently clear, and so on. Once the target subject is identified, its image content can be extracted from that current image frame.
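The per-frame criteria mentioned above (a sufficiently large subject, a sufficiently clear image) can be sketched as follows. This assumes a grayscale frame given as rows of 0 to 255 values and a detection box already produced by the on-device model, which is not shown. The area-ratio test, the use of the variance of the Laplacian as a clarity measure, and all thresholds are illustrative assumptions, not values from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

Frame = List[List[int]]  # grayscale pixel rows, values 0-255

@dataclass
class Detection:
    x: int
    y: int
    w: int
    h: int
    score: float  # detector confidence for this subject

def crop(frame: Frame, d: Detection) -> Frame:
    """Cut the bounding box region out of the frame."""
    return [row[d.x:d.x + d.w] for row in frame[d.y:d.y + d.h]]

def laplacian_variance(img: Frame) -> float:
    """Variance of the 4-neighbour Laplacian; low values indicate a blurry region."""
    values = []
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            values.append(img[r - 1][c] + img[r + 1][c] + img[r][c - 1]
                          + img[r][c + 1] - 4 * img[r][c])
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def subject_from_frame(frame: Frame, d: Detection, min_area_ratio: float = 0.2,
                       min_sharpness: float = 100.0,
                       min_score: float = 0.5) -> Optional[Frame]:
    """Return the cropped subject if this frame meets the criteria, else None."""
    frame_area = len(frame) * len(frame[0])
    if d.score < min_score or (d.w * d.h) / frame_area < min_area_ratio:
        return None  # detection too uncertain, or subject too small in the frame
    region = crop(frame, d)
    if laplacian_variance(region) < min_sharpness:
        return None  # region too blurry; wait for a later frame
    return region
```

Returning `None` lets the caller simply move on to the next frame of the stream.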
[0115] In a specific implementation, because the server directly trusts the client's subject recognition and subject image cropping results when searching for products, the client can, to further improve the accuracy of the product search results, interact with the user to confirm its subject recognition result before cropping the subject image and uploading it to the server. For example, after the target subject is identified in the current image frame, a bounding box marking the target subject can be added to the frame, together with an operation option for confirming the subject recognition result. If the user confirms the result through this option, the subject image content is cropped from the captured current image frame. An option to re-recognize the subject can also be provided: if the currently identified target subject is not the object the user actually wants to search for, the user can trigger the algorithm to perform recognition again and can be prompted to adjust the shooting angle, posture, and so on, so that the intended object is sufficiently prominent in the image. Because the user confirms the algorithm's subject recognition result, the confidence of the client-side result is further improved, which in turn helps the server return product search results that truly meet the user's needs.
[0116] S404: Submit a product search request to the server based on the extracted main image content, so that the server can provide product search results based on the main image portion.
[0117] S405: Display the product search results provided by the server.
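Steps S401 to S405, including the user confirmation described in [0115], can be sketched as a single client loop. Every callback here (`recognize`, `crop`, `confirm`, `search`, `display`) is a hypothetical stand-in for the camera pipeline, the on-device model, the UI, and the network call; the sketch only fixes the order of the steps.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

@dataclass
class Frame:
    index: int
    pixels: bytes

@dataclass
class Subject:
    box: Tuple[int, int, int, int]  # (x, y, w, h) of the target subject in the frame
    label: str

def image_search_session(
    frames: Iterable[Frame],                          # S401: camera image stream
    recognize: Callable[[Frame], Optional[Subject]],  # S402: on-device subject recognition
    crop: Callable[[Frame, Subject], bytes],          # S403: extract subject image content
    confirm: Callable[[Frame, Subject], bool],        # user confirms the bounding box
    search: Callable[[bytes], List[str]],             # S404: product search request
    display: Callable[[List[str]], None],             # S405: show the results
) -> Optional[List[str]]:
    for frame in frames:
        subject = recognize(frame)
        if subject is None:
            continue  # keep scanning; the user may adjust angle or distance
        if not confirm(frame, subject):
            continue  # user rejected this subject: recognize again on later frames
        results = search(crop(frame, subject))
        display(results)
        return results
    return None  # stream ended without a confirmed subject
```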
[0118] Embodiment 3
[0119] The third embodiment corresponds to the first embodiment and provides a method for image-based product search from the server-side perspective. Referring to Figure 5, the method may include:
[0120] S501: Receive a request from the client to perform a product search based on an image. The request includes main image content, wherein the main image content is obtained by the client performing subject recognition on the target image input by the user and then cropping it from the target image.
[0121] S502: Generate product search results based on the main image content and return them to the client.
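Steps S501 and S502 can be sketched as a handler that takes the cropped subject image and ranks catalog products without running subject recognition again. The text only says the system matches the input image with product images; representing images as feature vectors, the `embed` function, the catalog contents, and ranking by cosine similarity are assumptions. A brute-force scan is used for clarity; a large catalog would more likely use an approximate nearest-neighbour index.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def handle_image_search(subject_image: bytes,
                        embed: Callable[[bytes], Vector],
                        catalog: Dict[str, Vector],
                        top_k: int = 3) -> List[Tuple[str, float]]:
    """S501: receive the cropped subject image; S502: rank products by similarity.

    No subject recognition runs here: the server trusts the client's crop.
    """
    query = embed(subject_image)  # hypothetical feature extractor
    scored = [(pid, cosine(query, vec)) for pid, vec in catalog.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```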
[0122] Embodiment 4
[0123] The fourth embodiment describes online bucket testing of the algorithm model from the server-side perspective. Specifically, it provides an algorithm model processing method. Referring to Figure 6, the method may include:
[0124] S601: Provide multiple versions of the algorithm model and a bucket testing scheme for these versions. The algorithm model is adapted to the data processing capabilities of the terminal device on which the client runs and is used for subject recognition and localization in images.
[0125] S602: In the process of providing image-based product search service, in response to the client's request to pull the algorithm model, the corresponding bucket identifier is determined, and the algorithm model corresponding to the bucket identifier is provided to the client, so that the client can use the corresponding version of the algorithm model to perform subject recognition on the target image input by the user, and after extracting the subject image content from the target image, initiate a product search request to the server.
[0126] S603: In response to the product search request submitted by the client, provide product search results based on the main image content, and add corresponding bucket identifiers to the product search results;
[0127] S604: Acquire and record user behavior data generated by the user's interaction with the product search results;
[0128] S605: Using the bucket identifier as the unit, collect statistics on the user behavior data of the product search results corresponding to each bucket identifier, evaluate the subject recognition accuracy of the algorithm model version corresponding to each bucket identifier based on the statistical results, and select from among the multiple versions of the algorithm model based on the evaluation results.
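Steps S603 to S605 can be sketched with an in-memory log: each search response carries its bucket identifier, behavior events are recorded against the request, and events are then counted per bucket. The in-memory structures, the request identifiers, and the event names `"click"` and `"purchase"` are hypothetical stand-ins for whatever storage and event schema a real service would use.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class SearchResponse:
    request_id: str
    bucket_id: str           # S603: bucket identifier attached to the results
    product_ids: List[str]

@dataclass
class BehaviorLog:
    bucket_of_request: Dict[str, str] = field(default_factory=dict)
    events: List[Tuple[str, str]] = field(default_factory=list)  # (request_id, event_type)

    def record_response(self, resp: SearchResponse) -> None:
        """Remember which bucket produced each set of search results."""
        self.bucket_of_request[resp.request_id] = resp.bucket_id

    def record_event(self, request_id: str, event_type: str) -> None:
        """S604: store a user behavior event such as 'click' or 'purchase'."""
        self.events.append((request_id, event_type))

    def counts_by_bucket(self) -> Dict[str, Dict[str, int]]:
        """S605: aggregate responses and events with the bucket identifier as the unit."""
        counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
        for bucket in self.bucket_of_request.values():
            counts[bucket]["responses"] += 1
        for request_id, event_type in self.events:
            bucket = self.bucket_of_request.get(request_id)
            if bucket is not None:  # ignore events for unknown requests
                counts[bucket][event_type] += 1
        return {b: dict(c) for b, c in counts.items()}
```

The per-bucket counts are the input to the accuracy evaluation and version selection in S605.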
[0129] For the parts of Embodiments 2, 3, and 4 that are not detailed, please refer to Embodiment 1 above and other parts of this specification. They will not be repeated here.
[0130] It should be noted that the embodiments of this application may involve the use of user data. In practical applications, user-specific personal data may be used in the scheme described herein only within the scope permitted by, and in compliance with, the applicable laws and regulations of the relevant country (for example, with the user's explicit consent or after the user has been properly notified).
[0131] Corresponding to Embodiment 1, this application also provides an apparatus for image-based product search, which is applied to a client. Referring to Figure 7, the device may include:
[0132] The subject recognition unit 701 is used to perform subject recognition on the target image in response to a request for product search based on the target image;
[0133] The main image cropping unit 702 is used to crop the main image content of the target subject from the target image;
[0134] The request submission unit 703 is used to submit a product search request to the server based on the extracted main image content, so that the server can provide product search results based on the main image portion.
[0135] Specifically, the device may further include:
[0136] A compression processing unit is used to compress the main image content and submit it to the server so that the server can provide product search results based on the compressed main image content; wherein, when compressing the main image content, its compression rate is less than the compression rate of the target image when the entire target image is uploaded to the server.
[0137] The device may further include:
[0138] A confidence determination unit is used to determine the confidence level of each of the multiple subjects if multiple subjects are identified from the target image, wherein the confidence level is used to characterize the probability that the corresponding subject matches the user's search intent;
[0139] The target subject determination unit is used to determine, among the multiple subjects, the subject whose confidence level meets the condition as the target subject.
[0140] Additionally, the device may also include:
[0141] A tag option providing unit is used to provide tag options for switching to other subjects among the multiple subjects for searching during the process of displaying product search results corresponding to the target subject;
[0142] The search switching unit is used to respond to a switching operation initiated through the tag option and submit the subject image content corresponding to the other subject extracted from the target image to the server so that the server can provide product search results based on the subject image content corresponding to the other subject.
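The confidence determination, target subject determination, and tag option units can be sketched together. The text leaves "meets the condition" open; here it is interpreted, as an assumption, as "highest confidence and at least a threshold", and the remaining subjects are returned in descending confidence as the candidates for the switching tags. The threshold value and names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Candidate:
    label: str
    confidence: float  # estimated probability that this subject matches the search intent

def choose_target(candidates: List[Candidate], threshold: float = 0.5
                  ) -> Tuple[Optional[Candidate], List[Candidate]]:
    """Pick the most confident subject meeting the threshold as the target.

    The other subjects are returned, most confident first, so the client can
    show them as tag options for switching the search to another subject.
    """
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    if not ranked or ranked[0].confidence < threshold:
        return None, ranked  # no subject qualifies; all remain selectable
    return ranked[0], ranked[1:]

target, tags = choose_target([Candidate("bag", 0.4), Candidate("shoe", 0.8),
                              Candidate("hat", 0.6)])
# target.label == "shoe"; tags are "hat" then "bag"
```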
[0143] Specifically, the subject identification unit can be used for:
Performing subject recognition on the target image using a target algorithm model that is compatible with the data processing capabilities of the client's terminal device and meets the accuracy requirements for subject recognition.
[0145] The target algorithm model is stored and updated by the server; the client supports dynamic distribution of the algorithm model.
[0146] At this time, the subject identification unit can be specifically used for:
[0147] The target algorithm model is retrieved from the server, and subject recognition is performed on the target image.
[0148] In addition, in practical implementation, multiple versions of the algorithm model can be bucket tested. For the bucket testing process, the device may further include:
[0149] The target bucket identifier determination unit is used to determine the target bucket identifier that the user's identifier matches after receiving a request from the user to search for goods based on the target image.
[0150] The model retrieval unit is used to retrieve the algorithm model corresponding to the target bucket identifier from the server to the local terminal device, so as to perform subject recognition on the target image based on the algorithm model of that version, extract the subject image content, and then send a product search request to the server, so that the server can select the target algorithm model by comparing the accuracy of the subject recognition results of multiple different versions of the algorithm model.
[0151] Corresponding to Embodiment 2, this application also provides an apparatus for image-based product search. Referring to Figure 8, the device is applied to a client and includes:
[0152] Image acquisition unit 801 is used to acquire image streams through the camera component of the client-associated terminal device in response to a request for product search based on a target image;
[0153] Subject recognition unit 802 is used to perform subject recognition from the image stream acquired by the camera component;
[0154] The main image cropping unit 803 is used to crop the main image content of the target subject from the current image frame of the image stream if the target subject is identified.
[0155] The request submission unit 804 is used to submit a product search request to the server based on the extracted main image content, so that the server can provide product search results based on the main image portion.
[0156] The product search result display unit 805 is used to display the product search results provided by the server.
[0157] In a specific implementation, the device may further include:
[0158] The bounding box marking unit is used to add a bounding box marking about the target subject in the current image frame if the target subject is identified, and to provide operation options for confirming the subject identification result;
[0159] At this time, the main image cropping unit can be specifically used for:
[0160] If the user confirms the subject recognition result through the operation option, the subject image content of the target subject is extracted from the current image frame that has been acquired.
[0161] Corresponding to Embodiment 3, this application also provides an apparatus for image-based product search, which is applied to a server. Referring to Figure 9, the device may include:
[0162] The request receiving unit 901 is used to receive a request submitted by a client for searching products based on images. The request includes main image content, wherein the main image content is obtained by the client performing subject recognition on the target image input by the user and then cropping it from the target image.
[0163] The product search result providing unit 902 is used to generate product search results based on the main image content and return them to the client.
[0164] In addition, embodiments of this application also provide a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the steps of the method described in any of the foregoing method embodiments.
[0165] And an electronic device, comprising:
[0166] One or more processors; and
[0167] A memory associated with the one or more processors, the memory being used to store program instructions that, when read and executed by the one or more processors, perform the steps of the method described in any of the foregoing method embodiments.
[0168] Figure 10 illustrates, by way of example, the architecture of an electronic device. For instance, device 1000 may be a mobile phone, computer, digital broadcasting terminal, messaging device, game console, tablet device, medical device, fitness equipment, personal digital assistant, aircraft, etc.
[0169] Referring to Figure 10, device 1000 may include one or more of the following components: a processing component 1002, a memory 1004, a power supply component 1006, a multimedia component 1008, an audio component 1010, an input/output (I/O) interface 1012, a sensor component 1014, and a communication component 1016.
[0170] Processing component 1002 typically controls the overall operation of device 1000, such as operations associated with display, telephone calls, data communication, camera operation, and recording operations. Processing component 1002 may include one or more processors 1020 to execute instructions to perform all or part of the steps of the methods provided in this disclosure. Furthermore, processing component 1002 may include one or more modules to facilitate interaction between processing component 1002 and other components. For example, processing component 1002 may include a multimedia module to facilitate interaction between multimedia component 1008 and processing component 1002.
[0171] Memory 1004 is configured to store various types of data to support the operation of device 1000. Examples of this data include instructions for any application or method operating on device 1000, contact data, phonebook data, messages, pictures, videos, etc. Memory 1004 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.
[0172] Power supply component 1006 provides power to various components of device 1000. Power supply component 1006 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to device 1000.
[0173] Multimedia component 1008 includes a screen that provides an output interface between device 1000 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touchscreen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundaries of touch or swipe actions but also the duration and pressure associated with the touch or swipe operation. In some embodiments, multimedia component 1008 includes a front-facing camera and/or a rear-facing camera. When device 1000 is in an operating mode, such as a shooting mode or a video mode, the front-facing camera and/or rear-facing camera may receive external multimedia data. Each front-facing camera and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
[0174] Audio component 1010 is configured to output and/or input audio signals. For example, audio component 1010 includes a microphone (MIC) configured to receive external audio signals when device 1000 is in an operating mode, such as call mode, recording mode, and voice recognition mode. The received audio signals may be further stored in memory 1004 or transmitted via communication component 1016. In some embodiments, audio component 1010 also includes a speaker for outputting audio signals.
[0175] I/O interface 1012 provides an interface between processing component 1002 and peripheral interface modules, such as keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to, home buttons, volume buttons, power buttons, and lock buttons.
[0176] Sensor assembly 1014 includes one or more sensors for providing state assessments of various aspects of device 1000. For example, sensor assembly 1014 may detect the on/off state of device 1000, the relative positioning of components such as the display and keypad of device 1000, changes in the position of device 1000 or a component of device 1000, the presence or absence of user contact with device 1000, the orientation or acceleration/deceleration of device 1000, and temperature changes of device 1000. Sensor assembly 1014 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. Sensor assembly 1014 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, sensor assembly 1014 may also include an accelerometer, a gyroscope, a magnetometer, a pressure sensor, or a temperature sensor.
[0177] Communication component 1016 is configured to facilitate wired or wireless communication between device 1000 and other devices. Device 1000 can access wireless networks based on communication standards, such as WiFi, or mobile communication networks such as 2G, 3G, 4G/LTE, and 5G. In one exemplary embodiment, communication component 1016 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, communication component 1016 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
[0178] In an exemplary embodiment, device 1000 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the methods described above.
[0179] In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory 1004 including instructions, which can be executed by a processor 1020 of device 1000 to perform the method provided by the present disclosure. For example, the non-transitory computer-readable storage medium may be a ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, and optical data storage device, etc.
[0180] From the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a storage medium, such as ROM/RAM, magnetic disk, or optical disk, and includes several instructions to cause a computer device (which may be a personal computer, server, network device, etc.) to execute the methods described in the various embodiments, or in some parts of the embodiments, of this application.
[0181] The various embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments can be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are basically similar to the method embodiments and are therefore described relatively briefly; for relevant details, refer to the descriptions of the method embodiments. The system embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those skilled in the art can understand and implement this without creative effort.
[0182] The foregoing has provided a detailed description of the image-based product search method and electronic device provided in this application. Specific examples have been used to illustrate the principles and implementation methods of this application. The descriptions of the embodiments above are merely for the purpose of helping to understand the method and its core ideas. Furthermore, those skilled in the art will recognize that, based on the ideas of this application, there will be changes in the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as a limitation of this application.
Claims
1. A method for product search based on images, characterized in that the method is applied to the client and includes: in response to a request to search for goods based on a target image, performing subject recognition on the target image using a target algorithm model that is compatible with the data processing capabilities of the client's terminal device and meets the accuracy requirements for subject recognition, wherein the method further includes a step of bucket testing multiple versions of the algorithm model, and, during the bucket testing process, after a user's request to search for goods based on a target image is received, the target bucket identifier matched by the user's identifier is determined, the algorithm model corresponding to the target bucket identifier is retrieved from the server to the local terminal device, subject recognition is performed on the target image based on this version of the algorithm model, and, after the subject image content is extracted, a product search request is sent to the server, allowing the server to select the target algorithm model by comparing the accuracy of subject recognition results from multiple different versions of the algorithm model; if a target subject is identified, extracting the main image content of the target subject from the target image; and submitting, based on the extracted main image content, a product search request to the server, so that the server can provide product search results based on the main image portion.
2. The method according to claim 1, characterized in that the step of submitting a product search request to the server based on the extracted main image content includes: compressing the main image content and submitting it to the server so that the server can provide product search results based on the compressed main image content, wherein the compression rate applied to the main image content is less than the compression rate applied to the target image when the entire target image is uploaded to the server.
3. The method according to claim 1, characterized in that the method further includes: if multiple subjects are identified from the target image, determining the confidence level corresponding to each of the multiple subjects, the confidence level characterizing the probability that the corresponding subject matches the user's search intent; and determining, among the multiple subjects, the subject whose confidence level meets the criteria as the target subject.
4. The method according to claim 3, characterized in that the method further includes: during the display of the product search results corresponding to the target subject, providing a tag option for switching the search to other subjects among the multiple subjects; and, in response to a switching operation initiated via the tag option, submitting the subject image content corresponding to the other subject, extracted from the target image, to the server so that the server can provide product search results based on the subject image content corresponding to the other subject.
5. The method according to claim 1, characterized in that the target algorithm model is stored and updated by the server, and the client supports dynamic distribution of the algorithm model; performing subject recognition on the target image includes: retrieving the target algorithm model from the server and performing subject recognition on the target image.
6. A method for product search based on images, characterized in that, The method is applied to the client and includes: In response to a request to perform a product search based on a target image, an image stream is acquired through the camera component of the client-associated terminal device; Using a target algorithm model that is compatible with the data processing capabilities of the client's terminal device and meets the accuracy requirements for subject recognition, subject recognition is performed from the image stream acquired by the camera component. The method further includes a step of bucket testing multiple versions of the algorithm model. During the bucket testing process, after receiving a user's request to search for goods based on the target image, the target bucket identifier matched by the user's identifier is determined. The algorithm model corresponding to the target bucket identifier is retrieved from the server to the local terminal device, so that subject recognition is performed on the target image based on this version of the algorithm model. After extracting the subject image content, a product search request is sent to the server, allowing the server to select the target algorithm model by comparing the accuracy of subject recognition results from multiple different versions of the algorithm model. If a target subject is identified, the main image content of the target subject is extracted from the current image frame of the image stream; Based on the extracted main image content, a product search request is submitted to the server so that the server can provide product search results based on the main image portion. The search results for products provided by the server are displayed.
7. The method according to claim 6, characterized in that the method further includes: if a target subject is identified, adding a bounding box marker for the target subject to the current image frame, together with an operation option for confirming the subject recognition result; and extracting the subject image content of the target subject from the current image frame of the image stream includes: if the user confirms the subject recognition result through the operation option, extracting the subject image content of the target subject from the current image frame of the image stream.
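Claim 7 makes extraction conditional on the user confirming the bounding box. Below is a sketch of that step. It assumes the frame is a plain grid of pixel values and that the box uses exclusive right and bottom edges. `BoundingBox` and `extract_subject` are hypothetical names, and a real client would crop a decoded camera frame instead.

```python
from dataclasses import dataclass
from typing import List, Optional

Frame = List[List[int]]  # hypothetical frame: rows of pixel values


@dataclass(frozen=True)
class BoundingBox:
    """Box drawn around the recognized subject; right/bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def extract_subject(frame: Frame, box: BoundingBox,
                    user_confirmed: bool) -> Optional[Frame]:
    """Crop the subject from the current frame once the user confirms the box.

    Returns None when the user has not confirmed, so nothing is uploaded.
    Coordinates outside the frame are clamped to its edges.
    """
    if not user_confirmed:
        return None
    height = len(frame)
    width = len(frame[0]) if height else 0
    left = max(0, min(box.left, width))
    right = max(left, min(box.right, width))
    top = max(0, min(box.top, height))
    bottom = max(top, min(box.bottom, height))
    return [row[left:right] for row in frame[top:bottom]]
```

Only the cropped region is sent to the server. This smaller upload is what lets the client avoid compressing the whole frame.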
8. A method for product search based on images, characterized in that the method is applied to a server and includes: receiving a request from a client to perform a product search based on an image, the request including subject image content that the client extracted from the target image it received, by performing subject recognition with a target algorithm model that is compatible with the data processing capabilities of the client's terminal device and meets the accuracy requirements for subject recognition; prior to this, the method includes a step of bucket testing multiple versions of the algorithm model: during bucket testing, when the user's request to perform a product search based on the target image is received, the target bucket identifier matching the user's identifier is determined; the algorithm model version corresponding to the target bucket identifier is retrieved from the server to the terminal device, which performs subject recognition on the target image with that version and, after extracting the subject image content, sends a product search request to the server; the server then selects the target algorithm model by comparing the subject recognition accuracy of the multiple algorithm model versions; and generating product search results based on the subject image content and returning them to the client.
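Claim 8 does not specify how the server matches subject image content to products. Background paragraph [0003] describes matching the input image against product images. One conventional way to do that, assumed here, is to compare feature vectors by cosine similarity. The feature-extraction step is omitted, and `search_products`, `catalog` and the vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_products(query: Sequence[float],
                    catalog: Dict[str, Sequence[float]],
                    top_k: int = 10) -> List[Tuple[str, float]]:
    """Rank catalog products by similarity to the subject image's feature vector.

    `query` is assumed to come from a feature extractor applied to the subject
    image content uploaded by the client; `catalog` maps product IDs to
    precomputed vectors of the same dimension.
    """
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in catalog.items()]
    # Highest similarity first; ties broken by product ID for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

This linear scan is only illustrative. A catalog of realistic size would normally sit behind an approximate nearest-neighbour index.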
9. A method for processing algorithm models, characterized in that the method includes: providing multiple versions of an algorithm model, together with a test scheme for bucket testing the multiple versions, wherein the algorithm model is adapted to the data processing capabilities of the client's terminal device and is used for subject recognition in images; in the process of providing an image-based product search service, in response to a request from a client to retrieve the algorithm model, determining the corresponding bucket identifier and providing the client with the algorithm model version corresponding to that bucket identifier, so that the client uses that version to perform subject recognition on the target image input by the user and, after extracting the subject image content from the target image, initiates a product search request to the server; in response to the product search request submitted by the client, providing product search results based on the subject image content, and adding the corresponding bucket identifier to the product search results; acquiring and recording the user behavior data generated by users in response to the product search results; and, using the bucket identifier as the unit of aggregation, compiling statistics on the user behavior data for the product search results corresponding to each bucket identifier, evaluating the subject recognition accuracy of the algorithm model version corresponding to each bucket identifier based on the statistical results, and selecting an algorithm model version based on the evaluation results.
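Claim 9 evaluates each model version indirectly, through user behavior on the results served from its bucket. The patent does not name a metric. The sketch below assumes that the click-through rate per result page serves as the proxy for subject recognition accuracy, and it ignores buckets with too little traffic. `BehaviorEvent`, `min_views` and the bucket names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class BehaviorEvent:
    """One search-result page view, tagged with the bucket that served it."""
    bucket_id: str
    clicked: bool  # hypothetical signal: did the user click any result?


def bucket_statistics(events: Iterable[BehaviorEvent]) -> Dict[str, Tuple[int, float]]:
    """Return {bucket_id: (result_page_views, click_through_rate)}."""
    views = Counter()
    clicks = Counter()
    for event in events:
        views[event.bucket_id] += 1
        clicks[event.bucket_id] += int(event.clicked)
    return {bucket: (n, clicks[bucket] / n) for bucket, n in views.items()}


def select_model_version(events: Iterable[BehaviorEvent],
                         scheme: Dict[str, str],
                         min_views: int = 100) -> Optional[str]:
    """Pick the model version whose bucket has the best click-through rate.

    Buckets with fewer than `min_views` result pages are skipped; returns None
    if no bucket qualifies. Ties go to the alphabetically first bucket.
    """
    stats = bucket_statistics(events)
    eligible = {bucket: ctr for bucket, (n, ctr) in stats.items()
                if n >= min_views and bucket in scheme}
    if not eligible:
        return None
    best_bucket = max(sorted(eligible), key=lambda bucket: eligible[bucket])
    return scheme[best_bucket]
```

A production test would usually also check statistical significance before switching versions. The fixed `min_views` threshold is only a crude stand-in for that check.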
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, the steps of the method according to any one of claims 1 to 9 are implemented.
11. An electronic device, characterized in that the device includes: one or more processors; and a memory associated with the one or more processors, the memory being configured to store program instructions that, when read and executed by the one or more processors, perform the steps of the method according to any one of claims 1 to 9.