Counterfeit article detection system
By collecting unstructured data from online marketplaces, using natural language processing and machine learning models to generate question sets and analyze images, the challenge of detecting counterfeit items in online marketplaces has been solved, achieving efficient identification and reduced distribution of counterfeit items.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- EBAY INC
- Filing Date
- 2021-07-07
- Publication Date
- 2026-06-30
AI Technical Summary
In online marketplaces, existing technologies struggle to effectively detect counterfeit items, especially in the absence of physical inspections and structured data, and third-party sellers of counterfeit items may manipulate descriptions to bypass detection.
The system collects unstructured data related to items, uses natural language processing models to identify items and their features, generates a question set and ranks it based on counterfeit indicator weights, and uses machine learning models to analyze images to identify counterfeit items.
It improves the efficiency of counterfeit detection in online marketplaces, enabling rapid identification of counterfeit versions of new items, reducing the distribution of counterfeit goods, and providing a better consumer experience.
Figure CN116157817B_ABST
Abstract
Description
Background Technology
[0001] Detecting counterfeit goods can be challenging. As new methods for detecting counterfeits are adopted, counterfeit goods are modified to avoid detection. The result is a continuous pursuit of developing new methods capable of successfully detecting counterfeits.
[0002] It is advantageous to detect counterfeit goods before they enter the market. Detection at this stage helps protect downstream consumers who may intentionally or unintentionally acquire counterfeit goods.
Summary of the Invention
[0003] At a high level, the aspects described herein relate to the detection of counterfeit items offered via a network (e.g., the Internet). To this end, a counterfeit item detection system collects item data related to an item from various sources, including by using web crawlers. Depending on the type of item data (video, audio, text data, etc.), speech-to-text software or natural language processing is applied. Using these processes, textual elements representing the item, item features, or the language context of the item data are identified.
[0004] Based on a set of language rules, questions are generated using items and item features. In some aspects, questions are generated when the language context is relevant to detecting counterfeit items. Some questions may include requests for images of items or item features. These questions are stored as a question set, where each question is associated with an item.
[0005] The counterfeit item detection system responds to an item listing request received from a client device by providing a set of questions to the client device. The item listing request is a request to offer an item via a network (e.g., through an online marketplace or other online platform). The question set is selected based on a ranking of the questions, where the ranking is based on a counterfeit indicator weight associated with the answer to a question. The weight indicates how strongly the answer correlates with whether the item is likely a counterfeit. In some aspects, a chatbot is used to provide the questions sequentially.
[0006] Answers are received for the set of questions. Based on these answers, the counterfeit item detection system determines whether an item is counterfeit. This can be done using a probability value determined from the counterfeit indicator weights for the combination of answers, or by analyzing received images using a trained neural network. If an item is determined to be counterfeit, the item listing request is rejected. In some aspects, the question set is re-ranked based on the determination or on an indication that an item is counterfeit. Item images received during the item listing process (also referred to as item listing images) can be used to further train the neural network.
[0007] This summary is intended to present in a simplified form a selection of concepts further described in the Detailed Description section of this disclosure. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to help define the scope of the claimed subject matter. Additional objects, advantages, and novel features of the art will be set forth in part in the description which follows, and will become apparent in part to those skilled in the art upon examination of this disclosure or through practical study of the art.
Description of the Drawings
[0008] The technology is described in detail below with reference to the accompanying drawings, in which:
[0009] Figure 1 is a block diagram of an example operating environment suitable for using a counterfeit item detection system, according to one aspect described herein;
[0010] Figure 2 is a block diagram of an example counterfeit item detection system, according to one aspect described herein;
[0011] Figure 3 is a diagram of an example ranking and selection of questions in an index provided by the counterfeit item detection system of Figure 2, according to one aspect described herein;
[0012] Figure 4 is a diagram illustrating an example process performed by the counterfeit item detection system of Figure 2, according to one aspect described herein;
[0013] Figures 5 to 8 are block diagrams of example methods for detecting counterfeit items using the counterfeit item detection system of Figure 2; and
[0014] Figure 9 is an example computing device suitable for implementing various aspects of the described technology, according to one aspect described herein.
Detailed Description
[0015] Detecting counterfeit items presents unique challenges when selling goods online. Because there is no physical marketplace, conventional methods for inspecting individual items are often unavailable. Some online retailers can avoid unintentionally offering counterfeit items because they can establish long-term relationships with stable suppliers. Typically, as part of these relationships, retailers receive proof of authenticity that they can verify.
[0016] However, online marketplaces do not offer the same benefits as many online retailers. Online marketplaces facilitate transactions by providing a platform where third-party sellers can offer goods and services to consumers. While in many cases online marketplaces are not the actual sellers, some still actively seek to detect and remove counterfeit items. By doing so, online marketplaces can provide a better experience for consumers.
[0017] One of the challenges faced by online marketplaces attempting to detect counterfeit goods is that, in most cases, physical inspection of items is impossible. This is because a third-party seller coordinates direct delivery of the item to the consumer after the purchase. Therefore, conventional methods for physical inspection are unavailable. Consequently, certain characteristics of an item that would indicate whether it is counterfeit cannot be physically verified.
[0018] Historically, some online retailers have required third-party sellers to provide item descriptions. These descriptions typically include structured information that helps determine if the item is counterfeit. This information includes things like item images, batch numbers, manufacturing dates, serial numbers, ISBNs (International Standard Book Numbers), UPCs (Universal Product Codes), dimensions, and weight information, among many other item descriptors. When the descriptors do not match the structured data stored for the item, the online marketplace determines that the item is counterfeit.
[0019] However, this method is not always effective in online environments, including online marketplaces. One problem is that third-party sellers who deliberately distribute counterfeit items can manipulate the information. Many of these sellers distribute large quantities of the same item. In this case, the seller might use descriptions or photos of genuine items when uploading them to the online marketplace. Even third-party sellers making one-off sales of an item might download stock photos and descriptions from other websites in an attempt to conceal that the item is counterfeit. This limits the consumer's opportunity to "virtually" inspect the item. In such cases, the consumer may only discover that the item is counterfeit after receiving it.
[0020] Another problem unique to online marketplaces is the sheer volume of third-party sellers and items offered. New sellers and items are constantly emerging within online marketplaces. Before a large number of items are offered, conventional methods of checking items are often ineffective in identifying counterfeit items. When structured data for comparison is limited or unavailable (typically with many items, especially many new ones), other conventional methods of comparing item descriptors become inefficient. By the time some of these conventional methods become effective, many counterfeit items may have already been distributed downstream.
[0021] Therefore, some online marketplaces aim to detect and remove counterfeit items before they are distributed by third-party sellers. Furthermore, it is beneficial to provide a system that can quickly respond to changes in the online marketplace (e.g., the continuous introduction of new third-party sellers and new items).
[0022] The technology described in this disclosure achieves these objectives and provides a solution to problems specific to online marketplaces. Specifically, this disclosure generally describes a system for detecting counterfeit items by generating questions from various data sources related to the item, including unstructured data. These questions are provided when the item is listed on an online marketplace. As counterfeit items are identified, these questions are continuously ranked, so that questions that are more likely to identify counterfeit items when the item is listed are identified and provided.
[0023] Using this method, questions that help identify counterfeit items are quickly identified and provided when third-party sellers list items. When a new counterfeit item is identified, the ranking of these questions allows the system to begin identifying counterfeits of new items offered on the online marketplace. This helps address the issue of the ever-changing scale and items present in online marketplaces. Furthermore, question generation can utilize unstructured data. Therefore, in addition to identifying questions highly relevant to counterfeit item identification, the system also generates questions that are not easily (and in some cases even impossible) to find online. Thus, third-party sellers who deliberately try to bypass the system by identifying answers indicating genuine items cannot do so in most cases because the answers are not readily available. Moreover, the types of questions generated by the system and provided during the item listing process are highly relevant to identifying counterfeit items within the online environment. Therefore, this technology is particularly well-suited for identifying counterfeit items within online environments, including online marketplaces and other types of online retail platforms, and overall, it is more effective than the conventional methods previously described in identifying counterfeit items.
[0024] One specific example method that uses the described techniques to achieve these goals, and to realize these benefits over conventional methods, begins with identifying item data. Item data is identified and collected from structured data that specifically describes the item using item descriptors, or from unstructured data that discusses the item within some general context. The item data is then analyzed based on the type of item data collected. For unstructured data, a natural language processing model can be employed to determine the language used and the context in which that language is used. For example, various natural language processing models can be used, such as BERT (Bidirectional Encoder Representations from Transformers), Generative Pre-trained Transformer models GPT-2 and GPT-3, and/or other natural language processing models.
[0025] From the item data, a natural language processing (NLP) model identifies items and the item features associated with those items. These item features are then used to generate questions based on a set of grammar rules. Furthermore, the NLP model determines the context in which the items and item features are used. When the context is known, questions can be generated from the item features when the context relates to counterfeit items. Sometimes, this increases the likelihood that the question will ultimately be associated with identifying counterfeit items.
[0026] In an example use case, unstructured data in the form of an online forum discussion is obtained using a web crawler. The text data of the forum is processed using a natural language processing (NLP) model. The NLP model identifies a specific model of designer shoe as an item. The NLP model further identifies discussion of a designer logo located on the inner tongue area and of the double-stitched seams used along the collar, each of which is an item feature. In some cases, the forum discussion might take place in the context of identifying counterfeit items. Questions are then generated by applying grammar rules to the item features. Here, one question might be, "Does the designer item have a designer logo located inside the tongue?" Another question might be, "What type of stitching is used along the collar of the designer shoe?" Questions can be generated when the NLP model determines the linguistic context and determines that the linguistic context is related to counterfeit items.
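The question-generation step in this use case can be illustrated with a minimal sketch. It assumes the NLP model has already produced item features with a flag for counterfeit-related context. The `ItemFeature` fields, the `TEMPLATES` tuple standing in for the grammar rules, and the `generate_questions` function are all hypothetical names chosen for illustration; the disclosure does not specify the form of its grammar rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemFeature:
    """An item feature as it might be emitted by an NLP model (hypothetical shape)."""
    item: str                  # e.g. "designer shoe"
    attribute: str             # e.g. "designer logo"
    location: str              # e.g. "inside the tongue"
    counterfeit_context: bool  # True if the source passage discussed counterfeits


# Hypothetical stand-in for the "grammar rules": simple string templates.
TEMPLATES = (
    "Does the {item} have a {attribute} located {location}?",
    "Please upload a photo of the {attribute} {location} of the {item}.",
)


def generate_questions(features, require_context=True):
    """Apply every template to each feature, optionally skipping features
    whose linguistic context was not related to counterfeit items."""
    questions = []
    for feature in features:
        if require_context and not feature.counterfeit_context:
            continue
        for template in TEMPLATES:
            questions.append(
                template.format(
                    item=feature.item,
                    attribute=feature.attribute,
                    location=feature.location,
                )
            )
    return questions


features = [
    ItemFeature("designer shoe", "designer logo", "inside the tongue", True),
    ItemFeature("designer shoe", "double stitching", "along the collar", False),
]
questions = generate_questions(features)
```

The second template reflects the statement that some questions may request an image of an item feature; a real rule set would likely be richer than fixed templates.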
[0027] Once a question is generated, it is stored in association with an item. A group of one or more questions generated for that item is stored as a question set for that item. In this example, each item can have an associated question set specific to that item. As item features are identified for that item, more questions can be added to the question set. Thus, a question set is built for each item over time. Each question in this question set can also have an associated set of counterfeit indicator weights. These counterfeit indicator weights are values that indicate how strongly the question is relevant to identifying a counterfeit item. That is, questions that are relatively strongly relevant to identifying a counterfeit item will be more likely to identify a counterfeit item based on the answer to that question. Each question can have one or more associated counterfeit indicator weights, each counterfeit indicator weight specific to the possible answers to that question. The question set and counterfeit indicator weights can be indexed within the data store for later retrieval.
[0028] To detect counterfeit items, the questions can be posed to third-party sellers when they upload items to an online marketplace. When a third-party seller attempts to list an item on the online marketplace, the seller sends an item listing request to the marketplace. This item listing request identifies the item to be listed. An item listing request can initiate the item listing process for offering the item via the online marketplace.
[0029] As part of the item listing process, the system retrieves a selection of questions from the data store using the provided item identifiers. The selection of questions can be all or part of the set of questions associated with the item. The selection is chosen from a set of questions using a counterfeit indicator weight. One selection method ranks the question set using the counterfeit indicator weight, with the highest-ranking questions being those most relevant to identifying counterfeit items. The selection of questions is determined by choosing some of the highest-ranking questions. The selection may also include newly generated questions or random questions chosen from outside the highest-ranking questions. This continuously identifies other questions that are highly relevant to identifying counterfeit items and are not currently included in the highest-ranking questions. The selection of questions is then provided to client devices, such as those of third-party sellers, as part of the item listing process.
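The selection described here, top-ranked questions plus a few random ones from outside the top ranks, might be sketched as follows. This is an illustration under assumptions, not the disclosed implementation: the `Question` class, the choice to rank each question by its largest per-answer weight, and the `top_k` and `explore` parameters are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    # Counterfeit indicator weight for each possible answer, assumed in [0, 1].
    answer_weights: dict = field(default_factory=dict)

    def rank_score(self):
        # Assumption: a question ranks by the strongest weight any answer carries.
        return max(self.answer_weights.values(), default=0.0)


def select_questions(question_set, top_k=3, explore=1, rng=None):
    """Return the top_k highest-ranked questions plus up to `explore`
    questions drawn at random from the remainder of the question set."""
    rng = rng or random.Random()
    ranked = sorted(question_set, key=lambda q: q.rank_score(), reverse=True)
    top, rest = ranked[:top_k], ranked[top_k:]
    extras = rng.sample(rest, min(explore, len(rest)))
    return top + extras


question_set = [
    Question("Is the logo inside the tongue?", {"yes": 0.1, "no": 0.9}),
    Question("Is the box sealed?", {"yes": 0.2}),
    Question("What stitching is used on the collar?", {"single": 0.5}),
    Question("Newly generated question", {}),
]
selection = select_questions(question_set, top_k=2, explore=1, rng=random.Random(0))
```

Including random or unweighted questions gives them a chance to accumulate weights, which is how questions outside the current top ranks can later be promoted.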
[0030] The system receives answers to a set of questions from a third-party seller's client device. It then determines whether an item is likely counterfeit based on the answers. One approach involves determining a probability value using counterfeit indicator weights determined by the answers to the set of questions. This probability value can be a total weighted value of the answers to the questions as a function of the counterfeit indicator weights. As an example, this probability value can be determined by identifying the counterfeit indicator weights associated with each answer to the questions and calculating the joint probability of these counterfeit indicator weights using a multivariate probability function. A counterfeit indicator threshold can be predefined such that a relatively high threshold requires a relatively high joint probability to determine that the item is counterfeit. The joint probability is compared to the counterfeit indicator threshold, and the item is determined to be counterfeit when the joint probability exceeds the threshold. It should be understood that using a linear combination of weights and probabilities is only one example approach, and other methods are also possible. For example, more complex functions, including neural networks trained for this specific purpose on historical data, can also be used to determine whether an item is counterfeit.
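As a rough illustration of comparing a combined probability against a counterfeit indicator threshold, the sketch below uses a noisy-OR combination. This is an assumption made for the example: the disclosure refers only to a multivariate probability function and does not name one. The function names and the default threshold of 0.8 are likewise hypothetical, and each weight is treated as an independent probability in [0, 1].

```python
import math


def counterfeit_probability(answer_weights):
    """Combine per-answer counterfeit indicator weights with a noisy-OR:
    P = 1 - prod(1 - w_i). Assumes weights are independent probabilities."""
    for weight in answer_weights:
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight out of range: {weight}")
    return 1.0 - math.prod(1.0 - weight for weight in answer_weights)


def is_likely_counterfeit(answer_weights, threshold=0.8):
    """Flag the item when the combined probability exceeds the threshold."""
    return counterfeit_probability(answer_weights) > threshold


# Weights looked up for the answers a seller actually gave (illustrative values).
weights_for_answers = [0.5, 0.5]
probability = counterfeit_probability(weights_for_answers)
```

Raising `threshold` makes the check more conservative, matching the statement that a relatively high threshold requires a relatively high joint probability before an item is determined to be counterfeit.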
[0031] If the system determines that an item may be counterfeit, it rejects the item listing request. In other words, the system can prohibit the item from being offered to consumers via the online marketplace or other platform. Alternatively, while a consumer is reviewing items to make a purchase decision, the online marketplace provides the consumer with a value or other indicator of the probability that the item is counterfeit (e.g., a probability determined from the answers and/or images provided by the seller). In this way, the consumer can decide whether to purchase the item based on the probability, as predicted by the value, that the item is counterfeit.
[0032] As described above, the system can continuously change its question set to provide the questions most likely to identify counterfeit items, and to adapt to new items or changing item features. To do so, the system receives an indication that an item is counterfeit. This indication can be received from consumers, third-party sellers, or any other entity. The online marketplace can also receive items and determine whether they are counterfeit by performing physical inspections, thus receiving an indication that the item is counterfeit.
[0033] For example, the counterfeit indicator weight, used to indicate the strength of the relevance between a question / answer and whether an item is counterfeit, can be adjusted after each confirmation that an item is genuine (positive reinforcement) or counterfeit (negative), at specific time intervals, and / or after a specific number of items have been processed. For instance, after receiving an indication that an item is counterfeit, questions and answers provided and received as part of the item transfer through an online marketplace can be retrieved. In the case of a counterfeit item, the counterfeit indicator weight of previous answers is adjusted to show a relatively strong relevance used to indicate that the item is counterfeit. In this way, the questions that previously indicated counterfeit items have been adjusted to show a stronger relevance. New questions provided as part of the selection and any random questions also receive the adjusted counterfeit indicator weight. Similarly, in the case of an item being determined to be genuine, the counterfeit indicator weight can be adjusted to show a weaker relevance to determining whether the item is counterfeit. Once the question set is adjusted, it can be ranked or re-ranked. Subsequent question selections are chosen from the newly ranked or re-ranked question set in response to requests for new item lists. Alternatively, machine learning algorithms can be used to determine whether an item is counterfeit, taking the item and a set of questions as input and outputting the probability that it is counterfeit. This model can be trained using historical data. If a neural network is used, the "weights" of each rule will be the network's parameters, and the training process will adjust these weights to maximize its accuracy on certain test sets.
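One simple way to picture this adjustment is a small step toward 1 for answers tied to a confirmed counterfeit and toward 0 for answers tied to a confirmed genuine item. The update rule, the `rate` parameter, and the neutral starting value of 0.5 for previously unweighted questions are assumptions for this sketch; the disclosure does not prescribe a specific formula.

```python
def update_weights(weights, answers, is_counterfeit, rate=0.1):
    """Return a new weight table after one confirmed outcome.

    weights: {(question, answer): counterfeit indicator weight in [0, 1]}
    answers: {question: answer given during the item listing process}
    """
    target = 1.0 if is_counterfeit else 0.0
    updated = dict(weights)  # leave the caller's table untouched
    for question, answer in answers.items():
        key = (question, answer)
        # Assumption: new or random questions start from a neutral 0.5.
        old = updated.get(key, 0.5)
        updated[key] = old + rate * (target - old)
    return updated


weights = {("logo inside tongue?", "no"): 0.5}
after_counterfeit = update_weights(
    weights, {"logo inside tongue?": "no"}, is_counterfeit=True, rate=0.2
)
after_genuine = update_weights(
    weights,
    {"logo inside tongue?": "no", "box sealed?": "yes"},
    is_counterfeit=False,
    rate=0.2,
)
```

After an update like this, the question set would be re-ranked so that later selections draw on the adjusted weights.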
[0034] Another aspect of this disclosure provides a system for automatically training and using a machine learning model to detect counterfeit items using images. One question provided in the question set may include a request for an image of an item or a portion of an item (e.g., a specific item feature). Item images provided as part of an item listing process are referred to as item listing images. Using the item listing images, a trained machine learning model detects item features of the items and determines whether the item is counterfeit based on probability values determined by the trained machine learning model.
[0035] To train a machine learning model, the system can begin by collecting videos related to the item. These videos can be received from sources that indicate their relevance to the item, or obtained by crawling the web to identify videos that are relevant to the item. Once a video related to the item has been received, a speech-to-text function (such as Microsoft's Azure Speech-to-Text) can be used to convert the audio information within the video into text data.
[0036] Natural language processing (NLP) models can be used on text data to identify items, item features, or linguistic context. When the NLP model identifies item features and recognizes the linguistic context as relevant to identifying counterfeit items, an image can be obtained from the video. This image can be obtained by taking a snapshot of a video frame. This snapshot is taken at a time in the video that coincides with the text data indicating item features and linguistic context. In this way, there is a possibility that the image contains item features indicating a counterfeit item.
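The timing logic can be sketched without any video library. The example assumes the speech-to-text step yields timestamped transcript segments and that the NLP model has flagged each one; `TranscriptSegment`, its fields, and the choice of the segment midpoint as the snapshot time are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptSegment:
    start_s: float             # segment start time in the video, in seconds
    end_s: float               # segment end time, in seconds
    text: str
    mentions_feature: bool     # NLP model identified an item feature
    counterfeit_context: bool  # NLP model judged the context counterfeit-related


def snapshot_times(segments):
    """Return one timestamp (the segment midpoint) for each segment that both
    mentions an item feature and has counterfeit-related context."""
    return [
        (segment.start_s + segment.end_s) / 2
        for segment in segments
        if segment.mentions_feature and segment.counterfeit_context
    ]


segments = [
    TranscriptSegment(0.0, 4.0, "fake pairs miss the logo inside the tongue", True, True),
    TranscriptSegment(4.0, 10.0, "the collar is double stitched", True, False),
    TranscriptSegment(10.0, 12.0, "counterfeits use single stitching here", True, True),
]
times = snapshot_times(segments)
```

Each returned time would then be passed to whatever frame-extraction tool is in use to capture the image for the training dataset.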
[0037] Images obtained from the video can then be included in the training dataset and stored in the data store. Other images that may be included in the training dataset include images provided as answers to previous questions. The training dataset may also include images of known counterfeit items.
[0038] A training dataset containing images obtained from videos is used to train a machine learning model to provide the trained model. A convolutional neural network can be used as a machine learning model. Once trained, the machine learning model can identify counterfeit items from images.
[0039] In one example, the system provides a set of questions to the third-party seller during the item listing process. One of these questions includes a request for an image of the item. This request may also include a request for an image of specific item features. Upon receiving an image, the system may optionally first determine, by performing a reverse image search, whether the image has been retrieved from the Internet or another network. Doing so helps ensure that the third-party seller is providing a picture of the actual item being uploaded. If the same image is not found during the reverse image search, the image is fed as input to a trained machine learning model. The trained machine learning model outputs a determination of whether the item is counterfeit based on the item features in the image and the likelihood that these features indicate the item is counterfeit.
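The order of checks in this example can be sketched as a small function that takes the reverse image search and the trained model as injected callables. Both callables, the outcome labels, and the 0.5 threshold are hypothetical. In particular, the text does not say what happens when the image is found online, so the `"request_new_image"` outcome is an assumption.

```python
from typing import Callable


def screen_listing_image(
    image: bytes,
    found_online: Callable[[bytes], bool],
    counterfeit_score: Callable[[bytes], float],
    threshold: float = 0.5,
) -> str:
    """Run the optional reverse image search first, then the trained model."""
    if found_online(image):
        # Assumption: the disclosure does not specify this branch.
        return "request_new_image"
    # The model is consulted only for images not found by the reverse search.
    return "reject" if counterfeit_score(image) > threshold else "accept"


# Stand-in callables for illustration only.
outcome_stock_photo = screen_listing_image(b"img", lambda _: True, lambda _: 0.0)
outcome_suspicious = screen_listing_image(b"img", lambda _: False, lambda _: 0.9)
outcome_clean = screen_listing_image(b"img", lambda _: False, lambda _: 0.1)
```

Passing the search and the model in as parameters keeps the sketch independent of any particular reverse-image-search service or neural network framework.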
[0040] Several example scenarios have been provided, and the techniques suitable for performing these examples are described in more detail with reference to the accompanying drawings. It will be understood that additional systems and methods for detecting counterfeit items can be derived from the following description of the techniques.
[0041] Turning now to Figure 1, a block diagram of an example operating environment 100 in which embodiments of the present disclosure may be employed is shown. Specifically, Figure 1 shows a high-level architecture of operating environment 100 having components according to embodiments of the present disclosure. The components and architecture of Figure 1 are intended as examples, as noted toward the end of the Detailed Description.
[0042] Among other components or engines not shown, the operating environment 100 includes a client device 102. The client device 102 is shown communicating with a server 106 and a data storage 108 via a network 104. The server 106 is shown hosting aspects of the counterfeit item detection system 110.
[0043] Client device 102 can be any type of computing device. One such example is computing device 900, described with reference to Figure 9. However, more broadly, the client device 102 may include a computer-readable medium storing computer-executable instructions that are executed by at least one computer processor.
[0044] Client device 102 can be operated by any person or entity interacting with server 106 to employ aspects of counterfeit item detection system 110. Some example devices suitable for use as client device 102 include personal computers (PCs), laptops, mobile devices, smartphones, tablets, smartwatches, wearable computers, personal digital assistants (PDAs), global positioning systems (GPS) or devices, video players, handheld communication devices, gaming devices or systems, entertainment systems, in-vehicle computer systems, embedded system controllers, remote controls, electrical appliances, consumer electronics, workstations, any combination of these described devices, or any other suitable device.
[0045] Client device 102 may employ computer-executable instructions for an application, which may be hosted partially or wholly on or off client device 102. That is, the instructions may be embodied in one or more applications. Applications typically facilitate information exchange between components of operating environment 100. The application may be a web application running in a web browser. This may be hosted at least partially on the server side of operating environment 100. The application may include specialized applications, such as applications with analytics capabilities. In some cases, the application is integrated into the operating system (e.g., as a service or program). The term "application" is intended to be interpreted broadly.
[0046] As shown in the figure, components or engines of operating environment 100 (including client device 102) can communicate using network 104. Network 104 may include one or more networks (e.g., public networks or virtual private networks, "VPNs"). Network 104 may include, but is not limited to, one or more local area networks (LANs), wide area networks (WANs), or any other communication network or method.
[0047] Server 106 typically supports counterfeit item detection system 110. Server 106 includes one or more processors and one or more computer-readable media. An example suitable for use is provided by computing device 900 of Figure 9. The computer-readable media include computer-executable instructions that can be executed by the one or more processors. These instructions may optionally implement one or more components of the counterfeit item detection system 110, which is described in more detail below with reference to Figure 2. As with other components of Figure 1, although server 106 is shown as a single server, it may include one or more servers, and the various components of server 106 may be locally integrated within the one or more servers or may be distributed in nature.
[0048] Operating environment 100 is shown as having data storage 108. Data storage 108 typically stores information including data, computer instructions (e.g., software program instructions, routines, or services), or models used in embodiments of the described technology. Although depicted as a single component, data storage 108 may be embodied as one or more data storages or may be located in the cloud. An example of data storage 108 includes memory 912 of Figure 9.
[0049] Various components of the operating environment 100 have been identified. Note that any number of components can be used within the scope of this disclosure to achieve the desired functionality. Although the various components of Figure 1 are shown with lines for clarity, in reality the delineation of the various components is not so clear, and metaphorically, the lines might be more accurately described as gray or blurred. Furthermore, although some components of Figure 1 are depicted as single components, these depictions are intended to be illustrative in nature and number and should not be construed as limiting all embodiments of this disclosure. Other arrangements and elements (e.g., machines, interfaces, functions, commands, and function groups) may be used to supplement or replace the arrangements and elements shown, and some elements may be omitted entirely.
[0050] With regard to Figure 2, an example counterfeit item detection system 200 is provided. The counterfeit item detection system 200 is suitable for use as the counterfeit item detection system 110 of Figure 1. Many of the elements described with respect to Figure 2 are functional entities, which can be implemented as discrete or distributed components or in combination with other components, and implemented in any suitable combination and location. The various functions described herein are performed by one or more entities and can be performed by hardware, firmware, or software. For example, the various functions can be performed by a processor executing computer-executable instructions stored in memory.
[0051] As shown in Figure 2, the counterfeit item detection system 200 includes a counterfeit item detection engine 202. The counterfeit item detection engine 202 typically generates and provides questions for detecting counterfeit items, and determines whether an item is likely counterfeit based on the answers to the questions. To this end, the counterfeit item detection engine 202 employs an item data collector 204, a natural language processing engine 206, a question generator 208, a machine learning engine 210, a question ranking unit 212, a question selector 214, and a counterfeit item determiner 216.
[0052] As shown in the figure, the counterfeit item detection engine 202 communicates with the data storage 218. The data storage 218 is of the type described with respect to data storage 108 of Figure 1. Data storage 218 is shown as including item data 220, question set 222, training dataset 224, and machine learning model 226. The data shown within data storage 218 is illustrated as an example. More or fewer data elements, or combinations of data elements, may be provided for use by the counterfeit item detection engine 202. The data elements shown in Figure 2 have been provided to describe one example that can be implemented using the described technology.
[0053] Item data collector 204 is typically configured to collect item-related data. Item data collector 204 collects various types of item-related data, including structured and unstructured data. Structured data includes data organized according to a scheme that allows the data to be easily exported and indexed into item data 220 with minimal processing. Structured data can typically be collected and rearranged to conform to the index of item data within item data 220. Unstructured data is any data other than structured data. Unstructured data is related to items and is typically discussed in the context of the item. However, unstructured data usually requires additional processing to store it in a computer-usable format within item data 220.
[0054] Item data collector 204 can apply web crawlers to identify and obtain structured and unstructured data from the Internet or another network. For structured data, item data collector 204 arranges and stores the collected structured data within item data 220. As will be described, unstructured data can be further processed by other components of the counterfeit item detection engine 202. Item data collector 204 can collect item-related data by receiving structured or unstructured data from any other source. Item data can be received from any entity, including third-party sellers, consumers, online marketplaces, manufacturers, retailers, collectors, item experts, websites, and governments, among many other sources. Both structured and unstructured item data can include online conversations, stored chatbot information, manufacturer specifications, item inspection records, expert opinions, item packaging, general communications, books, articles, presentations, or any other medium conveying information. Item data can be in the form of audio, images, video, text, machine language, latent information, etc. The item data collector 204 collects item data by obtaining or receiving it, and stores the collected item data as item data 220 in the data storage 218.
[0055] Natural Language Processing (NLP) engine 206 is typically configured to process item data 220 to identify or extract information. NLP engine 206 can receive collected item data from item data collector 204, process the item data as needed, and store the processed item data as item data 220 in data storage 218. NLP engine 206 can be applied to process both structured and unstructured data.
[0056] To process item data 220, the natural language processing engine 206 is typically applied to the text data within item data 220. For audio and video data, speech-to-text software can be used to convert the audio and video data into text data for further processing by the natural language processing engine 206. An example of speech-to-text software suitable for use with current technology is Microsoft's Azure Speech-to-Text. Other speech-to-text software may also be suitable.
[0057] Natural Language Processing Engine 206 employs a natural language processing model to process item data 220. One example natural language processing model that can be employed by Natural Language Processing Engine 206 is BERT. In some cases, BERT can be pre-trained using any online data source (e.g., data sources provided by Wikipedia and BooksCorpus). A pre-trained BERT model is also available, and the BERT model can be fine-tuned using a corpus of textual information describing the items. In some cases, the textual information within the corpus used for fine-tuning can be tagged to indicate items and item features, and can be tagged to indicate words or phrases associated with a specific linguistic context (e.g., linguistic context related to counterfeit items). It will be understood that other natural language processing models can be used, including one or more models for identifying items, item features, linguistic contexts, and their associations, and such models are intended to be included within the scope of the natural language processing models described herein.
[0058] Once trained, the natural language processing engine 206 can process the item data 220 to identify text elements and context from the text data of the item data 220. The item data 220 is fed as input to the trained natural language processing model of the natural language processing engine 206. The output provided by the trained natural language processing model includes indications of text elements within the item data 220. Text elements may include text data describing items and item characteristics, and may include associations between item characteristics and items. For example, in a document containing a description of branded shoes, the text in the document representing branded shoes is identified and can be associated with or indexed with metadata to indicate that the text represents branded shoes. Similarly, text representing item characteristics (e.g., model number, size, color, manufacturing date and number, logo position, logo size, item label position, text printed on the item label, material composition, weight, etc.) is also identified and associated with metadata or indexed to indicate that the text represents item characteristics. Furthermore, item characteristics can be associated with items. That is, item characteristics can be identified as associated with items based on the context of the text data. Text representing item characteristics can be associated with metadata or indexed to indicate the relationship to the item (e.g., the identified item characteristics are the item characteristics of that item).
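The indexing of text elements described above can be pictured with a small data structure. The following is a minimal sketch only: the `TextElement` record, the `annotate` helper, and the "Acme Runner" brand are hypothetical names introduced for illustration, and a plain substring search stands in for the output of the trained natural language processing model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextElement:
    text: str                       # the matched span of text
    start: int                      # character offset where the span begins
    end: int                        # character offset just past the span
    kind: str                       # "item" or "item_feature"
    item_ref: Optional[int] = None  # index of the associated item element, if any


def annotate(doc: str, phrase: str, kind: str,
             item_ref: Optional[int] = None) -> TextElement:
    """Locate `phrase` in `doc` and wrap it in a TextElement.

    Assumption: substring search replaces the trained model's predictions.
    """
    start = doc.find(phrase)
    if start < 0:
        raise ValueError(f"{phrase!r} not found in document")
    return TextElement(phrase, start, start + len(phrase), kind, item_ref)


doc = "The Acme Runner shoe has a 2 cm logo on the inner tongue."

# Element 0 is the item; element 1 is a feature linked back to element 0.
elements = [annotate(doc, "Acme Runner shoe", "item")]
elements.append(annotate(doc, "logo", "item_feature", item_ref=0))
```

Storing character offsets alongside the label keeps the metadata tied to the exact text it describes, and `item_ref` records the association between an item feature and its item.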
[0059] As described above, a trained natural language processing model of the natural language processing engine 206 can be used to identify the linguistic context within the text. The linguistic context of the text identified by the trained natural language processing model can include linguistic context related to counterfeit items. The linguistic context of text data representing items and item features can be related to detecting counterfeit items. The linguistic context of text data can be indicated using metadata. The linguistic context of text data can also be indicated in an index of indexed items and item features.
[0060] Question generator 208 is typically configured to generate questions. Question generator 208 can generate questions based on items and item features identified by natural language processing engine 206. One or more questions can be generated for each identified item. The questions generated for items are shown as a question set 222 stored in data storage 218. Question set 222 can include one or more question sets, each associated with an item.
[0061] Question generator 208 uses a set of language rules to generate questions. This set of language rules includes one or more language rules for each language associated with the text data of item data 220. The language rules can be provided by a trained machine learning model that provides questions about the items using item features. In a broader sense, a neural network can be trained using general text and questions generated from that text. The neural network can be used as the language rules to output questions based on the input of item data 220. Several trained question generation algorithms suitable for use are known in the art. One example approach that can be used with the present technology, together with a description of the history of question generation procedures, is described in M. Heilman, 2011, Automatic Factual Question Generation from Text, PhD dissertation, Carnegie Mellon University, the entire contents of which are incorporated herein by reference. Other methods can be employed within the scope of the techniques described herein.
[0062] Generally, the term "question" is not intended to describe only a question in the grammatical sense. A grammatically correct question is only one aspect included within the term "question." The use of "question" is intended to be broader and includes any request for information. A question can be provided as part of an item listing process initiated in response to an item listing request from a third-party seller. Questions included in question set 222 and generated by question generator 208 can include a wide range of information requests and formats, including requests for descriptive information about items or item characteristics. That is, if item characteristics are further described within item data 220 and the descriptors of the item characteristics are recognized by natural language processing engine 206, a question can be generated to request the descriptors of the item characteristics. Another type of question generated by question generator 208 and stored in question set 222 includes requests for item listing images from third-party sellers, including images of items or item characteristics. Therefore, if items or item characteristics are identified in item data 220, question generator 208 can generate a question to request images of items or item characteristics.
[0063] Machine learning engine 210 is typically configured to train machine learning models used by aspects of the counterfeit item detection engine 202. As previously mentioned, the counterfeit item detection engine 202 can train and employ a natural language processing model (e.g., BERT). Machine learning engine 210 can pre-train or fine-tune the natural language processing model to output a trained machine learning model. Various pre-trained natural language processing models can be used. Natural language processing models are typically trained or pre-trained on large text corpora (e.g., text corpora provided by Wikipedia). Machine learning engine 210 can fine-tune the pre-trained model using more specific dataset types. Specific datasets can be included within data storage 218 as part of training dataset 224. This can include various texts that have been labeled to indicate text representing items and item features. Labeled associations can be included to indicate associations between items and item features within the text. Additional labels can be added to indicate words describing aspects of item features (e.g., location, size, etc.). For example, text representing branded shoes can be labeled as an item, while an item logo can be labeled as an item feature and marked to show the association between the item feature and the item. Descriptive aspects of the item feature can include location (e.g., the inner tongue of the left shoe) and the size of the logo located at that location, and can be labeled to indicate further description of the item feature. Furthermore, known documents describing the detection of counterfeit items can be used to train a natural language processing model to recognize context relevant to the detection of counterfeit items. Some of these documents may include expert reports. This labeled data can be included in training dataset 224 for training a machine learning model employed by the counterfeit item detection engine 202.
The trained machine learning model is stored as machine learning model 226 in data storage 218 for use by aspects of the counterfeit item detection engine 202.
[0064] The machine learning engine 210 can also be used to train a machine learning model for detecting counterfeit items from images. A convolutional neural network is one example of a machine learning model that can be used to detect counterfeit items within an image. The machine learning engine 210 can use a training dataset 224 to train the machine learning model. Here, the training dataset 224 includes training images of known counterfeit items or items that may be counterfeit. The training images of the items can include the item's features. The training images can be obtained from videos associated with the items, identified from online images including descriptions of items as counterfeit, provided from images taken during the inspection of known counterfeit items, received from consumers, received as item listing images from third-party sellers, retrieved from government databases that classify known counterfeit items, etc.
[0065] On one hand, training images are determined from images or videos identified by the item data collector 204. Images obtained by the item data collector 204 can be processed to determine whether an image includes text or metadata indicating whether the image contains a counterfeit item. This can be done using the natural language processing engine 206. When an image is determined to be associated with a context for identifying a counterfeit item, the image can be provided as a training image to the training dataset 224. Training images can include images obtained from videos. The video identified by the item data collector 204 can be processed using the natural language processing engine 206 (including speech-to-text functionality and a trained natural language processing model). Text data determined from the video is associated with a specific time within the video. By analyzing the text data to identify items, item features, or context related to identifying counterfeit items, the time associated with the text data representing the item, item features, or context can be identified. Video images in the video at that corresponding time can be obtained by taking snapshots of video frames. These images are labeled with items or item features and are marked as related to counterfeit item detection. This labeled image is then stored as part of the training dataset 224. In some cases, the labeled image may be provided to personnel for verification of the image and the label before it is included in the training dataset 224.
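The step of locating video frames from time-stamped text can be sketched as follows. This is an illustrative sketch under stated assumptions: the `Segment` record and `snapshot_times` function are hypothetical, the transcript is assumed to arrive as time-stamped segments from speech-to-text software, and simple keyword matching stands in for the trained natural language processing model. Extracting the actual frames at the returned times is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start_s: float  # time in the video (seconds) where this text is spoken
    text: str       # transcribed text for the segment


def snapshot_times(segments: List[Segment],
                   terms: List[str]) -> List[Tuple[float, List[str]]]:
    """Return (time, matched_terms) for each segment mentioning any term.

    Assumption: `terms` lists words for items, item features, or
    counterfeit-related context; a real system would use model output.
    """
    hits = []
    for seg in segments:
        lowered = seg.text.lower()
        matched = [t for t in terms if t.lower() in lowered]
        if matched:
            hits.append((seg.start_s, matched))
    return hits


segments = [
    Segment(0.0, "Welcome back to the channel."),
    Segment(12.5, "On a fake pair the logo sits too low on the tongue."),
    Segment(30.0, "Thanks for watching."),
]

# Each hit gives a time at which a frame snapshot could be taken,
# plus the terms that can be used to label the resulting image.
hits = snapshot_times(segments, ["logo", "fake", "stitching"])
```

The matched terms double as candidate labels for the snapshot, which is why they are returned together with the timestamp.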
[0066] The question ranking unit 212 is typically configured to rank questions. It ranks a set of questions to provide a ranked set. The question ranking unit 212 can rank one or more sets of questions within the question set 222. As part of ranking the question sets, the question ranking unit 212 can both rank and re-rank the question sets. The question ranking unit 212 can rank questions in response to an indication that an item is counterfeit. This can be done after modifying the counterfeit indicator weight. As will be discussed, the question ranking unit 212 can rank questions in response to the rejection of counterfeit items.
[0067] One method for ranking questions involves ranking them based on counterfeit indicator weights. In the context of machine learning, these weights can be referred to as probabilities, and a weight representing the probability that an item is a counterfeit is associated with each question-answer pair. Each question can have one or more counterfeit indicator weights associated with it. Some questions will have multiple answers. Therefore, a question can have multiple counterfeit indicator weights associated with it, where each counterfeit indicator weight is associated with one of the answers. Typically, the counterfeit indicator weight represents the strength of the correlation between the answer to the question and whether the item is a counterfeit. The counterfeit indicator weights can be indexed in association with the questions stored within the question set 222.
[0068] As will be described further, the question ranking unit 212 adjusts the counterfeit indicator weight based on feedback regarding whether an item is genuine or counterfeit. While various algorithms can be derived to assign values to and modify the counterfeit indicator weight, one example approach defines the counterfeit indicator weight on a range from -1.00 to 1.00. Here, a negative value indicates an inverse correlation between the answer to the question and whether the item is counterfeit. Therefore, an answer with a weight of -1.00 would indicate that the item is not counterfeit. As the value increases from -1.00 to 0, the counterfeit indicator weight still indicates an inverse correlation, and the item is unlikely to be counterfeit; however, values closer to 0 represent relatively weaker correlations. For example, a value of -0.75 is a relatively strong inverse indicator compared to a value of -0.25. On this scale, 0 indicates no correlation between the answer and whether the item is counterfeit. Conversely, a value of 1.00 indicates that the item is counterfeit. Therefore, a positive value on this scale indicates a direct correlation between the answer and whether the item is counterfeit. As the value decreases from 1.00 to 0, it still indicates a direct correlation, and the item is likely counterfeit; however, the strength of the correlation decreases as the value decreases. For example, a value of 0.75 is a relatively strong direct indicator that the item is counterfeit, compared to a value of 0.25. Again, it should be understood that this is merely one way to define counterfeit indicator weights using an example scale. Other scales and other methods for defining counterfeit indicator weights can be used and are intended to be included within the scope of this disclosure.
For example, some configurations may employ neural networks to identify counterfeit items, and the update rules used in the neural network (backpropagation algorithm) will include updating (reducing) weights when the model makes an incorrect prediction.
[0069] The question ranking unit 212 adjusts the counterfeit indication weight based on feedback, including whether an item is genuine or counterfeit. This feedback can be received from any source, including consumers, online marketplaces, retailers, experts, government officials, manufacturers, and third-party sellers. When feedback is received about an item, previous answers to questions related to that item can be identified, and the counterfeit indication weight associated with those answers can be adjusted based on that feedback.
[0070] In the described example method, when an item is determined to be counterfeit, question ranking unit 212 increases the counterfeit indication weight associated with the answer. If the feedback indicates that the item is genuine, question ranking unit 212 decreases the counterfeit indication weight associated with the answer. The amount of increase or decrease can be based on the total feedback received for that item, including one or more pieces of feedback indicating whether the item is counterfeit or genuine.
[0071] One mechanism suitable for use with the described example method for determining the increase or decrease of a counterfeit indication weight involves assigning -1.00 to the answer when the item is identified as genuine, and assigning 1.00 to the answer when the item is identified as counterfeit. The average of the values assigned to the answer across all feedback received for that item provides the counterfeit indication weight.
[0072] As an example, during an item listing process, a third-party seller provides an answer to a question. If the item is determined to be counterfeit, the answer is assigned a value of 1.00. If another third-party seller provides the same answer to the same question during another item listing process, and that item is also determined to be counterfeit, the answer is assigned a second value of 1.00. If yet another third-party seller provides the same answer to the same question, but the item is determined to be genuine, the answer is assigned a third value of -1.00. Averaging these three values, (1.00 + 1.00 - 1.00) / 3, yields approximately 0.33, which is the counterfeit indicator weight associated with the answer to the question according to this example method.
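The averaging mechanism in this example can be written out as a short sketch. The function name `counterfeit_indicator_weight` and the choice to return 0.0 when no feedback exists are assumptions made for illustration; the description itself only specifies the -1.00 / 1.00 assignment and the average.

```python
from statistics import mean
from typing import Iterable


def counterfeit_indicator_weight(feedback: Iterable[bool]) -> float:
    """Average of +1.00 (item found counterfeit) and -1.00 (item found genuine).

    `feedback` holds one boolean per item for which this answer was given:
    True if that item was determined to be counterfeit, False if genuine.
    """
    values = [1.0 if is_counterfeit else -1.0 for is_counterfeit in feedback]
    if not values:
        return 0.0  # assumption: no feedback is treated as no correlation
    return mean(values)


# Two items found counterfeit and one found genuine, as in the example above.
weight = counterfeit_indicator_weight([True, True, False])
print(round(weight, 2))  # prints 0.33
```

Because the weight is a plain average, each new piece of feedback moves it by a smaller amount as more feedback accumulates for the same answer.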
[0073] The question ranking unit 212 can rank a set of questions based on counterfeit indicator weights. In the described example method, counterfeit indicator weights with larger values rank higher because they are more relevant to determining whether an item is counterfeit. Therefore, when a counterfeit indicator weight is large, questions associated with answers that have that weight rank higher. The absolute value of the counterfeit indicator weight can be determined before ranking. This is because values close to -1.00 also strongly indicate whether an item is counterfeit, but in the opposite direction. In this way, questions with answers that are closely related to indicating whether an item is counterfeit rank higher.
[0074] Question selector 214 is typically configured to select questions from a set of questions for an item. Question selector 214 may also select questions from a ranked set of questions ranked by question ranking unit 212. The questions selected by question selector 214 are provided as a selection of questions.
[0075] Typically, any number of questions can be selected by the question selector 214 and provided to the third-party seller as part of the item listing process and in response to an item listing request. The number provided can be a pre-configured number. Again, while any number can be selected, one example of a pre-configured number is 10 questions, chosen as a selection from the question set for the item.
[0076] Question selector 214 can be configured to select only the top-ranked questions from the question set. The question selector can also be configured to select new or lower-ranked questions to include in the selection. In this way, new questions can be introduced, allowing their counterfeit indicator weights to be established and adjusted by question ranking unit 212. Other questions with lower counterfeit indicator weights than the top-ranked questions can be randomly selected and included in the selection. This allows for continuous adjustment of the counterfeit indicator weights for all questions within the question set for an item. This also helps eliminate any bias towards top-ranked questions. On the other hand, questions within the question set that are not strongly correlated with determining whether an item is counterfeit (e.g., questions falling below a low relevance threshold) can be removed from the question set by question ranking unit 212. As a result, processing the question set does not require continuously increasing computer processing power as new questions are added to the system.
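One way to picture this mix of top-ranked and randomly chosen questions, together with pruning, is the sketch below. All names and parameter values (`select_questions`, `prune_questions`, `n_explore`, `min_strength`, `min_feedback`) are hypothetical. In particular, exempting questions with little feedback from pruning is an assumption added here so that new questions are not removed before their weights are established.

```python
import random
from typing import Dict, List, Optional, Tuple


def select_questions(ranked: List[str], n_total: int = 10, n_explore: int = 2,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Take the top-ranked questions, then fill the rest at random.

    `ranked` is ordered from strongest to weakest indicator.
    """
    rng = rng or random.Random()
    n_top = max(n_total - n_explore, 0)
    top = ranked[:n_top]
    rest = ranked[n_top:]
    # Randomly drawn lower-ranked or new questions get a chance to be asked.
    explore = rng.sample(rest, min(n_explore, len(rest)))
    return top + explore


def prune_questions(stats: Dict[str, Tuple[float, int]],
                    min_strength: float = 0.05,
                    min_feedback: int = 20) -> Dict[str, Tuple[float, int]]:
    """Drop questions whose weight stays near 0 after enough feedback.

    `stats` maps question -> (weight, number of feedback items received).
    """
    return {q: (w, n) for q, (w, n) in stats.items()
            if n < min_feedback or abs(w) >= min_strength}


ranked = [f"Question {i}" for i in range(1, 21)]
selection = select_questions(ranked, n_total=10, n_explore=2,
                             rng=random.Random(0))
```

Passing a seeded `random.Random` makes the random part of the selection repeatable, which is convenient when testing.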
[0077] Counterfeit item determiner 216 is typically configured to determine whether an item is counterfeit. One method involves counterfeit item determiner 216 receiving an item listing request 228. Item listing request 228 can be received from a third-party seller attempting to offer the item using an online marketplace, and can also be provided from a client device. As part of item listing process 232, counterfeit item determiner 216 provides a question 230 selected by question selector 214. Counterfeit item determiner 216 then receives an answer 234 to question 230 from the third-party seller. Answer 234 can be provided in any form, including item listing images, videos, text data, confirmation of information (e.g., radio buttons, checkboxes, etc.). Question 230 can also be provided in any form, including images, videos, text data, including open-ended and closed-ended information requests, etc.
[0078] In some cases, using a chatbot to ask questions can be beneficial. This feature allows asking and answering a question before moving on to another. In such situations, subsequent questions can be asked based on the answers to previous questions. Questions can be asked sequentially and continuously until a threshold confidence level (or value) is reached, as will be discussed later, thus allowing a determination to be made as to whether the item is counterfeit.
[0079] Upon receiving answer 234, the counterfeit item determiner 216 determines whether the item is counterfeit, for example, whether the item is likely to be a counterfeit item at a certain confidence level. One way to make this determination is based on the determination of a probability value. The probability value is determined using a counterfeit indication weight associated with answer 234. It will be understood that multiple answers can exist within answer 234, and therefore multiple counterfeit indication weights can exist for determining the probability value. Other methods can be employed to determine whether an item is likely to be counterfeit based on multiple counterfeit indication weights associated with answer 234. This is merely one example method suitable for use with the present invention. Other methods are intended to be included within the scope of this disclosure because they involve determining whether an item is counterfeit based on answer 234.
[0080] One example method for determining the probability value is to determine the total weighted value of answer 234. This can be done by averaging the counterfeit indicator weights of answer 234. Using this method, the average is the probability value. Another method employs a higher-dimensional analysis function. Here, the counterfeit indicator weights can be applied to a multivariate probability function to determine the joint probability of the counterfeit indicator weights. In this method, the joint probability provides a probability value for the counterfeit item determiner 216 to use to determine whether the item is counterfeit. Another approach is to view the weights as the probability that an item is counterfeit given the item and the question and answer. In that case, the weights lie between 0 and 1, and a neutral weight is 0.5. Odds ratios can also be used. Furthermore, machine learning models (e.g., neural networks) can be used to predict the overall counterfeit probability, making the aggregation function potentially non-linear.
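Two of these aggregation options can be sketched briefly. The first function is the plain average described above. The second is only one possible reading of the odds-based option: it multiplies per-answer odds under an assumption of independent answers and a neutral prior of 0.5, which the description does not specify. Both function names are hypothetical.

```python
from math import prod
from typing import Sequence


def average_probability_value(weights: Sequence[float]) -> float:
    """Probability value as the average of the answers' indicator weights."""
    if not weights:
        raise ValueError("at least one weight is required")
    return sum(weights) / len(weights)


def combined_odds_probability(probs: Sequence[float]) -> float:
    """Combine per-answer probabilities (each strictly between 0 and 1).

    Assumption: answers are treated as independent and the prior is 0.5,
    so the combined odds are the product of the individual odds.
    """
    if any(not 0.0 < p < 1.0 for p in probs):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    odds = prod(p / (1.0 - p) for p in probs)
    return odds / (1.0 + odds)


avg = average_probability_value([1.0, 0.5, 0.0])  # 0.5
neutral = combined_odds_probability([0.5, 0.5])   # neutral answers stay neutral
strong = combined_odds_probability([0.9, 0.9])    # two strong answers reinforce
```

The odds-based combination lets several moderately strong answers reinforce one another, whereas averaging can never produce a value larger than the strongest single weight.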
[0081] To determine whether an item is likely counterfeit, the counterfeit item determiner 216 can compare the determined probability value with a counterfeit indication threshold. Using a counterfeit indication threshold is one technical method for implementing the underlying technology. However, the actual value of the counterfeit indication threshold can be any value, and it can be predetermined based on a decision balancing the percentage of counterfeit items correctly identified as counterfeit and any false positive error rate that may occur due to misidentifying genuine items as counterfeit. For example, using the method described in this disclosure, an example counterfeit indication threshold can be set to 0.95. In this way, the counterfeit item determiner 216 will determine that any item with a probability value between 0.95 and 1.00 is a counterfeit item.
[0082] The specific value can be determined by identifying known counterfeit items and answering, for each such item, the questions provided by the counterfeit item determiner 216. This can be done, for example, using machine learning with precision-recall curve analysis. The counterfeit item determiner 216 provides a probability value that the item is counterfeit. This process can be performed using a set of known items (including both counterfeit and genuine items). The counterfeit indication threshold can then be set to exclude a specific percentage of counterfeit items, weighed against the false positive percentage, that is, the percentage of items whose probability value exceeds the set counterfeit indication threshold but that are genuine.
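A simple way to derive such a threshold from a set of known items is sketched below. The function `choose_threshold`, the `max_false_positive_rate` parameter, and the sample scores are all hypothetical. The sketch treats an item as flagged when its probability value is at or above the threshold, and picks the lowest threshold whose false positive rate stays within the allowed limit.

```python
from typing import List, Optional, Tuple


def choose_threshold(scored_items: List[Tuple[float, bool]],
                     max_false_positive_rate: float = 0.01
                     ) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, recall, false_positive_rate), or None if none fits.

    `scored_items` holds (probability_value, is_counterfeit) for known items.
    """
    genuine = [p for p, is_fake in scored_items if not is_fake]
    fakes = [p for p, is_fake in scored_items if is_fake]
    # Scan candidate thresholds in ascending order: the first one that meets
    # the false positive limit also flags the most counterfeit items.
    for t in sorted({p for p, _ in scored_items}):
        fpr = (sum(p >= t for p in genuine) / len(genuine)) if genuine else 0.0
        if fpr <= max_false_positive_rate:
            recall = (sum(p >= t for p in fakes) / len(fakes)) if fakes else 0.0
            return t, recall, fpr
    return None


known = [(0.99, True), (0.97, True), (0.96, False), (0.90, True),
         (0.40, False), (0.10, False), (0.05, False)]

strict = choose_threshold(known, max_false_positive_rate=0.0)
lenient = choose_threshold(known, max_false_positive_rate=0.25)
```

With the sample data, forbidding any false positives forces the threshold up to 0.97 and misses one counterfeit item, while tolerating one genuine item in four being flagged lowers it to 0.90 and catches all three.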
[0083] If the counterfeit item determiner 216 identifies the item as counterfeit, it can reject the item listing request. This rejects a third-party seller's request to place the item on the online marketplace. The method also allows for the detection and rejection of counterfeit items before they are offered to consumers or further into the downstream market.
[0084] In response to determining that the item is counterfeit, the counterfeit item determiner 216 can provide an indication to the question ranking unit 212 that a counterfeit item has been detected. As described above, the question ranking unit 212 can rank or re-rank the set of questions associated with the item based on the indication that the item is counterfeit.
[0085] It will be recognized that the counterfeit item detection engine 202 uses the counterfeit item determiner 216 during multiple item listing processes and for various items listed on the online marketplace. Therefore, feedback obtained in the first item listing process for a first item listing request can be used in the second item listing process for a second item listing request, and both can be used in the third item listing process for a third item listing request, and so on. In this way, previous answers to a previous set of questions can be used to determine the ranking of the question set, and this ranked question set can be used for the current question set.
[0086] In some configurations, question selection can be done implicitly through weights. For example, questions with weights close to 0 have little effect on the final counterfeit determination. Other configurations can rank the questions for selection. Turning now to Figure 3, an example diagram of ranking and selecting questions using the counterfeit item detection system 200 of Figure 2 is provided. Reference is now made to both Figure 2 and Figure 3.
[0087] Specifically, the example provided by Figure 3 depicts index 300A, which includes a first column with question set 302A and a second column with counterfeit item indication weights 304A. Question set 302A can be associated with an item. Question set 302A is shown as having multiple questions (Question 1 to Question N), indicating that question set 302A can include any number of questions. Each question in question set 302A has an associated counterfeit item indication weight within counterfeit item indication weights 304A, shown as X1 to XN, indicating that any number of counterfeit item indication weights can be included as associated with question set 302A. Questions within question set 302A can be ranked based on their associated counterfeit item indication weights within counterfeit item indication weights 304A.
[0088] Furthermore, each question can have one or more counterfeit item indication weights. Therefore, since each question in question set 302A can have more than one answer, and each answer has an associated counterfeit item indication weight, X1 throughout index 300A is intended to represent one or more counterfeit item indication weights, etc., associated with Question 1. Index 300A can be stored in data storage 218 for use by aspects of the counterfeit item detection engine 202. In one aspect, ranking can be based on the strongest counterfeit item indication weight of the answer to a question related to determining whether an item is counterfeit. For example, if a question has two answers, the answer with the strongest relevance counterfeit item indication weight can be used to rank the questions in the question set (e.g., question set 302A). This ranking can also be based on the maximum absolute value of the counterfeit item indication weight. In another aspect, the counterfeit item indication weight is ranked based on the strongest direct relevance used to indicate a counterfeit item.
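An index of this kind, with one weight per answer, and the two ranking aspects just described can be sketched as follows. The dictionary layout, the answer labels, the weight values, and the `rank_questions` function with its `mode` parameter are illustrative assumptions, not the stored format of index 300A.

```python
from typing import Dict, List

# Hypothetical index: question -> {answer: counterfeit item indication weight}
index: Dict[str, Dict[str, float]] = {
    "Question 1": {"yes": 0.10, "no": -0.20},
    "Question 2": {"yes": 0.80, "no": -0.05},
    "Question 3": {"yes": 0.30, "no": -0.90},
}


def rank_questions(idx: Dict[str, Dict[str, float]],
                   mode: str = "absolute") -> List[str]:
    """Rank questions by the strongest weight among their answers.

    mode="absolute": strongest correlation in either direction.
    mode="direct":   strongest direct (positive) correlation only.
    """
    def strength(answer_weights: Dict[str, float]) -> float:
        if mode == "absolute":
            return max((abs(w) for w in answer_weights.values()), default=0.0)
        if mode == "direct":
            return max(answer_weights.values(), default=0.0)
        raise ValueError(f"unknown mode: {mode!r}")

    return sorted(idx, key=lambda q: strength(idx[q]), reverse=True)


by_absolute = rank_questions(index, mode="absolute")
by_direct = rank_questions(index, mode="direct")
```

In this sample, Question 3 ranks first by absolute value because its "no" answer is a strong inverse indicator, while Question 2 ranks first when only direct correlation is considered.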
[0089] For example, question selector 214 can select one or more questions from question set 302A. As shown in the figure, question selector 214 has selected some of the top-ranked questions (Question 1 to Question 10) as a first selection 306A. The counterfeit item determiner 216 can provide the first selection 306A during the item listing process. After feedback is received regarding whether the item is counterfeit, the question ranking unit 212 modifies the counterfeit item indication weights 304A to provide modified counterfeit item indication weights 304B, and ranks question set 302A to provide the ranking shown in ranked question set 302B of index 300B. The ranking performed by the question ranking unit 212 is indicated by arrow 308. Index 300B is the same index as index 300A. However, index 300B shows the ranked question set 302B associated with the modified counterfeit item indication weights 304B after the question ranking unit 212 has been applied in response to feedback.
[0090] As shown in the figure, the process can continue using question selector 214 to select a second selection 306B from the ranked question set 302B based on the counterfeit item indication weights 304B. The counterfeit item determiner 216 can be used to provide the second selection 306B to a third-party seller during a second item listing process in response to a second item listing request. As shown in the figure, the second selection 306B includes Questions 1 to 7, Question 13, Question 17, and Question 23. As shown in the figure, and based on the ranking, the second selection 306B includes some questions not included in the first selection 306A.
[0091] The question set can be provided in any way. In one approach, a chatbot is used to ask questions sequentially based on ranking within a predetermined number of subsequent questions, until a threshold confidence level is reached, until a predetermined number of questions have been asked, or until it is determined that the probability value will not statistically exceed the counterfeit indication threshold.
[0092] It will be understood that indices 300A and 300B are one example of how questions and counterfeit item indication weights can be indexed and stored in data storage 218. Other methods can be used to index information in a manner that allows it to be recalled by the counterfeit item detection engine 202, and these other methods are intended to be included within the scope of this invention.
[0093] Referring now to Figure 4, an example diagram 400 is provided, which illustrates the process performed by the counterfeit item detection system 200 to identify training data for a machine learning model that detects counterfeit items using images.
[0094] Referring to Figure 2 and Figure 4, a video 402 of the item is received. The video 402 can be received from any entity, including consumers, third-party sellers, retailers, manufacturers, government agencies, etc. The video 402 can be received from the internet or another network. In one aspect, the video 402 is identified and collected using a web crawler. The video 402 can be collected using an item data collector 204.
[0095] The counterfeit item detection engine 202 can use the natural language processing engine 206 to determine whether the collected video is related to the item. The natural language processing engine 206 can analyze text associated with video 402, such as text on webpage 404 (from which video 402 is retrieved), or other text associated with video 402. Similarly, the natural language processing engine 206 can analyze metadata accompanying video 402 to determine whether video 402 is related to the item. Furthermore, the natural language processing engine 206 can determine whether video 402 is related to the item by performing speech-to-text conversion and then identifying text elements representing the item from text data 406.
[0096] Once a connection to the item is established, the natural language processing engine 206 uses speech-to-text software to convert the audio within video 402 into text data 406, as shown by arrow 408. Natural language processing can then be applied to the text data 406 as previously described to identify text elements representing the item, item features, and/or linguistic context, as shown by arrow 410.
[0097] When the identified linguistic context relates to detecting counterfeit items, an image 414 is obtained from video 402 at the corresponding time. That is, the audio of video 402 has a time corresponding to the visual aspect of video 402. This audio is converted into text data 406 by speech-to-text software; therefore, the text elements of text data 406 have a time corresponding to both the audio and the visual aspect of video 402. In Figure 4, this time is shown as time 412. The context related to detecting counterfeit items is determined from the text elements; therefore, the time associated with the context, item, and item features within the text data 406, as well as the corresponding time in video 402, can be identified. As shown in Figure 4, image 414 is obtained from video 402 at time 412, as indicated by arrow 416.
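The alignment between transcript time and video frames can be sketched as below. Simple keyword matching stands in for the natural language processing model, and the cue list, the `(start_seconds, text)` segment format, and the function name are assumptions made for this example only.

```python
# Hypothetical cue words standing in for a model that detects counterfeit-related context.
COUNTERFEIT_CUES = ("fake", "counterfeit", "replica", "authentic")


def find_frame_indices(segments, item_name, fps):
    """Map counterfeit-related transcript segments to video frame indices.

    segments: list of (start_seconds, text) pairs from a speech-to-text step.
    Returns the frame index at the start of each segment that mentions the
    item together with a counterfeit-related cue.
    """
    indices = []
    for start, text in segments:
        lowered = text.lower()
        mentions_item = item_name.lower() in lowered
        has_cue = any(cue in lowered for cue in COUNTERFEIT_CUES)
        if mentions_item and has_cue:
            indices.append(int(round(start * fps)))
    return indices


segments = [
    (0.0, "Today we unbox a watch"),
    (12.5, "A fake watch usually has a blurry logo"),
    (30.0, "Thanks for watching"),
]
frames = find_frame_indices(segments, "watch", fps=30)
```

The returned frame index plays the role of time 412: it identifies where in the video an image such as image 414 would be captured.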
[0098] Image 414 may be labeled (e.g., tagged or otherwise associated) with a language context label 418 (indicating the identified language context), an item label 420 (indicating the identified item), or an item feature label 422 (indicating the identified item feature). Image 414 and any labels are provided as input 424 for training a machine learning model for machine learning engine 210. Input 424 may be stored in data storage 218 within training dataset 224 for later use by machine learning engine 210 to train the machine learning model. One machine learning model suitable for training to detect counterfeit items is a convolutional neural network. Machine learning engine 210 outputs a trained machine learning model that can be applied to subsequently received images (e.g., an item list image provided in response to a question) to detect counterfeit items from the images.
[0099] With reference to Figures 5 to 8, block diagrams illustrating methods for detecting counterfeit items are provided. These methods can be executed using the counterfeit item detection engine 202. In embodiments, computer-executable instructions are stored on one or more computer storage media that, when executed by one or more processors, cause the one or more processors to perform these methods. The methods may be part of a computer-implemented method implemented by a system including a computer storage medium and at least one processor. It will be appreciated that the methods described in Figures 5 to 8 are example methods, and other methods can and will be derived from the described techniques.
[0100] Figure 5 shows a block diagram of an example method 500 for detecting counterfeit items. At block 502, a first set of questions selected from a question set is provided. The first set of questions can be provided in response to a first item list request and presented during an item listing process initiated in response to that request. The counterfeit item determiner 216 of Figure 2 can be used to provide the first set of questions as part of the item listing process. The first set of questions can be provided to the third-party seller at a client device. At block 504, answers to the first set of questions are received. These answers can be received from the client device of the third-party seller.
[0101] The question set includes generated questions. To generate a question, a natural language processing (NLP) model can be used to identify item features from text data, along with the linguistic context associated with the identified item features. The NLP engine 206 can employ the NLP model. When the linguistic context is relevant to counterfeit item detection, a question is generated using the identified item features and included in the question set. This question can be generated by the question generator 208 employing linguistic rules. Another question can be generated by determining text data from a video that includes the item. This text data can be determined using the NLP engine 206. The NLP model of the NLP engine 206 is then used to identify item features and a linguistic context relevant to counterfeit item detection. This question is generated to request an item list image that includes the identified item features. These questions are generated for inclusion in the question set.
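A rule-based question generator of the kind described above might look like the following sketch. The two sentence templates, the dictionary keys, and the function name are hypothetical; the identification of features and context is assumed to have been done already by an upstream model.

```python
def generate_questions(findings):
    """Turn identified item features into questions using fixed sentence templates.

    findings: list of dicts with keys 'item', 'feature', and 'counterfeit_context'
    (True when the linguistic context was judged relevant to counterfeit detection).
    """
    questions = []
    for finding in findings:
        if not finding["counterfeit_context"]:
            continue  # only counterfeit-related contexts produce questions
        item, feature = finding["item"], finding["feature"]
        questions.append(f"Does the {item} have a {feature}?")
        questions.append(f"Please provide an image of the {item} showing the {feature}.")
    return questions


findings = [
    {"item": "handbag", "feature": "stitched logo", "counterfeit_context": True},
    {"item": "handbag", "feature": "leather strap", "counterfeit_context": False},
]
question_set = generate_questions(findings)
```

The second template corresponds to the image-request question: it asks the seller for an item list image that shows the identified feature.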
[0102] At block 506, an indication is received that the item is counterfeit. As mentioned earlier, this indication can be received from any entity, including third-party sellers, consumers, etc. At block 508, the question set is ranked. This can be performed using the question ranker 212 of Figure 2. The question set can be ranked based on the correlation between the answers to the first set of questions and whether the item is a counterfeit. In some cases, this ranking is based on a counterfeit indicator weight, which indicates the strength of the correlation between the answers to the first set of questions and the item being a counterfeit. The method may include modifying the counterfeit indicator weight associated with the first set of questions based on the indication that the item is a counterfeit. Ranking the question set provides a ranked question set. It will be understood that the question set may have a prior ranking, in which case ranking the question set provides a ranked question set in the form of a re-ranked question set.
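The weight modification and re-ranking at blocks 506 and 508 can be illustrated with a simple incremental update. The update rule, the `rate` parameter, and the weight range of 0 to 1 are assumptions chosen for the example; the disclosure does not prescribe a particular formula.

```python
def update_weights(weights, answers, is_counterfeit, rate=0.2):
    """Return a copy of `weights` adjusted by one piece of feedback.

    weights: {question_id: weight between 0 and 1}.
    answers: {question_id: True if the seller's answer looked suspicious}.
    Each asked question's weight moves toward 1 when its answer agreed with the
    reported outcome and toward 0 when it did not.
    """
    updated = dict(weights)
    for qid, suspicious in answers.items():
        target = 1.0 if suspicious == is_counterfeit else 0.0
        updated[qid] += rate * (target - updated[qid])
    return updated


def rank(weights):
    """Return question ids ordered by descending weight."""
    return sorted(weights, key=weights.get, reverse=True)


weights = {"q1": 0.5, "q2": 0.5, "q3": 0.55}
answers = {"q1": True, "q2": False}  # q3 was not part of the first set of questions
new_weights = update_weights(weights, answers, is_counterfeit=True)
reranked = rank(new_weights)
```

Here the question whose answer agreed with the counterfeit indication rises above the unasked question, while the question whose answer disagreed falls to the bottom of the ranking.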
[0103] At block 510, a second selection of questions is provided from the ranked question set. This second selection of questions can be provided in response to a second item list request and as part of a second item listing process. The second selection of questions can be provided by the counterfeit item determiner 216. The question selector 214 can be used to select the second selection of questions from the ranked question set. Answers to the second selection of questions can be received, and an item list image can be included in response to a question in the second selection that requests an item list image. Method 500 can also include rejecting the second item list request based on the answers provided to the second selection of questions. A trained machine learning model can use the item list image to determine whether the item associated with the second item list request is a counterfeit item, and the second item list request can be rejected based on this determination.
[0104] Figure 6 provides a block diagram of an example method 600 for detecting counterfeit items. At block 602, answers to a first set of questions are received. The first set of questions can be provided to a client device of a third-party seller in response to a first item list request. The question selector 214 of Figure 2 can be used to select the first set of questions from a ranked question set. The ranking of this question set can be performed using the question ranker 212, and is based on previous identifications of counterfeit items and on previous question sets whose answers were associated with those counterfeit items.
[0105] At block 604, an item is determined to be a counterfeit item based on the answers to the first set of questions. This determination can be made using the counterfeit item determiner 216. The item is determined to be a counterfeit item by determining a probability value based on the answers to the first set of questions and the counterfeit item indication weights associated with the first set of questions. At block 606, the first item list request is rejected based on the item being a counterfeit item.
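The determination at blocks 604 and 606 combines answers and weights into a probability value that is compared against a threshold. The sketch below uses a logistic function of the summed weights as one possible combination; the logistic form, the `bias` term, and the default threshold are assumptions for illustration and are not stated in the disclosure.

```python
import math


def counterfeit_probability(answers, weights, bias=-2.0):
    """Combine suspicious answers into a probability value between 0 and 1.

    answers: {question_id: True if the answer suggests a counterfeit}.
    weights: {question_id: counterfeit item indication weight}.
    """
    z = bias + sum(weights[qid] for qid, suspicious in answers.items() if suspicious)
    return 1.0 / (1.0 + math.exp(-z))


def should_reject(answers, weights, threshold=0.5):
    """Reject the item list request when the probability meets the threshold."""
    return counterfeit_probability(answers, weights) >= threshold


weights = {"q1": 1.5, "q2": 1.0, "q3": 0.5}
reject_all = should_reject({"q1": True, "q2": True, "q3": True}, weights)
reject_one = should_reject({"q1": False, "q2": False, "q3": True}, weights)
```

With these illustrative weights, three suspicious answers push the probability above the threshold, while a single low-weight suspicious answer does not.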
[0106] Method 600 may further include re-ranking the question set based on determining that the item is a counterfeit. This re-ranking may be performed by the question ranker 212. The re-ranking may be based on a modified counterfeit indicator weight, where the counterfeit indicator weight indicates the strength of the correlation between the answers to the first set of questions and the item being a counterfeit. A second set of questions selected from the re-ranked question set may be provided in response to a second item list request. Method 600 may include generating questions to include in the ranked question set. These questions may be generated in a manner similar to that of method 500, and may be generated using the question generator 208.
[0107] Figure 7 provides a block diagram illustrating another example method 700 for detecting counterfeit items. At block 702, an indication that the item is counterfeit is received. As previously stated, this indication can be received from any entity. At block 704, answers to a first set of questions are identified. The first set of questions is selected from a question set associated with the item. The answers to the first set of questions may include an item list image.
[0108] In some cases, questions in the question set are generated using a natural language processing model (e.g., the model employed by natural language processing engine 206) to identify item features of items from text data with linguistic context relevant to counterfeit item detection. Language rules (e.g., language rules employed by question generator 208) can be used to generate questions based on item features in response to the linguistic context being relevant to counterfeit item detection.
[0109] At block 706, the question set associated with the item is ranked to provide a ranked question set. This ranking can be based on the correlation between the answers to the first set of questions and the item being a counterfeit. For example, the question set can be ranked using modified counterfeit item indicator weights. The counterfeit item indicator weights associated with the first set of questions can be modified using the question ranker 212 based on the indication that the item is a counterfeit, where the counterfeit item indicator weights indicate the strength of the correlation between the answers to the first set of questions and the item being a counterfeit.
[0110] At block 708, a second set of questions is provided from the ranked question set associated with the item. The second set of questions can be selected from the ranked question set using question selector 214. The second set of questions can be provided during the item listing process in response to an item list request. In some cases, the second set of questions includes questions from the ranked question set that were not included in the first set of questions.
[0111] Method 700 may further include labeling a first item list image as a counterfeit and providing the labeled first item list image (assuming the image is an image of the actual item, not a stock photo of a genuine item) to a machine learning model. This can be performed using machine learning engine 210. The labeled first item list image may be included in a training dataset used by machine learning engine 210 to train a model to identify counterfeit items. If an answer to the second set of questions includes a second item list image, the machine learning model output by machine learning engine 210, trained at least in part on the labeled first item list image, is used to determine whether the second item list image includes a counterfeit item. If the item is determined to be a counterfeit, the second item list request associated with the second set of questions can be rejected.
[0112] Figure 8 provides a block diagram illustrating another example method 800 for detecting counterfeit items. At block 802, an item and item features are identified from within a video. The item and item features can be identified from text data of the video converted by speech-to-text software, and are determined using a natural language processing model provided by natural language processing engine 206.
[0113] At block 804, an image of the item and its features is obtained. This image is obtained from the video. The image can be obtained at a time corresponding to the use of the item and its features within the text data and the video. It can also be obtained in response to the linguistic context of the text data being identified as relevant to counterfeit item detection. The image can be labeled with the identified item, item features, or linguistic context. At block 806, a machine learning model is trained using the labeled image of the item and its features. This labeled image serves as part of the training dataset used to train the machine learning model. The machine learning engine 210 can be used to train the machine learning model using the labeled image to output a trained machine learning model for identifying counterfeit items.
[0114] At block 808, an item list image is received. The item list image can be received as an answer to a question provided by the counterfeit item determiner 216, as part of the item listing process, in response to an item list request. At block 810, the item within the item list image is identified as a counterfeit item by the trained machine learning model. The item list request can be rejected in response to the item being identified as counterfeit. In some cases, the item list image is then provided to the training dataset to further train the machine learning model. The item list image can be provided to the training dataset after confirmation is received from another source that the item is counterfeit.
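The decision flow of blocks 808 and 810 can be sketched as follows. The trained model is represented by a stand-in callable, and the function names, the score threshold, and the dictionary used as an "image" are all hypothetical; a real system would pass image data to an actual trained classifier.

```python
def handle_listing_image(image, classify, training_dataset, confirmed=False, threshold=0.5):
    """Return True when the item list request should be rejected.

    classify: callable standing in for the trained model; returns a counterfeit score in [0, 1].
    confirmed: True when another source has confirmed that the item is counterfeit.
    """
    is_counterfeit = classify(image) >= threshold
    if is_counterfeit and confirmed:
        # Only confirmed counterfeits are fed back into the training dataset.
        training_dataset.append((image, "counterfeit"))
    return is_counterfeit


def stub_classifier(image):
    """Placeholder for a trained model; scores a hypothetical 'logo_blurry' flag."""
    return 0.9 if image.get("logo_blurry") else 0.1


dataset = []
rejected = handle_listing_image({"logo_blurry": True}, stub_classifier, dataset, confirmed=True)
accepted = handle_listing_image({"logo_blurry": False}, stub_classifier, dataset)
```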
[0115] Having described an overview of embodiments of the present technology, the following describes an example operating environment in which embodiments of the present technology may be implemented, in order to provide a general context for the various aspects. Referring first to Figure 9 in particular, an example operating environment for implementing embodiments of the present technology is shown and is generally designated as computing device 900. Computing device 900 is merely an example of a suitable computing environment and is not intended to imply any limitation on the scope of use or functionality of the technology. Nor should computing device 900 be construed as having any dependency or requirement relating to any one or combination of the components shown.
[0116] The techniques disclosed herein can be described in the general context of computer code or machine-usable instructions, including computer-executable instructions (e.g., program modules) that are executed by a computer or other machine (e.g., a personal data assistant or other handheld device). Typically, a program module, including routines, programs, objects, components, data structures, etc., refers to code that performs a specific task or implements a specific abstract data type. This technique can be practiced in a variety of system configurations, including handheld devices, consumer electronics, general-purpose computers, and more specialized computing devices. The technique can also be practiced in distributed computing environments, where tasks are performed by remote processing devices linked via a communication network.
[0117] With continued reference to Figure 9, the computing device 900 includes a bus 910 that is directly or indirectly coupled to the following devices: memory 912, one or more processors 914, one or more presentation components 916, input/output ports 918, input/output components 920, and an illustrative power supply 922. The bus 910 represents one or more buses (e.g., an address bus, a data bus, or a combination thereof).
[0118] Although the various boxes of Figure 9 are shown with lines for the sake of clarity, in reality, delineating the various components is not so clear, and metaphorically, the lines would more accurately be gray and fuzzy. For example, presentation components such as display devices can be considered I/O components. Furthermore, processors have memory. This is the nature of the art, and it is reiterated that Figure 9 merely illustrates an example computing device that can be used in conjunction with one or more embodiments of this technology. No distinction is made between categories such as "workstation," "server," "laptop," and "handheld device," as all of these are contemplated within the scope of Figure 9 and are referred to as a "computing device."
[0119] Computing device 900 typically includes a variety of computer-readable media. Computer-readable media can be any available medium that can be accessed by computing device 900, and includes volatile and non-volatile media, removable and non-removable media. By way of example and not limitation, computer-readable media can include computer storage media and communication media.
[0120] Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information, such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to: RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical disc storage devices, magnetic cassettes, magnetic tape, magnetic disk storage devices or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by computing device 900. Computer storage media do not include signals per se.
[0121] Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal (such as a carrier wave or other transport mechanism), and include any information delivery media. The term "modulated data signal" refers to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example and not limitation, communication media include wired media such as wired networks or direct-wired connections, and wireless media such as acoustic, RF, infrared, and other wireless media. Any combination of the above should also be included within the scope of computer-readable media.
[0122] Memory 912 includes computer storage media in the form of volatile or non-volatile memory. Memory can be removable, non-removable, or a combination thereof. Example hardware devices include solid-state memory, hard disk drives, optical disk drives, etc. Computing device 900 includes one or more processors that read data from various entities such as memory 912 or I/O components 920. Presentation component 916 presents data indications to a user or other device. Examples of presentation components include display devices, speakers, printing components, vibration components, etc.
[0123] I/O port 918 allows computing device 900 to be logically coupled to other devices, some of which may be built-in, including I/O components 920. Illustrative components include microphones, joysticks, game controllers, satellite antennas, scanners, printers, wireless devices, etc.
[0124] The above embodiments can be combined with one or more of the specifically described alternatives. Specifically, the claimed embodiments may include references to more than one other embodiment in the alternatives. The claimed embodiments may specify additional limitations on the claimed subject matter.
[0125] This document describes the subject matter of the technology with specificity to meet legal requirements. However, the description itself is not intended to limit the scope of this disclosure. Rather, the inventors have envisioned that the claimed or disclosed subject matter may also be embodied in other ways, to include different steps or combinations of steps similar to those described in this document, in conjunction with other present or future technologies. Furthermore, although the terms “step” or “block” may be used herein to refer to different elements of the methods employed, such terms should not be construed as implying any particular order among or between the various steps disclosed herein, unless and except when the order of the individual steps is explicitly described.
[0126] For the purposes of this disclosure, the word “comprising” has the same broad meaning as the word “including”, and the word “accessing” includes “receiving,” “referencing,” or “retrieving.” Furthermore, the word “communicating” has the same broad meaning as the words “receiving” or “sending” facilitated by a software- or hardware-based bus, receiver, or transmitter using the communication media described herein. Additionally, the word “initiating” has the same broad meaning as the words “executing” or “instructing”, where the corresponding action can be executed to completion or interrupted based on the occurrence of another action. Furthermore, unless otherwise stated, words such as “a” or “an” include both the plural and the singular. Thus, for example, the requirement of “a feature” is satisfied when one or more features are present. Furthermore, the term “or” includes the conjunctive, the disjunctive, and both (a or b thus includes either a or b, as well as a and b).
[0127] For the purposes of the detailed discussion above, embodiments of this technology are described with reference to a distributed computing environment; however, the distributed computing environment described herein is merely an example. Components may be configured to perform novel aspects of the embodiments, wherein the terms "configured for" or "configured to" may mean "programmed to" perform a specific task or implement a specific abstract data type using code. Furthermore, while embodiments of this technology can generally be referenced to the counterfeit item detection system and schematic diagrams described herein, it should be understood that the described technology can be extended to other implementation contexts.
[0128] As can be seen from the foregoing, this technology is well-suited to achieving all the aforementioned goals and objectives, including other advantages that are apparent or inherent to the structure. It will be understood that certain features and sub-combinations are useful and can be employed without reference to other features and sub-combinations. This is contemplated by the claims and is within the scope of the claims. Since many possible embodiments of the described technology can be made without departing from this scope, it should be understood that everything described herein or shown in the accompanying drawings should be interpreted as illustrative rather than restrictive.
Claims
1. A computer-implemented method for detecting counterfeit items, the method comprising: identifying an item and item features from item data associated with the item; generating a question set based on the item and the item features; in response to a first item list request from a client device, providing a first set of questions from the question set to the client device; receiving answers to the first set of questions from the client device; determining, based on the answers to the first set of questions, that the item is a counterfeit item; ranking the question set based on the strength of the correlation between the answers to the first set of questions and the item being a counterfeit item; and in response to a second item list request for the item from a client device, providing a second set of questions from the ranked question set to the client device.
2. The computer-implemented method of claim 1, further comprising rejecting the second item list request based on answers to the second set of questions.
3. The computer-implemented method of claim 2, wherein the answers to the second set of questions include an item list image of the item features of the item, and wherein the second item list request is rejected based on the item list image of the item features.
4. The computer-implemented method according to claim 1, further comprising: modifying a counterfeit item indication weight associated with the first set of questions based on an indication that the item is a counterfeit item, the counterfeit item indication weight indicating the strength of the correlation between the answers to the first set of questions and the item being a counterfeit item, wherein the ranking is based on the counterfeit item indication weight.
5. The computer-implemented method according to claim 1, further comprising: using a natural language processing model to identify item features from text data and to identify a linguistic context associated with the identified item features; and when the linguistic context is relevant to counterfeit item detection, generating a question associated with the identified item features for inclusion in the question set.
6. The computer-implemented method according to claim 1, further comprising: determining text data from a video that includes the item; using a natural language processing model to identify, from the text data, item features and a linguistic context relevant to counterfeit item detection; and generating, for inclusion in the question set, a question requesting an item list image that includes the item features.
7. The computer-implemented method of claim 1, wherein the first set of questions is presented during an item list process initiated in response to the first item list request.
8. A counterfeit item detection system, the system comprising: at least one processor; and a computer storage medium storing computer-executable instructions which, when executed by the at least one processor, cause the at least one processor to perform operations, the operations including: identifying an item and item features from item data associated with the item; generating a question set based on the item and the item features; receiving answers to a first set of questions from a client device, the first set of questions being provided in response to a first item list request, wherein the first set of questions is selected from the question set; determining that the item is a counterfeit item based on the answers to the first set of questions received from the client device; ranking the question set based on the strength of the correlation between the answers to the first set of questions and the item being a counterfeit item; and in response to a second item list request for the item from a client device, providing a second set of questions from the ranked question set to the client device.
9. The system according to claim 8, wherein ranking the question set further includes: modifying a counterfeit item indication weight associated with the first set of questions, wherein the counterfeit item indication weight indicates the strength of the correlation between the answers to the first set of questions and the item being a counterfeit item; and ranking the question set using the modified counterfeit item indication weight.
10. The system according to claim 8, wherein the answers to the first set of questions include an item list image containing the item features of the item, and the first item list request is rejected based on the item features of the item within the item list image.
11. The system according to claim 8, further comprising generating the questions included in the question set by: identifying, from text data, item features associated with the item, the item features being identified using a natural language processing model; using the natural language processing model to identify a linguistic context associated with the identified item features; and when the linguistic context is relevant to counterfeit item detection, generating a question associated with the item features.
12. The system according to claim 8, further comprising: presenting the first set of questions during an item list process initiated in response to the first item list request.
13. The system according to claim 8, wherein determining that the item is a counterfeit item further includes: determining a probability value based on the answers to the first set of questions and the counterfeit item indication weights associated with the first set of questions; and comparing the probability value with a counterfeit indication threshold.
14. One or more computer storage media storing computer-executable instructions which, when executed by a processor, cause the processor to perform a method for detecting counterfeit items, the method comprising: identifying an item and item features from item data associated with the item; generating a question set based on the item and the item features; in response to a first item list request from a client device, providing a first set of questions from the question set to the client device; receiving answers to the first set of questions from the client device; determining, based on the answers to the first set of questions, that the item is a counterfeit item; ranking the question set based on the strength of the correlation between the answers to the first set of questions and the item being a counterfeit item; and in response to a second item list request for the item from a client device, providing a second set of questions from the ranked question set to the client device.
15. The medium according to claim 14, further comprising: modifying a counterfeit item indication weight associated with the first set of questions based on an indication that the item is a counterfeit item, the counterfeit item indication weight indicating the strength of the correlation between the answers to the first set of questions and the item being a counterfeit item, wherein the question set is ranked using the modified counterfeit item indication weight.
16. The medium according to claim 14, wherein the second set of questions includes questions from the ranked question set that were not included in the first set of questions.
17. The medium according to claim 14, wherein the question set includes questions generated based on a natural language processing model that identifies item features of the item from text data having a linguistic context related to counterfeit item detection.
18. The medium according to claim 14, further comprising: labeling a first item list image included in the answers to the first set of questions as a counterfeit; and providing the labeled first item list image as part of a training dataset used to train a machine learning model to identify counterfeit items.
19. The medium according to claim 18, further comprising: receiving answers to the second set of questions, the answers including a second item list image; and using a machine learning model trained using the labeled first item list image to determine whether the second item list image includes the counterfeit item.