Automobile name matching method and device, electronic equipment and computer readable storage medium

By training a car name matching model on a heterogeneous platform, car names are mapped to the same semantic vector space, maintaining the hierarchical relationship between brand, model, and specification. This solves the problem of low accuracy in existing car name matching technologies and achieves high-precision car name matching.

CN122240822A · Pending · Publication Date: 2026-06-19 · SHANGHAI XULU INFORMATION TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
SHANGHAI XULU INFORMATION TECHNOLOGY CO LTD
Filing Date
2026-03-17
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies suffer from brand mismatch and vehicle series confusion when matching car names on heterogeneous platforms, resulting in low matching accuracy and difficulty in supporting high-precision vehicle source alignment and inventory linkage.

Method used

By training a car name matching model, car names are mapped to the same semantic vector space, maintaining the hierarchical semantic relationship between car brand name, model information and vehicle specification information. Similarity search is performed using pre-generated target vectors, and the learning process is optimized to distinguish between positive and negative sample car names in order to establish a semantic distribution pattern.

Benefits of technology

It improves the accuracy of car name matching on heterogeneous platforms, ensuring that the matching results are based on the inherent structural logic of the car name, and reducing false matching and missed matching.



Abstract

This invention proposes a vehicle name matching method, device, electronic device, and computer-readable storage medium. The vehicle name matching model, through optimization learning, can map any vehicle name to the same semantic vector space. In this semantic vector space, the relative importance of the vehicle brand name, model information, and vehicle specification information presents a hierarchical semantic relationship from strong to weak. That is, when converting a vehicle name into a vector, the vehicle name matching model prioritizes brand consistency, followed by model correspondence, and finally, specification similarity. Therefore, when the vehicle name matching model is used to vectorize the query vehicle name from the source platform and to search the vector library of the target platform, the resulting similarities are no longer driven by superficial character similarity or general semantic similarity, but by the inherent structural logic of the vehicle name, thus enabling accurate vehicle name matching across heterogeneous platforms.

Description

Technical Field

[0001] This invention relates to the field of automotive technology, and more specifically, to a method, apparatus, electronic device, and computer-readable storage medium for matching vehicle names.

Background Technology

[0002] In automotive circulation and data collaboration, it is often necessary to match a car name from platform A to the same car model on platform B. This requirement is called car name matching across heterogeneous platforms. Existing technologies mainly adopt two types of methods: one is vector semantic matching based on general text embedding, which treats the entire car name as an ordinary short text, encodes it into a vector, and then calculates the cosine similarity; the other is BM25-style string matching based on term frequency and an inverted index, which scores by character overlap.

[0003] However, neither of the above two methods takes into account the inherent structure of car names. Therefore, in practical applications, brand mismatch, car series confusion, or ignoring key differences are likely to occur, ultimately resulting in a significantly low matching accuracy rate, making it difficult to support core business scenarios such as high-precision car source alignment and inventory linkage.

Summary of the Invention

[0004] In view of this, the purpose of the present invention is to provide a vehicle name matching method, apparatus, electronic device and computer-readable storage medium that can improve the accuracy of vehicle name matching under heterogeneous platforms.

[0005] To achieve the above objectives, the technical solutions adopted in the embodiments of the present invention are as follows: In a first aspect, the present invention provides a method for matching vehicle names, the method comprising: The vehicle name to be queried is received from the source platform; The vehicle name to be queried is vectorized using a pre-trained vehicle name matching model to obtain the source vehicle name vector; the vehicle name matching model is configured to map vehicle names to the same semantic vector space, and maintain the hierarchical semantic relationship between vehicle brand name, model information and vehicle specification information in the semantic vector space. In the vector library built on the target platform, a target vector with a similarity to the source vehicle name vector that meets a preset threshold is found, and the target vehicle name corresponding to the target vector is obtained; the target vector is pre-generated based on the target vehicle name using a pre-trained vehicle name matching model.

[0006] In an optional implementation, the vehicle name matching model is trained in the following manner: Multiple training samples are generated based on the vehicle names of the source platform and the vehicle names of the target platform. Each training sample includes a vehicle name to be matched from the source platform, multiple positive sample vehicle names from the target platform, and negative sample vehicle names. The positive sample vehicle names are target platform vehicle names that have high semantic consistency with the vehicle name to be matched in terms of vehicle brand name, model information, and vehicle specification information. The negative sample vehicle names are target platform vehicle names that have low semantic consistency with the vehicle name to be matched in terms of vehicle brand name, model information, and vehicle specification information. The text embedding model to be trained is used to determine the first similarity and the second similarity between the vehicle name to be matched and the positive sample vehicle name and the negative sample vehicle name; The total loss value is determined based on the first similarity and the second similarity corresponding to each group of training samples; The parameters of the text embedding model are iteratively updated based on the total loss value to obtain the car name matching model.

[0007] In an optional implementation, the vehicle name includes the vehicle brand name, model information, and vehicle specification information; the generation of multiple sets of training samples based on the vehicle names from the source platform and the target platform includes: Select any vehicle name from the source platform as the vehicle name to be matched; The car names in the target platform that have the same car brand name as the car name to be matched are identified as the candidate car name set; Using a large language model, multiple positive sample car names are selected from the candidate car name set according to the first rule; the first rule is determined based on the importance of car brand name, model information, and vehicle specification information on the semantic similarity of car names. Using a large language model, multiple negative sample car names are determined from the target platform according to the second rule; the second rule is that the selected negative sample car names have the same brand as the car name to be matched but different model information, or are different in both car brand name and model information and have different vehicle specification information.

[0008] In an optional implementation, determining the first and second similarities between the vehicle name to be matched and the positive and negative sample vehicle names using the text embedding model to be trained includes: The vehicle name to be matched, the positive sample vehicle names, and the negative sample vehicle names are input into the text embedding model to be trained to generate the vector to be matched for the vehicle name to be matched, the feature vector of each positive sample vehicle name, and the feature vector of each negative sample vehicle name; Calculate the similarity between the vector to be matched and the feature vector of each positive sample vehicle name to obtain the first similarity for each positive sample vehicle name; Calculate the similarity between the vector to be matched and the feature vector of each negative sample vehicle name to obtain the second similarity for each negative sample vehicle name.

[0009] In an optional implementation, multiple positive sample car names in each training sample group are arranged according to their similarity to the car name to be matched, and multiple negative sample car names in each training sample group are arranged according to their similarity to the car name to be matched; the step of determining the total loss value based on the first similarity and the second similarity corresponding to each training sample group includes: Each pair of adjacent positive sample car names in a plurality of positive sample car names is determined as a positive sample car name pair, and the first loss value is determined based on the first similarity corresponding to each positive sample car name pair. Each pair of adjacent negative sample car names in a plurality of negative sample car names is identified as a negative sample car name pair, and the second loss value is determined based on the second similarity corresponding to each negative sample car name pair. The third loss value is determined based on the minimum first similarity and the maximum second similarity; The total loss value is determined based on the first loss value, the second loss value, and the third loss value.

[0010] In an optional implementation, determining the first loss value based on the first similarity corresponding to each positive sample vehicle name pair includes: Calculate the difference between the two first similarities corresponding to each pair of positive sample car names to obtain the first difference corresponding to each pair of positive sample car names; Calculate the difference between the preset first difference threshold and the first difference corresponding to each pair of positive sample car names to obtain the second difference corresponding to each pair of positive sample car names; The first loss value is determined based on the second difference corresponding to each pair of positive sample car names.

[0011] In an optional implementation, determining the second loss value based on the second similarity corresponding to each negative sample vehicle name pair includes: Calculate the difference between the two second similarities corresponding to each negative sample vehicle name pair to obtain the first difference corresponding to each negative sample vehicle name pair; Calculate the difference between the preset first difference threshold and the first difference corresponding to each negative sample vehicle name pair to obtain the second difference corresponding to each negative sample vehicle name pair; The second loss value is determined based on the second difference corresponding to each negative sample vehicle name pair.

[0012] In a second aspect, the present invention provides a vehicle name matching device, the device comprising: The response module is used to receive the vehicle name to be queried from the source platform; The processing module is used to vectorize the vehicle name to be queried using a pre-trained vehicle name matching model to obtain the source vehicle name vector; the vehicle name matching model is configured to map the vehicle name to the same semantic vector space, and maintain the hierarchical semantic relationship between the vehicle brand name, model information and vehicle specification information in the semantic vector space. The matching module is used to search for a target vector in the vector library built on the target platform that has a similarity to the source vehicle name vector that meets a preset threshold, and to obtain the target vehicle name corresponding to the target vector; the target vector is pre-generated based on the target vehicle name using a pre-trained vehicle name matching model.

[0013] In a third aspect, the present invention provides an electronic device including a processor and a memory, wherein the memory stores a computer program executable by the processor, and the processor can execute the computer program to implement the vehicle name matching method described in any of the foregoing embodiments.

[0014] In a fourth aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the vehicle name matching method as described in any of the foregoing embodiments.

[0015] Compared to existing technologies, the vehicle name matching method, apparatus, electronic device, and computer-readable storage medium provided in this invention enable the vehicle name matching model to map any vehicle name to the same semantic vector space through optimization learning. In this semantic vector space, the relative importance of the vehicle brand name, model information, and vehicle specification information exhibits a hierarchical semantic relationship from strong to weak. In other words, when converting a vehicle name into a vector, the vehicle name matching model prioritizes brand consistency, followed by model correspondence, and only then considers specification similarity. Therefore, when the vehicle name matching model is used to vectorize the query vehicle name from the source platform and to search the vector library of the target platform, the resulting similarities are no longer driven by superficial character similarity or general semantic similarity, but by the inherent structural logic of the vehicle name, thus ensuring accurate vehicle name matching across heterogeneous platforms.

[0016] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, preferred embodiments are described below in detail with reference to the accompanying drawings.

Attached Figure Description

[0017] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of the present invention and should not be regarded as a limitation on the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0018] Figure 1 is a flowchart of a vehicle name matching method provided in an embodiment of the present invention.

[0019] Figure 2 is another flowchart of the vehicle name matching method provided in an embodiment of the present invention.

[0020] Figure 3 is a block diagram of a vehicle name matching device provided in an embodiment of the present invention.

[0021] Figure 4 is a block diagram of an electronic device provided in an embodiment of the present invention.

Detailed Implementation

[0022] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. The components of the embodiments of the present invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations.

[0023] Therefore, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort are within the scope of protection of the invention.

[0024] It should be noted that relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0025] The embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

[0026] Please refer to Figure 1, which shows a schematic flowchart of a vehicle name matching method provided by an embodiment of the present invention. The method includes the following steps: Step S200: Receive the vehicle name to be queried from the source platform.

[0027] It should be understood that in current business scenarios such as automotive e-commerce, price comparison platforms, and vehicle data integration, it is often necessary to find the same car model published on one platform (e.g., platform A) on another platform (e.g., platform B). However, different platforms have significantly different naming conventions for the same car model. For example, on domestic platforms, car names are generally broken down into the car brand name (i.e., dealer name) + model information (i.e., model series name) + vehicle specification information (e.g., model name), while on foreign platforms, the naming of the same car model varies greatly between different platforms.

[0028] For example, the car name is "Brand A, Seagull Series 2025 Smart Driving Edition 420KM Freedom Edition". Here, "Brand A" is the car brand name, "Seagull Series" is the model information, and "2025 Smart Driving Edition 420KM Freedom Edition" is the vehicle specification information.

[0029] In this embodiment of the invention, when a user searches for the same car model on another platform (e.g., platform B) using the car name (the car name to be queried) from a source platform (e.g., platform A), a car name matching requirement between heterogeneous platforms is generated. The car name to be queried is received in raw text form, such as "V brand X60 T6 3.0L AT ABD / AB HID 4WD5DR".

[0030] Step S210: The pre-trained car name matching model is used to vectorize the query car name to obtain the source car name vector. The car name matching model is configured to map the car name to the same semantic vector space and maintain the hierarchical semantic relationship between the car brand name, model information and vehicle specification information in the semantic vector space.

[0031] Next, the car name to be queried is input into the pre-trained car name matching model, and after vectorization, the source car name vector corresponding to the car name to be queried is obtained. It should be understood that the trained car name matching model does not simply convert this text (i.e., the car name to be queried) into a fixed-length vector, but actively identifies and strengthens three key components through its internal network structure: the car brand name is given the highest semantic weight; secondly, the vehicle model information is used to limit the basic type and generation of the vehicle; and the vehicle specification information is used as a low-level modifying feature.

[0032] This hierarchical modeling places car names of different brands far apart, moderately separates car names of different series within the same brand, and brings car names with minor configuration differences within the same series closer together, thus faithfully reflecting the "brand > series > specifications" product logic of the automotive industry.

[0033] Step S220: In the vector library constructed by the target platform, find the target vector whose similarity to the source vehicle name vector meets the preset threshold, and obtain the target vehicle name corresponding to the target vector; the target vector is pre-generated based on the target vehicle name using a pre-trained vehicle name matching model.

[0034] During this process, after the source vehicle name vector is generated, at least one target vector closest to the source vehicle name vector (i.e., whose similarity exceeds a preset threshold) is quickly retrieved (e.g., by approximate nearest neighbor retrieval) from the vector library built by the target platform, and the corresponding target vehicle names are returned in descending order of similarity.

[0035] It should be noted that after the car name matching model is trained, the preprocessed set of all car names of the target platform (e.g., all 100,000 car names of platform B) is input into the trained car name matching model one by one to generate the corresponding target vector. The target vector and the corresponding target car name are then stored in the vector library (i.e., vector database, such as Milvus) in a one-to-one correspondence.
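
Steps S200 to S220 and the vector library construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: `embed` is a hypothetical stand-in for the trained car name matching model, cosine similarity is assumed as the similarity measure, the threshold value of 0.8 is an arbitrary placeholder, and a brute-force scan over an in-memory list stands in for approximate nearest neighbor retrieval in a vector database.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_library(target_names, embed):
    """Pre-generate one target vector per target-platform car name."""
    return [(name, embed(name)) for name in target_names]


def search(query_name, library, embed, threshold=0.8, top_k=5):
    """Return target names whose similarity meets the threshold, best first."""
    source_vector = embed(query_name)  # step S210
    scored = [(name, cosine(source_vector, vec)) for name, vec in library]
    hits = [(name, score) for name, score in scored if score >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)  # step S220
    return hits[:top_k]
```

In a production setting the list scan would presumably be replaced by an index query against the vector library, with the same threshold filter applied to the returned scores.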

[0036] In summary, the car name matching method provided in this invention uses a car name matching model that, through optimization learning, can map any car name to the same semantic vector space. In this semantic vector space, the relative importance of the car brand name, model information, and vehicle specification information exhibits a hierarchical semantic relationship from strong to weak. That is, when converting a car name into a vector, the car name matching model prioritizes brand consistency, followed by model correspondence, and only then considers specification similarity. Therefore, when the car name matching model is used to vectorize the query car name from the source platform and to search the vector library of the target platform, the resulting similarities are no longer driven by superficial character similarity or general semantic similarity, but by the inherent structural logic of the car name, thus ensuring accurate car name matching across heterogeneous platforms.

[0037] Optionally, one possible implementation of training the car name matching model is given below. Referring to Figure 2, the car name matching method also includes the following steps: Step S100: Generate multiple sets of training samples based on the car names of the source platform and the car names of the target platform; each set of training samples includes the car name to be matched from the source platform, and multiple positive sample car names and multiple negative sample car names from the target platform; the positive sample car names are the target platform car names that have high semantic consistency with the car name to be matched in terms of car brand name, model information and vehicle specification information, and the negative sample car names are the target platform car names that have low semantic consistency with the car name to be matched in terms of car brand name, model information and vehicle specification information.

[0038] It should be understood that the technical problem to be solved by the embodiments of the present invention is that when matching car names between heterogeneous automotive information platforms, the general text embedding model simply treats the car name as an ordinary string and cannot identify its inherent structural hierarchy. This results in vehicles with the same brand but completely different models being incorrectly judged as highly similar, while real cars of the same model with the same brand and matching car series but with significantly different configuration descriptions are missed.

[0039] In this embodiment of the invention, by constructing training samples with clear semantic labels, the text embedding model is trained directly on the task of discriminating high semantic consistency from low semantic consistency, thereby naturally forming a semantic distribution pattern in the vector space that conforms to the conventions of the automotive industry.

[0040] First, based on the car brand name, model information, and vehicle specifications, the top N (e.g., the top 5) car names on the target platform that have the highest similarity to car names on the source platform are selected as positive sample car names. Similarly, car names on the target platform that have similarity to car names on the source platform in terms of car brand name, model information, or vehicle specifications are selected as negative sample car names. Both positive and negative sample car names are sorted according to their degree of similarity.

[0041] Each training sample consists of a car name to be matched, multiple positive sample car names, and multiple negative sample car names. Assume a training sample is $(s, p_1, p_2, p_3, p_4, p_5, n_1, n_2, n_3, n_4, n_5)$, where $s$ is the car name to be matched; $p_1$, $p_2$, $p_3$, $p_4$, and $p_5$ are five positive sample car names, arranged in descending order of their similarity to the car name to be matched; and $n_1$, $n_2$, $n_3$, $n_4$, and $n_5$ are five negative sample car names, arranged in descending order of their similarity to the car name to be matched.

[0042] Step S110: Use the text embedding model to be trained to determine the first similarity and second similarity between the car name to be matched and the positive sample car name and the negative sample car name.

[0043] The text embedding model to be trained is input with the car name to be matched, multiple positive car names, and multiple negative car names. The model outputs semantic vectors corresponding to the car name to be matched, each positive car name, and each negative car name. Then, the matching score between the semantic vectors is calculated. The matching score between the car name to be matched and the positive car names is the first similarity, and the matching score between the car name to be matched and the negative car names is the second similarity.
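
As a rough sketch of step S110, the following shows one way to hold a training sample and compute the first and second similarities. The `TrainingSample` container and the `embed` callable are hypothetical names introduced here for illustration, and cosine similarity is an assumption: the text only speaks of a "matching score" between semantic vectors.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class TrainingSample:
    query: str            # car name to be matched (source platform)
    positives: List[str]  # positive sample car names, most similar first
    negatives: List[str]  # negative sample car names, most similar first


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, used here as the assumed matching score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarities(
    sample: TrainingSample, embed: Callable[[str], Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Return (first similarities, second similarities) for one sample."""
    query_vec = embed(sample.query)
    first = [cosine(query_vec, embed(p)) for p in sample.positives]
    second = [cosine(query_vec, embed(n)) for n in sample.negatives]
    return first, second
```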

[0044] Step S120: Determine the total loss value based on the first similarity and second similarity corresponding to each group of training samples.

[0045] In this embodiment of the invention, the total loss value is used to comprehensively evaluate the modeling ability of the car name matching model for semantic discriminative relationships. Specifically, in the vector space, car names of different platforms corresponding to the same model of car (positive samples) should be close to each other, while car names of different models of car (negative samples) should be significantly far apart.

[0046] The total loss value quantifies whether the hierarchical semantic distribution formed by brand, model, and specification under the current model parameters conforms to the structural logic of automobile products by constraining the relative ranking consistency between positive samples, the relative ranking consistency between negative samples, and the minimum interval between positive and negative sample groups.
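
The three constraints named above (ranking among positives, ranking among negatives, and a minimum gap between the two groups) can be sketched as a sum of hinge terms. The hinge form, the unweighted sum, and both margin values are assumptions made for illustration; the text only says that each loss value is "determined based on" the corresponding differences.

```python
def ranking_loss(sims, margin):
    """Penalize each adjacent pair whose similarity gap falls short of margin.

    `sims` is assumed to be ordered from most to least similar sample.
    """
    return sum(max(0.0, margin - (hi - lo)) for hi, lo in zip(sims, sims[1:]))


def total_loss(first_sims, second_sims, pair_margin=0.05, group_margin=0.2):
    """Combine the first, second, and third loss values for one sample."""
    first_loss = ranking_loss(first_sims, pair_margin)    # among positives
    second_loss = ranking_loss(second_sims, pair_margin)  # among negatives
    # Third loss: the weakest positive should still beat the strongest
    # negative by at least group_margin.
    gap = min(first_sims) - max(second_sims)
    third_loss = max(0.0, group_margin - gap)
    return first_loss + second_loss + third_loss
```

With this form the loss is zero only when both lists are already in the intended order with sufficient spacing and every positive outranks every negative by the group margin.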

[0047] Step S130: Iteratively update the parameters of the text embedding model based on the total loss value to obtain the car name matching model.

[0048] Next, the parameters of the text embedding model are iteratively updated based on the total loss value until the model's output vectors can stably distinguish between semantically similar and semantically dissimilar car name combinations, ultimately resulting in a car name matching model specifically designed for heterogeneous platform car name matching. For example, when the total loss value is less than a preset termination threshold (such as $10^{-5}$), updating of the parameters of the text embedding model stops, and the trained car name matching model is obtained.

[0049] As can be seen, the embodiments of the present invention generate multiple sets of training samples based on the car names of the source platform and the target platform, and clearly distinguish between positive sample car names with high semantic consistency and negative sample car names with low semantic consistency in each set of samples. Then, the text embedding model is used to calculate the corresponding similarity and optimize the model parameters accordingly. This enables the car name matching model to accurately capture the hierarchical semantic relationship between car brand name, model information and vehicle specification information. In the vectorization process, it can more reliably distinguish between car names that are substantially the same and those that are substantially different, thereby improving the accuracy of cross-platform car name matching and avoiding mismatches or missed matches caused by differences in naming habits.

[0050] Optionally, the vehicle name includes the vehicle brand name, model information, and vehicle specifications. Regarding how to generate training samples, one possible implementation is provided below. Referring to Figure 2, the sub-steps of step S100 may include: Step S101: Select any car name from the source platform as the car name to be matched.

[0051] Step S102: Determine the car names on the target platform that have the same car brand name as the car name to be matched as the candidate car name set.

[0052] In this embodiment of the invention, car names whose brand names are identical to those of each car name to be matched are selected from the target platform, forming a candidate car name set. The differences in naming the same brand across different platforms are usually only reflected in inconsistencies in capitalization, or minor differences such as hyphens or spaces in a very few brands. These differences can be resolved by standardizing capitalization in the code; for individual brand correspondences that are difficult to process automatically, a mapping table is pre-established and calibrated manually.
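
The brand alignment described above might look like the following sketch. The alias table entry and the function names are hypothetical placeholders; stripping spaces, hyphens, and underscores and then case-folding is one reasonable reading of "standardizing capitalization in the code", with the manually calibrated mapping table handling the remaining cases.

```python
import re

# Hypothetical manually calibrated mapping table for brand spellings that
# rule-based normalization cannot align. Keys are already normalized.
BRAND_ALIASES = {"bra": "branda"}


def normalize_brand(brand):
    """Drop spaces, hyphens, and underscores, ignore case, apply aliases."""
    key = re.sub(r"[\s\-_]+", "", brand).casefold()
    return BRAND_ALIASES.get(key, key)


def candidate_set(query_brand, target_names):
    """Step S102: keep target names whose brand matches the query brand.

    `target_names` is an iterable of (brand, full_name) pairs.
    """
    wanted = normalize_brand(query_brand)
    return [name for brand, name in target_names
            if normalize_brand(brand) == wanted]
```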

[0053] Step S103: Using a large language model, select multiple positive sample car names from the candidate car name set according to the first rule; the first rule is determined based on the importance of the car brand name, model information, and vehicle specification information on the semantic similarity of the car name.

[0054] In this embodiment of the invention, the first rule is used as a prompt word and is input into the large language model along with the car name to be matched and the candidate car name set, instructing the large language model to select multiple positive sample car names corresponding to the car name to be matched from the candidate car name set according to the first rule.

[0055] As one possible implementation, the first rule includes four conditions: a, b, c, and d. Vehicle specification information includes, but is not limited to, powertrain, whether it is a sports model, and engine displacement. Condition a involves extracting vehicle model information from the names to be matched and searching for names in the candidate name set that contain the same vehicle model information. It should be noted that vehicle model information for the same model can exist in multiple forms; all instances of the same vehicle model are considered to represent the same vehicle model information.

[0056] Condition b is that if the vehicle name to be matched specifically indicates its powertrain type (e.g., pure electric, pure gasoline, hybrid), then the candidate vehicle name set will be searched for vehicles containing the same powertrain type. If the vehicle name to be matched does not specifically indicate its powertrain type, then condition b is not required.

[0057] Condition c states that if the car name to be matched is marked as sporty, then the system searches for a sporty car name in the candidate car name set. If the car name to be matched is not specifically marked as sporty, then condition c is not required. Here, "sporty" usually implies a higher-level configuration or a newer model.

[0058] Condition d is that if the engine displacement is indicated in the name of the car to be matched, then the set of candidate car names will be searched for car names containing the same engine displacement. Car names with different engine displacements are not considered to be the same model of car.

[0059] Among the above conditions, condition a is a fundamental requirement that must be met. The degree of influence of conditions b, c, and d on semantic similarity is ranked as d > c > b (i.e., engine displacement > whether it is a sporty model > powertrain type). That is, a car name that meets all four conditions (a, b, c, and d) is more similar to the car name to be matched than a car name that meets only conditions a, c, and d; a car name that meets only conditions a, c, and d is more similar than a car name that meets only conditions a, b, and d; and a car name that meets only conditions a, b, and d is more similar than a car name that meets only conditions a, b, and c.
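The ordering d > c > b can be illustrated with a small scoring sketch. In the embodiment the selection is made by the large language model from the prompt, so the code below is only an assumed way to check that a set of weights reproduces the stated ranking. The class name, the field names, and the weights 4, 2, and 1 are hypothetical, and a condition that is "not required" is assumed to be passed in as satisfied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConditionFlags:
    same_model: bool         # condition a (mandatory)
    same_powertrain: bool    # condition b
    same_sporty: bool        # condition c
    same_displacement: bool  # condition d


def priority_score(flags: ConditionFlags) -> Optional[int]:
    """Return a score whose ordering follows d > c > b, or None if a fails."""
    if not flags.same_model:
        return None  # condition a must be met for a positive sample
    # Powers of two ensure that losing a weaker condition always costs less
    # than losing a stronger one.
    return (4 * flags.same_displacement
            + 2 * flags.same_sporty
            + 1 * flags.same_powertrain)
```

With these weights, a name meeting a, b, c, and d scores 7, one meeting only a, c, and d scores 6, one meeting only a, b, and d scores 5, and one meeting only a, b, and c scores 3.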

[0060] Step S104: Using the large language model, determine multiple negative sample car names from the target platform according to the second rule; the second rule is that the selected negative sample car names have the same brand as the car name to be matched but different model information, or are different in both car brand name and model information and have different vehicle specification information.

[0061] In this embodiment of the invention, the second rule is used as a prompt word and is input into the large language model along with the car name to be matched and the car name set of the target platform (e.g., the car name database address), instructing the large language model to select multiple negative sample car names corresponding to the car name to be matched from the target platform according to the second rule.

[0062] As one possible implementation, the second rule includes two conditions. Condition 1 is to search for different model names of the same brand as the car name to be matched within the target platform's set of car names. For example, the S60 T2 of brand A and the S40 1.8A of brand A are not the same car.

[0063] Condition 2 is to search the target platform's car name set for car names whose brand name and model information both differ from those of the car name to be matched, but whose vehicle specification information contains at least one identical configuration item. For example, two car names of different brands and models that both have a 2.0L engine displacement.

[0064] The influence of conditions 1 and 2 on semantic similarity is ranked as follows: condition 1 > condition 2. Condition 2 determines the degree of similarity based on the number of matching configuration items in the vehicle specification information. The more matching configuration items, the higher the degree of similarity.
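The ordering of negative samples under conditions 1 and 2 can be sketched as a sort key. This is an illustrative assumption rather than the embodiment itself, which delegates the selection to the large language model. The dictionary layout (`brand`, `model`, `specs`, `name`) and both function names are hypothetical.

```python
def negative_rank_key(query, candidate):
    """Return a sort key (smaller means more similar), or None if not a negative."""
    same_brand = query["brand"] == candidate["brand"]
    same_model = query["model"] == candidate["model"]
    if same_brand and not same_model:
        return (0, 0)  # condition 1: same brand, different model
    if not same_brand and not same_model:
        shared = len(set(query["specs"]) & set(candidate["specs"]))
        if shared >= 1:
            return (1, -shared)  # condition 2: more shared items rank earlier
    return None


def rank_negatives(query, candidates):
    """Order qualifying negatives from most to least similar to the query."""
    keyed = [(negative_rank_key(query, c), c) for c in candidates]
    keyed = [(k, c) for k, c in keyed if k is not None]
    return [c["name"] for k, c in sorted(keyed, key=lambda kc: kc[0])]
```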

[0065] It should be noted that the order of the conditions in the first or second rule can be adjusted according to actual business needs, and the conditions in the first rule can also be adjusted according to actual business needs, such as adding engine type. This invention does not limit this.

[0066] It should be understood that if 10,000 sets of training samples are to be constructed, 10,000 car names are randomly selected from the source platform as car names to be matched, and for each car name to be matched, the corresponding positive sample car name and negative sample car name are selected from the target platform, thus obtaining 10,000 sets of training samples.

[0067] The number of positive sample car names and the number of negative sample car names in each group can be set according to actual needs, such as 4 or 6, but the two numbers must be equal. If fewer positive sample car names than required can be found, the remaining positions can be left blank, and the blank positions must be explicitly marked during training. Generally, negative sample car names are easier to find.
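One way to keep the positive and negative lists the same length while marking blank positions is a padded list with a boolean mask. The sketch below is an assumption about the data layout only: `TrainingGroup`, `build_group`, and the use of `None` as the blank marker are hypothetical names and choices.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class TrainingGroup:
    query: str                       # car name to be matched (source platform)
    positives: List[Optional[str]]   # ordered, None marks a blank position
    negatives: List[Optional[str]]   # ordered, None marks a blank position
    positive_mask: List[bool]        # True where a real positive is present
    negative_mask: List[bool]        # True where a real negative is present


def build_group(query: str, positives: Sequence[str],
                negatives: Sequence[str], k: int = 4) -> TrainingGroup:
    """Truncate or pad both lists to exactly k entries and record the masks."""
    def pad(names):
        kept = list(names)[:k]
        padded = kept + [None] * (k - len(kept))
        return padded, [n is not None for n in padded]

    pos, pos_mask = pad(positives)
    neg, neg_mask = pad(negatives)
    return TrainingGroup(query, pos, neg, pos_mask, neg_mask)
```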

[0068] Optionally, regarding how to determine the first similarity between each positive sample car name and the car name to be matched, and the second similarity between each negative sample car name and the car name to be matched, one possible implementation is given below. Referring to Figure 2, the sub-steps of step S110 may include: Step S111: Input the car name to be matched, the positive sample car names, and the negative sample car names into the text embedding model to be trained.

[0069] Step S112: Generate the matching vector of the car name to be matched, the feature vector of each positive sample car name, and the feature vector of each negative sample car name.

[0070] In this embodiment of the invention, the text embedding model includes a representation module and a computation module. The representation module vectorizes the car name to be matched to obtain the matching vector (that is, the vector to be matched). The same representation module vectorizes each positive sample car name to obtain its feature vector, and vectorizes each negative sample car name to obtain its feature vector.

[0071] Step S113: Calculate the similarity between the vector to be matched and the feature vectors of each positive sample car name to obtain the first similarity for each positive sample car name.

[0072] Step S114: Calculate the similarity between the vector to be matched and the feature vectors of each negative sample car name to obtain the second similarity corresponding to each negative sample car name.

[0073] In this embodiment of the invention, the similarity between the feature vector of each positive sample car name and the vector to be matched is calculated using a calculation module. For example, it can be calculated using a distance metric or a similarity metric. The distance metric can be Euclidean distance, and the similarity metric can be cosine similarity, inner product, etc.
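The three metrics named above can be written in a few lines of plain Python. This is a minimal sketch for illustration; the function names are hypothetical, and a production system would normally use a vectorized numerical library instead.

```python
import math


def inner_product(u, v):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Inner product divided by the product of the vector norms."""
    norm = math.sqrt(inner_product(u, u)) * math.sqrt(inner_product(v, v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return inner_product(u, v) / norm


def euclidean_distance(u, v):
    """Straight-line distance; smaller values mean more similar vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Note that a distance metric decreases as similarity increases, so if Euclidean distance is used, the comparisons in the loss terms need the opposite sign from those used with cosine similarity or inner product.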

[0074] Optionally, the multiple positive sample car names in each training sample group are arranged according to their similarity to the car name to be matched, and the multiple negative sample car names in each training sample group are likewise arranged according to their similarity to the car name to be matched. Regarding how to determine the total loss value, one possible implementation is given below. Referring to Figure 2, the sub-steps of step S120 may include: Step S121: Determine each pair of adjacent positive sample car names among the multiple positive sample car names as a positive sample car name pair, and determine the first loss value based on the first similarities corresponding to each positive sample car name pair.

[0075] In this embodiment of the invention, when generating each set of training samples, multiple positive sample car names are sorted according to their similarity to the car name to be matched, for example, in descending order. At the same time, multiple negative sample car names are also sorted according to their similarity to the car name to be matched, in descending order, and the positive sample car names are placed before the negative sample car names.

[0076] Assume the positive sample car names corresponding to the car name to be matched are $p_1$, $p_2$, $p_3$, $p_4$, and $p_5$, and none of them is empty. Among them, $p_1$ and $p_2$ form a positive sample car name pair, $p_2$ and $p_3$ form a positive sample car name pair, $p_3$ and $p_4$ form a positive sample car name pair, and $p_4$ and $p_5$ form a positive sample car name pair, so a total of four positive sample car name pairs are generated.

[0077] Since each positive sample car name pair includes two positive sample car names, each positive sample car name pair corresponds to two first similarities. A first loss value is determined based on the two first similarities for each positive sample car name pair. This first loss value is used to constrain the car name matching model to make positive sample car names with higher similarity significantly superior to those with slightly lower similarity, thereby improving the car name matching model's ability to identify subtle differences between positive samples.

[0078] Step S122: Determine each pair of adjacent negative sample car names from the multiple negative sample car names as a negative sample car name pair, and determine the second loss value based on the second similarity corresponding to each negative sample car name pair.

[0079] Similarly, suppose the negative sample car names corresponding to the car name to be matched are $n_1$, $n_2$, $n_3$, $n_4$, and $n_5$. Among them, $n_1$ and $n_2$ form a negative sample car name pair, $n_2$ and $n_3$ form a negative sample car name pair, $n_3$ and $n_4$ form a negative sample car name pair, and $n_4$ and $n_5$ form a negative sample car name pair, so a total of four negative sample car name pairs are generated.

[0080] The second loss value is determined based on the two second similarities corresponding to each negative sample car name pair. The second loss value is used to constrain the car name matching model to distinguish between seemingly similar negative sample car names, thereby improving the car name matching model's ability to identify the risk of confusion between negative sample car names.

[0081] Step S123: Determine the third loss value based on the minimum first similarity and the maximum second similarity.

[0082] In this embodiment of the invention, the difference between the smallest first similarity among all positive sample car names and the largest second similarity among all negative sample car names is determined as the third difference. If the difference between the preset second difference threshold and the third difference is greater than 0, then the difference between the preset second difference threshold and the third difference is determined as the third loss value. If the difference between the preset second difference threshold and the third difference is not greater than 0, then the third loss value is set to 0.

[0083] The third loss value $L_3$ is calculated as:

$$L_3 = \max\bigl(0,\; m_2 - \bigl(\mathrm{sim}(s, p_{\min}) - \mathrm{sim}(s, n_{\max})\bigr)\bigr)$$

[0084] where $m_2$ is the second gap threshold, representing the gap between groups; $s$ is the car name to be matched; $p_{\min}$ is the positive sample car name with the lowest similarity to the car name to be matched; $n_{\max}$ is the negative sample car name with the highest similarity to the car name to be matched; $\mathrm{sim}(s, p_{\min})$ is the smallest first similarity; and $\mathrm{sim}(s, n_{\max})$ is the largest second similarity.

[0085] It should be understood that the third loss value is used to force the car name matching model to establish a sufficiently wide semantic safety margin between positive and negative samples, so as to improve the car name matching model's ability to construct a reliable boundary between positive and negative samples.
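The rule for the third loss value in step S123 can be sketched directly from its description: take the smallest first similarity minus the largest second similarity, and penalize the amount by which that gap falls short of the second gap threshold. The function and parameter names below are hypothetical, and the sketch assumes all similarities are plain floats with no blank positions.

```python
def third_loss(first_sims, second_sims, second_gap_threshold):
    """Hinge penalty on the margin between the positive and negative groups.

    first_sims:  similarities of the positive samples to the query
    second_sims: similarities of the negative samples to the query
    """
    third_difference = min(first_sims) - max(second_sims)
    return max(0.0, second_gap_threshold - third_difference)
```

The loss is zero once the weakest positive beats the strongest negative by at least the threshold, so well-separated groups contribute no gradient.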

[0086] Step S124: Determine the total loss value based on the first loss value, the second loss value, and the third loss value.

[0087] In this embodiment of the invention, the loss function (i.e., the calculation formulas for the first loss value, the second loss value, and the third loss value) is a combined loss function based on pairwise hinge loss. The first loss value, the second loss value, and the third loss value are weighted and summed to obtain the total loss value. The parameters of the text embedding model are iteratively updated based on the total loss value, so that the final car name matching model fundamentally overcomes the mismatch problem caused by traditional embedding models ignoring the inherent semantic structure of car names, and significantly improves the business accuracy of cross-platform car name retrieval results.

[0088] Optionally, regarding how to determine the first loss value, the following is a possible implementation. The sub-steps of step S121 may include: Step S121-2: Calculate the difference between the two first similarities for each positive sample car name pair to obtain the first difference for each positive sample car name pair.

[0089] Step S121-3: Calculate the difference between the preset first difference threshold and the first difference corresponding to each positive sample car name pair to obtain the second difference corresponding to each positive sample car name pair.

[0090] Step S121-4: Determine the first loss value based on the second difference corresponding to each positive sample car name pair.

[0091] In this embodiment of the invention, the first loss value $L_1$ is calculated as:

$$L_1 = \sum_{i=1}^{N-1} \max\bigl(0,\; m_1 - \bigl(\mathrm{sim}(s, p_i) - \mathrm{sim}(s, p_{i+1})\bigr)\bigr)$$

[0092] where $N$ is the number of positive sample car names; $m_1$ is the first gap threshold, representing the gap within the group, and is much smaller than the second gap threshold; $s$ is the car name to be matched; $\mathrm{sim}(s, p_i)$ is the first similarity between the $i$-th positive sample car name and the car name to be matched; and $\mathrm{sim}(s, p_{i+1})$ is the first similarity between the $(i+1)$-th positive sample car name and the car name to be matched.
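Steps S121-2 to S121-4 can be sketched as a hinge penalty over adjacent pairs of an ordered similarity list. The sketch assumes the per-pair terms are summed (averaging them is an equally plausible choice) and that blank positions are represented by `None` at the end of the list; the function name and both conventions are hypothetical.

```python
def adjacent_pair_hinge(sims, gap_threshold):
    """Sum of hinge penalties over adjacent pairs, most similar sample first.

    sims: similarities ordered by intended rank; trailing None entries mark
          blank positions and are skipped.
    """
    present = [s for s in sims if s is not None]
    total = 0.0
    for higher, lower in zip(present, present[1:]):
        first_difference = higher - lower
        second_difference = gap_threshold - first_difference
        total += max(0.0, second_difference)
    return total
```

For example, with similarities 0.9, 0.8, and 0.79 and a threshold of 0.05, only the second pair is closer than the threshold, so only that pair is penalized.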

[0093] Optionally, regarding how to determine the second loss value, the following is a possible implementation. The sub-steps of step S122 may include: Step S122-2: Calculate the difference between the two second similarities corresponding to each negative sample car name pair to obtain the first difference corresponding to each negative sample car name pair.

[0094] Step S122-3: Calculate the difference between the preset first difference threshold and the first difference corresponding to each negative sample car name pair to obtain the second difference corresponding to each negative sample car name pair.

[0095] Step S122-4: Determine the second loss value based on the second difference corresponding to each negative sample vehicle name pair.

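[0096] In this embodiment of the invention, the second loss value $L_2$ is calculated as:

$$L_2 = \sum_{i=1}^{N-1} \max\bigl(0,\; m_1 - \bigl(\mathrm{sim}(s, n_i) - \mathrm{sim}(s, n_{i+1})\bigr)\bigr)$$

[0097] where $N$ is the number of negative sample car names; $m_1$ is the preset first gap threshold; $s$ is the car name to be matched; $\mathrm{sim}(s, n_i)$ is the second similarity between the $i$-th negative sample car name and the car name to be matched; and $\mathrm{sim}(s, n_{i+1})$ is the second similarity between the $(i+1)$-th negative sample car name and the car name to be matched.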
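Putting the three loss values together, the weighted sum of step S124 can be sketched as below. This is an assumed composition for illustration: the per-pair terms are summed rather than averaged, the default weights of 1.0 are placeholders, and the names `total_loss`, `m1`, and `m2` are hypothetical. The embodiment does not specify the weight values.

```python
def hinge(x):
    """Standard hinge: zero when the margin is satisfied."""
    return max(0.0, x)


def total_loss(pos_sims, neg_sims, m1, m2, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the intra-positive, intra-negative and inter-group terms.

    pos_sims, neg_sims: similarities to the query, each ordered from the
                        most similar sample to the least similar one
    m1: first gap threshold (within a group)
    m2: second gap threshold (between groups), expected to be larger than m1
    """
    l1 = sum(hinge(m1 - (a - b)) for a, b in zip(pos_sims, pos_sims[1:]))
    l2 = sum(hinge(m1 - (a - b)) for a, b in zip(neg_sims, neg_sims[1:]))
    l3 = hinge(m2 - (min(pos_sims) - max(neg_sims)))
    w1, w2, w3 = weights
    return w1 * l1 + w2 * l2 + w3 * l3
```

When the positives are ranked with gaps of at least `m1`, the negatives likewise, and the two groups are separated by at least `m2`, every term is zero and the parameters receive no update from that group.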

[0098] Based on the same inventive concept, the basic principle and technical effects of the car name matching device provided in this embodiment are the same as those in the above embodiments. For the sake of brevity, any parts not mentioned in this embodiment can be referred to the corresponding content in the above embodiments.

[0099] Please refer to Figure 3, which is a block diagram of a car name matching device 400 provided in an embodiment of the present invention. The car name matching device 400 includes a response module 410, a processing module 420, and a matching module 430.

[0100] The response module 410 is used to receive the vehicle name to be queried from the source platform.

[0101] The processing module 420 is used to vectorize the query car name using a pre-trained car name matching model to obtain the source car name vector. The car name matching model is configured to map the car name to the same semantic vector space and maintain the hierarchical semantic relationship between the car brand name, model information and vehicle specification information in the semantic vector space.

[0102] The matching module 430 is used to search for target vectors in the vector library built on the target platform that have a similarity to the source vehicle name vector that meets a preset threshold, and to obtain the target vehicle name corresponding to the target vector; the target vector is pre-generated based on the target vehicle name using a pre-trained vehicle name matching model.
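The threshold search performed by the matching module can be sketched as a brute-force scan over a small in-memory vector library. This is only an illustrative assumption: the dictionary-based library, the function names, and the use of cosine similarity are hypothetical, and a real deployment would more likely use an approximate nearest-neighbor index.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def search(source_vector, vector_library, threshold):
    """Return (target car name, similarity) pairs at or above the threshold.

    vector_library maps each target-platform car name to its pre-generated
    target vector; results are ordered from most to least similar.
    """
    hits = [(name, cosine(source_vector, vec))
            for name, vec in vector_library.items()]
    hits = [(name, sim) for name, sim in hits if sim >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Because the target vectors are generated ahead of time with the same trained model, only the query car name needs to be vectorized at request time.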

[0103] In summary, the car name matching device provided in this embodiment of the invention uses a car name matching model that, through optimization learning, can map any car name to the same semantic vector space. In this semantic vector space, the relative importance of the car brand name, model information, and vehicle specification information presents a hierarchical semantic relationship from strong to weak. That is, when converting a car name into a vector, the car name matching model prioritizes brand consistency, followed by model correspondence, and only then considers specification similarity. Therefore, when using the car name matching model to vectorize the query car name from the original platform and retrieve the vector library of the target platform, the obtained similarity results are no longer driven by superficial character similarity or general semantic similarity, but rather by matching based on the inherent structural logic of the car name, thus enabling accurate car name matching across heterogeneous platforms.

[0104] Optionally, the matching module 430 is also used to generate multiple sets of training samples based on the vehicle names of the source platform and the vehicle names of the target platform; each set of training samples includes the vehicle name to be matched from the source platform, multiple positive sample vehicle names and negative sample vehicle names from the target platform; the positive sample vehicle name is the target platform vehicle name that has high semantic consistency with the vehicle name to be matched in terms of vehicle brand name, model information and vehicle specification information, and the negative sample vehicle name is the target platform vehicle name that has low semantic consistency with the vehicle name to be matched in terms of vehicle brand name, model information and vehicle specification information.

[0105] The text embedding model to be trained is used to determine the first similarity and the second similarity between the car name to be matched and the positive and negative sample car names; the total loss value is determined based on the first and second similarities corresponding to each group of training samples; the parameters of the text embedding model are iteratively updated based on the total loss value to obtain the car name matching model.

[0106] Optionally, the car name includes the car brand name, model information, and vehicle specification information. The matching module 430 is specifically used to identify any car name from the source platform as the car name to be matched; to identify car names from the target platform with the same car brand name as the car name to be matched as a candidate car name set; and to select multiple positive sample car names from the candidate car name set using a large language model according to a first rule; the first rule is determined based on the importance of the car brand name, model information, and vehicle specification information to the semantic similarity of the car name.

[0107] Using the large language model, multiple negative sample car names corresponding to the car name to be matched are determined from the target platform according to a second rule. The second rule is that the selected negative sample car names have the same brand as the car name to be matched but different model information, or they are different in both car brand name and model information and have different vehicle specification information.

[0108] Optionally, the matching module 430 is specifically used to input the vehicle name to be matched, each positive sample vehicle name, and each negative sample vehicle name into the text embedding model to be trained; generate the matching vector of the vehicle name to be matched, the feature vector of each positive sample vehicle name, and the feature vector of each negative sample vehicle name; calculate the similarity between the matching vector and the feature vector of each positive sample vehicle name to obtain the first similarity corresponding to each positive sample vehicle name; and calculate the similarity between the matching vector and the feature vector of each negative sample vehicle name to obtain the second similarity corresponding to each negative sample vehicle name.

[0109] Optionally, multiple positive sample car names in each training sample are arranged according to their similarity to the car name to be matched, and multiple negative sample car names in each training sample are arranged according to their similarity to the car name to be matched.

[0110] The matching module 430 is specifically used to determine each pair of adjacent positive sample car names in a plurality of positive sample car names as a pair of positive sample car names, and to determine a first loss value based on the first similarity corresponding to each pair of positive sample car names; to determine each pair of adjacent negative sample car names in a plurality of negative sample car names as a pair of negative sample car names, and to determine a second loss value based on the second similarity corresponding to each pair of negative sample car names; to determine a third loss value based on the minimum first similarity and the maximum second similarity; and to determine a total loss value based on the first loss value, the second loss value and the third loss value.

[0111] Optionally, the matching module 430 is specifically used to calculate the difference between the two first similarities corresponding to each positive sample vehicle name pair, to obtain the first difference corresponding to each positive sample vehicle name pair; calculate the difference between the preset first gap threshold and the first difference corresponding to each positive sample vehicle name pair, to obtain the second difference corresponding to each positive sample vehicle name pair; and determine the first loss value based on the second difference corresponding to each positive sample vehicle name pair.

[0112] Optionally, the matching module 430 is specifically used to calculate the difference between the two second similarities corresponding to each negative sample vehicle name pair, to obtain the first difference corresponding to each negative sample vehicle name pair; calculate the difference between the preset first gap threshold and the first difference corresponding to each negative sample vehicle name pair, to obtain the second difference corresponding to each negative sample vehicle name pair; and determine the second loss value based on the second difference corresponding to each negative sample vehicle name pair.

[0113] Please refer to Figure 4, which is a block diagram illustrating an electronic device 500 provided in an embodiment of the present invention. The electronic device 500 includes, but is not limited to, a personal computer (PC), a personal digital assistant (PDA), a laptop computer, a tablet computer, and a server. The electronic device 500 includes a memory 510, a processor 520, and a communication module 530. The memory 510, processor 520, and communication module 530 are electrically connected directly or indirectly to each other to achieve data transmission or interaction. For example, these components can be electrically connected to each other through one or more communication buses or signal lines.

[0114] The memory 510 is used to store programs or data. The memory 510 may be, but is not limited to, random access memory, read-only memory, programmable read-only memory, erasable read-only memory, electrically erasable read-only memory, etc.

[0115] The processor 520 is used to read / write data or programs stored in the memory 510 and perform corresponding functions. For example, when a computer program stored in the memory 510 is executed by the processor 520, the car name matching method disclosed in the above embodiments can be implemented.

[0116] The communication module 530 is used to establish a communication connection between the electronic device 500 and other communication terminals via a network, and to send and receive data via the network.

[0117] It should be understood that the structure shown in Figure 4 is only a schematic diagram of the electronic device 500. The electronic device 500 may include more or fewer components than those shown in Figure 4, or have a configuration different from that shown in Figure 4. The components shown in Figure 4 can be implemented using hardware, software, or a combination thereof.

[0118] This invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor 520, implements the vehicle name matching method disclosed in the above embodiments.

[0119] This invention also provides a program product that, when executed by processor 520, implements the vehicle name matching method disclosed in the above embodiments.

[0120] In the several embodiments provided by this invention, it should be understood that the disclosed apparatus and methods can also be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods, and computer program products according to various embodiments of the invention. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in a different order than those marked in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram and / or flowchart, and combinations of blocks in block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.

[0121] In addition, the functional modules in the various embodiments of the present invention can be integrated together to form an independent part, or each module can exist independently, or two or more modules can be integrated to form an independent part.

[0122] If the aforementioned functions are implemented as software functional modules and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0123] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A method for matching car names, characterized in that, The method includes: The name of the vehicle to be queried is received from the source platform; The vehicle name to be queried is vectorized using a pre-trained vehicle name matching model to obtain the source vehicle name vector; the vehicle name matching model is configured to map vehicle names to the same semantic vector space, and maintain the hierarchical semantic relationship between vehicle brand name, model information and vehicle specification information in the semantic vector space. In the vector library built on the target platform, a target vector with a similarity to the source vehicle name vector that meets a preset threshold is found, and the target vehicle name corresponding to the target vector is obtained; the target vector is pre-generated based on the target vehicle name using a pre-trained vehicle name matching model.

2. The vehicle name matching method according to claim 1, characterized in that, The vehicle name matching model was trained in the following way: Multiple training samples are generated based on the vehicle names of the source platform and the vehicle names of the target platform. Each training sample includes a vehicle name to be matched from the source platform, multiple positive sample vehicle names from the target platform, and negative sample vehicle names. The positive sample vehicle names are target platform vehicle names that have high semantic consistency with the vehicle name to be matched in terms of vehicle brand name, model information, and vehicle specification information. The negative sample vehicle names are target platform vehicle names that have low semantic consistency with the vehicle name to be matched in terms of vehicle brand name, model information, and vehicle specification information. The text embedding model to be trained is used to determine the first similarity and the second similarity between the vehicle name to be matched and the positive sample vehicle name and the negative sample vehicle name; The total loss value is determined based on the first similarity and the second similarity corresponding to each group of training samples; The parameters of the text embedding model are iteratively updated based on the total loss value to obtain the car name matching model.

3. The vehicle name matching method according to claim 2, characterized in that, The vehicle name includes the vehicle brand name, model information, and vehicle specification information; the generation of multiple training samples based on the vehicle names from the source platform and the target platform includes: Select any vehicle name from the source platform as the vehicle name to be matched; The car names in the target platform that have the same car brand name as the car name to be matched are identified as the candidate car name set; Using a large language model, multiple positive sample car names are selected from the candidate car name set according to the first rule; the first rule is determined based on the importance of car brand name, model information, and vehicle specification information on the semantic similarity of car names. Using a large language model, multiple negative sample car names are determined from the target platform according to the second rule; the second rule is that the selected negative sample car names have the same brand as the car name to be matched but different model information, or are different in both car brand name and model information and have different vehicle specification information.

4. The vehicle name matching method according to claim 2, characterized in that, The step of determining the first and second similarities between the vehicle name to be matched and the positive and negative sample vehicle names using a text embedding model to be trained includes: The vehicle names to be matched, the positive sample vehicle names, and the negative sample vehicle names are input into the text embedding model to be trained. Generate the matching vector of the vehicle name to be matched, the feature vector of each positive sample vehicle name, and the feature vector of each negative sample vehicle name; Calculate the similarity between the vector to be matched and the feature vector of each positive sample car name to obtain the first similarity for each positive sample car name; Calculate the similarity between the vector to be matched and the feature vectors of each negative sample car name to obtain the second similarity for each negative sample car name.

5. The vehicle name matching method according to claim 2, characterized in that, in each training sample, the multiple positive sample vehicle names are arranged according to their similarity to the vehicle name to be matched, and the multiple negative sample vehicle names are arranged according to their similarity to the vehicle name to be matched; and the step of determining the total loss value based on the first similarities and second similarities corresponding to each group of training samples includes: determining each pair of adjacent positive sample vehicle names among the multiple positive sample vehicle names as a positive sample vehicle name pair, and determining a first loss value based on the first similarities corresponding to each positive sample vehicle name pair; determining each pair of adjacent negative sample vehicle names among the multiple negative sample vehicle names as a negative sample vehicle name pair, and determining a second loss value based on the second similarities corresponding to each negative sample vehicle name pair; determining a third loss value based on the minimum first similarity and the maximum second similarity; and determining the total loss value based on the first loss value, the second loss value, and the third loss value.
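
The three loss terms of claim 5 could be combined as in the sketch below. Several points are assumptions rather than claim language: the samples are taken to be ordered from most to least similar, each term uses a hinge of the form max(0, margin - difference), the three terms are summed without weights, and the margin values are arbitrary placeholders. All function and parameter names are hypothetical.

```python
def pairwise_ranking_loss(sims, margin):
    """Hinge loss over adjacent pairs of an ordered similarity list.

    For each adjacent pair (a, b), the loss is zero once a exceeds b
    by at least `margin`.
    """
    return sum(max(0.0, margin - (a - b)) for a, b in zip(sims, sims[1:]))


def total_loss(first_sims, second_sims,
               pos_margin=0.05, neg_margin=0.05, boundary_margin=0.2):
    """Sum of the first, second, and third loss values for one training sample."""
    # First loss: keep the positive samples in their given order.
    first_loss = pairwise_ranking_loss(first_sims, pos_margin)
    # Second loss: keep the negative samples in their given order.
    second_loss = pairwise_ranking_loss(second_sims, neg_margin)
    # Third loss: separate the weakest positive from the strongest negative.
    third_loss = max(0.0, boundary_margin - (min(first_sims) - max(second_sims)))
    return first_loss + second_loss + third_loss
```

Under these assumptions, a sample whose positives and negatives are already well ordered and well separated contributes zero loss, so training effort concentrates on the samples that violate the hierarchy.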

6. The vehicle name matching method according to claim 5, characterized in that the step of determining the first loss value based on the first similarities corresponding to each positive sample vehicle name pair includes: calculating the difference between the two first similarities corresponding to each positive sample vehicle name pair to obtain a first difference corresponding to that pair; calculating the difference between a preset first difference threshold and the first difference corresponding to each positive sample vehicle name pair to obtain a second difference corresponding to that pair; and determining the first loss value based on the second difference corresponding to each positive sample vehicle name pair.

7. The vehicle name matching method according to claim 5, characterized in that the step of determining the second loss value based on the second similarities corresponding to each negative sample vehicle name pair includes: calculating the difference between the two second similarities corresponding to each negative sample vehicle name pair to obtain a first difference corresponding to that pair; calculating the difference between the preset first difference threshold and the first difference corresponding to each negative sample vehicle name pair to obtain a second difference corresponding to that pair; and determining the second loss value based on the second difference corresponding to each negative sample vehicle name pair.

8. A vehicle name matching device, characterized in that the device includes: a response module, used to receive a vehicle name to be queried from the source platform; a processing module, used to vectorize the vehicle name to be queried using a pre-trained vehicle name matching model to obtain a source vehicle name vector, the vehicle name matching model being configured to map vehicle names to the same semantic vector space and to maintain, in the semantic vector space, the hierarchical semantic relationship between the vehicle brand name, the model information, and the vehicle specification information; and a matching module, used to search the vector library built for the target platform for a target vector whose similarity to the source vehicle name vector meets a preset threshold, and to obtain the target vehicle name corresponding to the target vector, the target vector being pre-generated from the target vehicle name using the pre-trained vehicle name matching model.
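
The matching module's threshold search might look like the following sketch. It assumes cosine similarity, reads "meets a preset threshold" as greater than or equal to the threshold, and uses a linear scan over an in-memory dictionary in place of a real vector library. The function names and the default threshold are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def match_vehicle_names(source_vec, vector_library, threshold=0.8):
    """Return (target name, similarity) pairs at or above the threshold.

    `vector_library` maps each target vehicle name to its pre-generated
    vector. Results are sorted from most to least similar.
    """
    hits = []
    for name, vec in vector_library.items():
        score = cosine_similarity(source_vec, vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

A production system would more likely use an approximate nearest-neighbor index than a linear scan; the scan is used here only to keep the example self-contained.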

9. An electronic device, characterized in that it includes a processor and a memory, the memory storing a computer program that can be executed by the processor to implement the vehicle name matching method according to any one of claims 1 to 7.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the vehicle name matching method according to any one of claims 1 to 7.