Fine-tuning method of large language model, false address identification method, device and equipment

By performing data augmentation and parameter tuning on the large language model, the adaptability and accuracy issues of existing fake address identification methods are resolved, and the model's ability to identify unknown or mutated addresses is improved.

CN119784440BActive Publication Date: 2026-06-16BEIJING WODONG TIANJUN INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING WODONG TIANJUN INFORMATION TECH CO LTD
Filing Date
2024-12-23
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing methods for identifying fake addresses rely on large amounts of labeled data, lack adaptability to unknown or mutated cheating patterns, and have insufficient contextual understanding and generalization capabilities, resulting in low identification accuracy.

Method used

By generating multiple initial address samples, performing data augmentation, training an initial low-rank adaptive model, and adjusting the self-attention layer parameters of the large language model, a fine-tuned model is obtained, enhancing its ability to understand the contextual relationships and complex semantics of address text.

🎯Benefits of technology

It improves the adaptability and insight accuracy of large language models in scenarios with scarce labeled samples, and achieves more accurate identification of fake addresses.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN119784440B_ABST
    Figure CN119784440B_ABST
Patent Text Reader

Abstract

The present disclosure provides a large language model fine-tuning method, a false address identification method, an apparatus and a device, relating to the technical field of artificial intelligence. The method comprises: generating a plurality of initial address samples based on real addresses included in an order database; performing data enhancement processing on the plurality of initial address samples based on a plurality of abnormal patterns to obtain a plurality of target address samples; training an initial low-rank adaptive model using the plurality of target address samples to obtain a target low-rank adaptive model, wherein the initial low-rank adaptive model is initialized based on a self-attention layer of a large language model; and adjusting model parameters of the self-attention layer of the large language model based on model parameters of the target low-rank adaptive model to obtain a fine-tuned large language model.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This disclosure relates to the field of artificial intelligence technology, and more specifically, to a method for fine-tuning a large language model, a method for identifying fake addresses, an apparatus, and a device. Background Technology

[0002] With the gradual development and popularization of mobile internet services, online shopping has become a consumption habit for most netizens. E-commerce affiliate marketing (CPS, Cost Per Sales) can leverage portal-level online media to showcase products across the entire website, thereby driving user awareness, preferences, and purchasing behavior. Therefore, CPS has become the preferred advertising method for e-commerce platforms. However, with the promotion and development of CPS, the use of fake addresses to fraudulently obtain advertising commissions has also increased, thereby raising the operational risks for e-commerce platforms.

[0003] In related technologies, rule engines or traditional machine learning models can be used to identify fake addresses. However, these methods are data-driven identification methods, meaning that their accuracy requires the support of a large amount of rules or labeled data. When faced with unknown types of fake addresses, the accuracy of using rule engines or traditional machine learning models to identify fake addresses is relatively low. Summary of the Invention

[0004] In view of this, this disclosure provides a method for fine-tuning a large language model, a method for identifying fake addresses, an apparatus, and a device.

[0005] One aspect of this disclosure provides a method for fine-tuning a large language model, comprising: generating multiple initial address samples based on real addresses included in an order database, wherein the initial address samples include address text and initial labels for the address text; performing data augmentation processing on the multiple initial address samples based on multiple anomaly patterns to obtain multiple target address samples; training an initial low-rank adaptive model using the multiple target address samples to obtain a target low-rank adaptive model, wherein the initial low-rank adaptive model is obtained by initializing a self-attention layer of the large language model; and adjusting the model parameters of the self-attention layer of the large language model based on the model parameters of the target low-rank adaptive model to obtain a fine-tuned large language model.

[0006] Another aspect of this disclosure provides a method for identifying fake addresses based on a large language model, comprising: in response to receiving a prompt text, extracting an address text to be identified from the prompt text if it is determined that the prompt words included in the prompt text match preset prompt words; performing multiple rounds of question-and-answer with a fine-tuned large language model based on a preset thought chain and the address text to be identified, to obtain a target answer text output by the fine-tuned large language model; and determining the fake address identification result of the address text to be identified based on the target answer text; wherein the fine-tuned large language model is obtained by fine-tuning the large language model using the fine-tuning method described above.

[0007] Another aspect of this disclosure provides a fine-tuning apparatus for a large language model, comprising: an initial address generation module for generating multiple initial address samples based on real addresses included in an order database, wherein the initial address samples include address text and initial labels for the address text; a first address generation module for performing data augmentation processing on the multiple initial address samples based on multiple anomaly patterns to obtain multiple target address samples; an initial model training module for training an initial low-rank adaptive model using the multiple target address samples to obtain a target low-rank adaptive model, wherein the initial low-rank adaptive model is obtained by initializing a self-attention layer of a large language model; and a model parameter adjustment module for adjusting the model parameters of the self-attention layer of the large language model based on the model parameters of the target low-rank adaptive model to obtain a fine-tuned large language model.

[0008] Another aspect of this disclosure provides a fake address identification device based on a large language model, comprising: an address text extraction module, configured to extract the address text to be identified from the prompt text in response to receiving prompt text, provided that the prompt words included in the prompt text match preset prompt words; an answer text output module, configured to perform multi-round question-and-answer with a fine-tuned large language model based on a preset thought chain and the address text to be identified, to obtain the target answer text output by the fine-tuned large language model; and a fake address identification module, configured to determine the fake address identification result of the address text to be identified based on the target answer text; wherein the fine-tuned large language model is obtained by fine-tuning the large language model using the fine-tuning method described above.

[0009] Another aspect of this disclosure provides an electronic device comprising: one or more processors; and a memory for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors cause the one or more processors to perform the method as described above.

[0010] Another aspect of this disclosure provides a computer-readable storage medium storing computer-executable instructions, which, when executed, are used to implement the method described above.

[0011] Another aspect of this disclosure provides a computer program product including computer-executable instructions that, when executed, implement the method described above.

[0012] According to embodiments of this disclosure, by augmenting real addresses with multiple anomaly patterns, the corresponding address text form and content of each real address under multiple anomaly patterns can be obtained. By employing data augmentation, the adaptability of the fine-tuning method of the large language model in scenarios with scarce labeled samples is enhanced. By training an initial low-rank adaptive model using multiple target address samples, a target low-rank adaptive model is obtained. Then, based on the model parameters of the target low-rank adaptive model, the model parameters of the self-attention layer of the aforementioned large language model are adjusted, enabling the fine-tuned large language model to possess the ability to understand the contextual association of address text, the ability to understand complex address semantics, and the ability to predict address text context. Therefore, it at least partially overcomes the technical problem of low accuracy in identifying fake addresses in existing fake address identification methods, thereby achieving the goal of improving the insight accuracy and generalization ability of the fine-tuned large language model, and achieving a more accurate technical effect in identifying fake addresses. Attached Figure Description

[0013] The above and other objects, features and advantages of this disclosure will become clearer from the following description of embodiments with reference to the accompanying drawings, in which:

[0014] Figure 1 The illustration schematically shows an exemplary system architecture to which a fine-tuning method for a large language model and a spurious address identification method can be applied according to embodiments of the present disclosure;

[0015] Figure 2 A flowchart illustrating a method for fine-tuning a large language model according to an embodiment of the present disclosure is shown schematically.

[0016] Figure 3 A flowchart illustrating a spurious address identification method based on a large language model according to an embodiment of the present disclosure is shown.

[0017] Figure 4 The flowchart illustrating the training process of a finely tuned large language model according to an embodiment of the present disclosure is shown in the illustration.

[0018] Figure 5 A flowchart illustrating the application method of a finely tuned large language model according to an embodiment of the present disclosure is shown.

[0019] Figure 6 A block diagram of a fine-tuning device for a large language model according to an embodiment of the present disclosure is shown schematically;

[0020] Figure 7 A block diagram of a spurious address recognition device based on a large language model according to an embodiment of the present disclosure is illustrated schematically; and

[0021] Figure 8 A block diagram of an electronic device suitable for implementing a fine-tuning method for a large language model and a spurious address identification method according to embodiments of the present disclosure is illustrated. Detailed Implementation

[0022] The embodiments of the present disclosure will now be described with reference to the accompanying drawings. However, it should be understood that these descriptions are exemplary only and are not intended to limit the scope of the disclosure. In the following detailed description, numerous specific details are set forth to provide a thorough understanding of the embodiments of the present disclosure for ease of explanation. However, it will be apparent that one or more embodiments may be practiced without these specific details. Furthermore, descriptions of well-known structures and techniques are omitted in the following description to avoid unnecessarily obscuring the concepts of the present disclosure.

[0023] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit this disclosure. The terms “comprising,” “including,” etc., as used herein indicate the presence of the stated features, steps, operations, and / or components, but do not exclude the presence or addition of one or more other features, steps, operations, or components.

[0024] All terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art, unless otherwise defined. It should be noted that the terms used herein are to be interpreted in a manner consistent with the context of this specification, and not in an idealized or overly rigid way.

[0025] When using expressions such as "at least one of A, B and C", they should generally be interpreted in accordance with the meaning that is commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" should include, but is not limited to, a system having A alone, a system having B alone, a system having C alone, a system having A and B, a system having A and C, a system having B and C, and / or a system having A, B and C, etc.).

[0026] In the embodiments disclosed herein, the collection, updating, analysis, processing, use, transmission, provision, disclosure, and storage of data (e.g., including but not limited to user personal information) comply with relevant laws and regulations, are used for legitimate purposes, and do not violate public order and good morals. In particular, necessary measures have been taken to prevent unauthorized access to user personal information data and to safeguard user personal information security, network security, and national security.

[0027] With the gradual development and popularization of mobile internet services, online shopping has become a consumption habit for most netizens. E-commerce affiliate marketing (CPS, Cost Per Sales) can leverage portal-level online media to showcase products across the entire website, thereby driving user awareness, preferences, and purchasing behavior. Therefore, CPS has become the preferred advertising method for e-commerce platforms. However, with the promotion and development of CPS, the use of fake addresses to fraudulently obtain advertising commissions has also increased, thereby raising the operational risks for e-commerce platforms.

[0028] Existing methods for identifying fake addresses primarily rely on rule engines and machine learning models. Rule engines detect abnormal orders by setting static rules, identifying fake addresses by analyzing frequently occurring special characters in historical addresses or anomalous words similar to historical abnormal addresses. While direct, this method suffers from the vulnerability of static rule design to novel cheating strategies and lacks adaptability to unknown or mutated cheating patterns. For example, checking whether the delivery address contains special characters identified by expert experience or is similar to historical order addresses helps identify fake addresses and their corresponding fake orders. Machine learning models train on labeled fake address datasets by manually annotating historical addresses, enabling them to distinguish between real and fake addresses. However, these models rely on large amounts of labeled data for training, while anomaly events in anomaly detection scenarios are very sparse, with significant positive-to-negative sample bias, often making unbiased training difficult in practical engineering. Furthermore, the model's generalization ability is limited by the distribution and quality of the training data, failing to effectively identify new or mutated cheating methods.

[0029] Current technologies for identifying fake addresses mainly suffer from the following common shortcomings:

[0030] (1) Reliance on a large number of real labeled samples: Both rule engines and traditional machine learning models require a large amount of historical data for training and updating. However, in the field of anomaly detection, the distribution of positive and negative samples is uneven, and it is challenging to build an unbiased training set. A large number of manual annotations may introduce annotation bias, affecting the model's recognition ability and practicality.

[0031] (2) Limited contextual understanding: Existing technologies mainly adopt feature-based training methods, which lack in-depth understanding of addresses and contextual association capabilities. Simple pattern matching methods cannot effectively handle cheating addresses containing abnormal words, and cannot capture subtle differences and complex cheating patterns in fake addresses.

[0032] (3) Insufficient generalization ability: Supervised machine learning methods and rule engines are limited by historical data and cannot effectively identify new or mutated cheating methods. As cheating methods continue to evolve, models and rule engines need to be updated regularly to maintain their effectiveness. Therefore, these methods have weak identification flexibility and severely insufficient model generalization ability when faced with ever-changing fake addresses.

[0033] Embodiments of this disclosure provide a method for fine-tuning a large language model, a method for identifying fake addresses, an apparatus, and a device. The method includes: generating multiple initial address samples based on real addresses included in an order database, wherein each initial address sample includes address text and initial labels for the address text; performing data augmentation processing on the multiple initial address samples based on multiple anomaly patterns to obtain multiple target address samples; training an initial low-rank adaptive model using the multiple target address samples to obtain a target low-rank adaptive model, wherein the initial low-rank adaptive model is obtained by initializing a self-attention layer of a large language model; and adjusting the model parameters of the self-attention layer of the large language model based on the model parameters of the target low-rank adaptive model to obtain a fine-tuned large language model.

[0034] Figure 1 This illustration schematically depicts an exemplary system architecture for applying a large language model fine-tuning method and a fake address identification method according to embodiments of this disclosure. It should be noted that... Figure 1 The examples shown are merely examples of system architectures that can be applied to the embodiments of this disclosure, in order to help those skilled in the art understand the technical content of this disclosure, but do not mean that the embodiments of this disclosure cannot be used in other devices, systems, environments or scenarios.

[0035] like Figure 1 As shown, the system architecture 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired and / or wireless communication links, etc.

[0036] Users can use the first terminal device 101, the second terminal device 102, and the third terminal device 103 to interact with the server 105 via the network 104 to receive or send messages, etc. Various communication client applications can be installed on the first terminal device 101, the second terminal device 102, and the third terminal device 103, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, and / or social media platform software, etc. (for example only).

[0037] The first terminal device 101, the second terminal device 102, and the third terminal device 103 can be various electronic devices with displays and support web browsing, including but not limited to smartphones, tablets, laptops, and desktop computers.

[0038] Server 105 can be a server that provides various services, such as a backend management server that supports websites browsed by users using the first terminal device 101, the second terminal device 102, and the third terminal device 103 (this is just an example). The backend management server can analyze and process data such as received user requests, and feed back the processing results (such as web pages, information, or data obtained or generated according to user requests) to the terminal devices.

[0039] It should be noted that the large language model fine-tuning method and the fake address identification method provided in this disclosure embodiment can generally be executed by server 105. Correspondingly, the large language model fine-tuning system and the fake address identification system provided in this disclosure embodiment can generally be set up in server 105. The large language model fine-tuning method and the fake address identification method provided in this disclosure embodiment can also be executed by a server or server cluster that is different from server 105 and capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and / or server 105. Correspondingly, the large language model fine-tuning system and the fake address identification system provided in this disclosure embodiment can also be set up in a server or server cluster that is different from server 105 and capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and / or server 105. Alternatively, the large language model fine-tuning method and the fake address identification method provided in this embodiment of the disclosure can also be executed by the first terminal device 101, the second terminal device 102, or the third terminal device 103, or by other terminal devices different from the first terminal device 101, the second terminal device 102, or the third terminal device 103. Accordingly, the large language model fine-tuning system and the fake address identification system provided in this embodiment of the disclosure can also be set in the first terminal device 101, the second terminal device 102, or the third terminal device 103, or in other terminal devices different from the first terminal device 101, the second terminal device 102, or the third terminal device 103.

[0040] For example, the real address and prompt text included in the order database can be originally stored in any one of the first terminal device 101, the second terminal device 102, or the third terminal device 103 (e.g., the first terminal device 101, but not limited thereto), or stored on an external storage device and can be imported into the first terminal device 101. Then, the first terminal device 101 can locally execute the large language model fine-tuning method and the fake address identification method provided in the embodiments of this disclosure, or send the real address and prompt text included in the order database to other terminal devices, servers, or server clusters, and have the other terminal devices, servers, or server clusters that receive the real address and prompt text included in the order database execute the large language model fine-tuning method and the fake address identification method provided in the embodiments of this disclosure.

[0041] It should be understood that Figure 1 The number of terminal devices, networks, and servers shown is merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.

[0042] Figure 2 A flowchart illustrating a method for fine-tuning a large language model according to an embodiment of the present disclosure is shown.

[0043] like Figure 2 As shown, the method includes operations S201 to S204.

[0044] In operation S201, based on the real addresses included in the order database, multiple initial address samples are generated, wherein the initial address samples include address text and initial labels for the address text.

[0045] According to embodiments of this disclosure, original address data can be obtained from the real addresses included in the order database of the CPS affiliate order system, ensuring data integrity and diversity. The original address dataset D is obtained through an SQL query.

[0046] (1)

[0047] According to embodiments of this disclosure, a portion of address information is randomly extracted from the original address dataset D, and this address information is used as address text to ensure the representativeness of the address text, thus extracting a sample set S:

[0048] (2)

[0049] Where S represents the sample set extracted, and n is the number of initial address samples.

[0050] According to embodiments of this disclosure, to ensure the model can distinguish between positive and negative examples, address text is labeled with positive and negative samples, forming positive and negative example datasets. The positive example dataset represents real and valid addresses, while the negative example dataset represents fake or invalid addresses. According to embodiments of this disclosure, initial address samples are generated using real addresses from the order database and labeled so that the initial address samples can reflect the characteristics of valid addresses, invalid addresses, and fake addresses among real addresses, and can also better reflect the correlation and differences between valid, invalid, and fake addresses.

[0051] In operation S202, data augmentation processing is performed on multiple initial address samples based on multiple abnormal modes to obtain multiple target address samples.

[0052] According to embodiments of this disclosure, various data augmentation processes can be performed on labeled samples, such as random character replacement, spelling error simulation, abnormal word simulation, and format transformation. Diverse training samples are generated through data augmentation processes, and then the format of the training samples is fine-tuned. The augmented data is then converted into a fine-tuned dialogue format, which can be used for domain fine-tuning of large language models.

[0053] According to embodiments of this disclosure, by performing data augmentation on the initial address samples, data augmentation is achieved for valid addresses, invalid addresses, and fake addresses in the real addresses. Through various data augmentation methods, different forms of valid addresses, invalid addresses, and fake addresses, as well as contextual semantic features of address text, can be generated. In subsequent model training, the model can learn a wider range of features of valid addresses, invalid addresses, and fake addresses. Furthermore, by using real addresses as training samples, the stability of the model can be improved.

[0054] In operation S203, an initial low-rank adaptive model is trained using multiple target address samples to obtain a target low-rank adaptive model. The initial low-rank adaptive model is obtained by initializing the self-attention layer of the large language model.

[0055] According to embodiments of this disclosure, multiple target address samples can be converted into the input representation of a large language model to obtain word segmentation and encoding of multiple target address samples. Then, LoRA technology is applied to adjust the model parameters of the initial low-rank adaptive model to obtain the target low-rank adaptive model.

[0056] According to embodiments of this disclosure, LoRA technology adjusts the weight matrix of the model parameters of the self-attention layer of a large language model by adding a target low-rank adaptive model, without directly modifying the original weights of the model parameters of the self-attention layer of the large language model. This can significantly reduce the number of parameters that need to be updated, reduce the computational and storage costs of fine-tuning the large language model, and is suitable for use in resource-constrained environments.

[0057] In operation S204, the model parameters of the self-attention layer of the large language model are adjusted based on the model parameters of the target low-rank adaptive model to obtain the fine-tuned large language model.

[0058] According to embodiments of this disclosure, the model parameters of a target low-rank adaptive model can be fused with the model parameters of a large language model, and the fused model parameters can be saved as the model parameters of a fine-tuned large language model. Parameter fusion can be implemented in various ways, such as linear weighted fusion methods and cross-fusion methods.

[0059] According to embodiments of this disclosure, by augmenting real addresses with multiple anomaly patterns, the corresponding address text form and content of each real address under multiple anomaly patterns can be obtained. By employing data augmentation, the adaptability of the fine-tuning method of the large language model in scenarios with scarce labeled samples is enhanced. By training an initial low-rank adaptive model using multiple target address samples, a target low-rank adaptive model is obtained. Then, based on the model parameters of the target low-rank adaptive model, the model parameters of the self-attention layer of the aforementioned large language model are adjusted, enabling the fine-tuned large language model to possess the ability to understand the contextual association of address text, the ability to understand complex address semantics, and the ability to predict address text context. Therefore, it at least partially overcomes the technical problem of low accuracy in identifying fake addresses in existing fake address identification methods, thereby achieving the goal of improving the insight accuracy and generalization ability of the fine-tuned large language model, and achieving a more accurate technical effect in identifying fake addresses.

[0060] According to embodiments of this disclosure, multiple initial address samples are subjected to data augmentation processing based on multiple anomaly patterns to obtain multiple target address samples. This includes: for each anomaly pattern, data augmentation is performed on the address text included in each of the multiple initial address samples based on a processing strategy corresponding to the anomaly pattern, resulting in multiple anomalous address texts corresponding to the anomaly pattern; target labels are determined for each of the multiple anomalous address texts based on the initial labels included in each of the multiple initial address samples and the anomaly pattern; and format conversion is performed based on the multiple anomalous address texts and their respective target labels to obtain multiple target address samples corresponding to the anomaly pattern.

[0061] According to embodiments of this disclosure, different anomaly patterns include text order adjustments in address text. For example, if the valid address is AB cell, it can be written as BA cell. The adjustment strategy can be to reverse the text order in the valid address of the real address. The initial label before augmentation of the initial address sample can be used as the target label for the anomaly address text after augmentation of the initial address sample. This setting method can improve the training efficiency of the model during subsequent model training.

[0062] According to embodiments of this disclosure, format conversion can be implemented in various ways. For example, full-width characters can be converted to half-width characters, and the target labels of multiple abnormal address texts and their respective targets can be converted to the same format. Alternatively, the text format can be converted to GBK or ASCII. To improve the accuracy of the model when using multiple target address samples for subsequent model training, the formats of the multiple target address samples should be consistent. Furthermore, sentence structure conversion can be performed according to actual needs; for example, the abnormal address text and target labels can be converted to a fine-tuned dialogue format.

[0063] According to embodiments of this disclosure, it is possible to simulate abnormal address scenarios such as fake addresses and invalid addresses under abnormal modes, enabling multiple target address samples to include a wider variety of abnormal address patterns. Therefore, when multiple target address samples are used as training samples, the model can learn the characteristics of fake and invalid addresses in more situations. Simultaneously, by standardizing the format of multiple target address samples, the accuracy of the trained model can be improved when multiple target address samples are subsequently used for model training. Furthermore, for scenarios with scarce labeled samples, data augmentation techniques are employed, significantly enhancing the model's ability to identify fake addresses. Data augmentation techniques simulate address input errors and possible fake address formats, enhancing the model's generalization ability in diverse scenarios. By expanding the coverage of training data and increasing sample diversity, the sample adaptability in scenarios with scarce labeled samples is effectively improved, assisting in the efficient and accurate fine-tuning of large language models.

[0064] According to embodiments of this disclosure, a format conversion is performed based on multiple abnormal address texts and their respective target tags to obtain multiple target address samples corresponding to the abnormal pattern, including: for each abnormal address text, embedding the abnormal address text into a first instruction template to obtain a question text; embedding the target tags of the abnormal address text into a second instruction template to obtain an answer text; and combining the question text and the answer text to obtain a target address sample.

[0065] According to embodiments of this disclosure, after data augmentation, when converting the data into a format suitable for fine-tuning a large language model, a fine-tuning training instruction format is designed based on the address, address discrimination result, and address anomaly words. For example, the address can be used as input, and the discrimination result and anomaly words as output. The augmented data is converted according to the designed instruction format to form a dataset for fine-tuning the dialogue format. A first instruction template can convert anomaly addresses into a preset format, for example, converting anomaly addresses into the form of province, city, district, street, neighborhood, and anomaly word 'a'; a second instruction template can convert the target label of anomaly address into a preset format, for example, converting the target label of anomaly address into the form of anomaly word 'a'.

[0066] According to embodiments of this disclosure, by employing fine-tuning of the dialogue format to convert the format of multiple abnormal address texts and their respective target tags, the answers can be better targeted at the key parts of the question, enhancing the relevance and effectiveness of the dialogue. The model trained using the training sample set of the fine-tuned dialogue format can better meet the user's needs based on different dialogue format preferences.

[0067] According to embodiments of this disclosure, multiple abnormal modes include character replacement mode, input error simulation mode, abnormal word simulation mode, format transformation simulation mode, and expression abnormality simulation mode; wherein, based on the processing strategy corresponding to the abnormal mode, data augmentation is performed on the address text included in each of the multiple initial address samples to obtain multiple abnormal address texts corresponding to the abnormal mode, including: for each initial address sample, processing the address text included in the initial address sample using at least one of the following methods to obtain abnormal address texts.

[0068] According to embodiments of this disclosure, optional methods include: Method 1, determining a first target character from multiple characters included in the address text, and replacing the first target character with a random character to obtain abnormal address text corresponding to a character replacement pattern. Method 2, determining a second target character from multiple characters included in the address text, simulating input errors based on the second target character to obtain an error character, and replacing the second target character with the error character to obtain abnormal address text corresponding to an input error simulation pattern. Method 3, determining a third target character from multiple characters included in the address text, deleting the third target character to obtain abnormal address text corresponding to an input error simulation pattern. Method 4, determining an insertion point in the address text, and inserting an abnormal word at the insertion point to obtain abnormal address text corresponding to an abnormal word simulation pattern. Method 5, determining a fourth target character from multiple characters included in the address text that has multiple expression formats, and performing format conversion on the fourth target character to obtain abnormal address text corresponding to a format conversion simulation pattern. Method Six: Using the language of the address text as the initial and target languages, determine the text translation link, and based on the text translation link, perform multiple translation processes on the address text to obtain the abnormal address text corresponding to the expression anomaly simulation mode.

[0069] According to embodiments of this disclosure, character replacement can be performed by changing the case of English characters in the address text, replacing Chinese characters with corresponding English characters, or replacing a specified character with other characters whenever it appears in the address text. These other characters can be the specified characters, random characters, or characters dynamically changing according to a preset rule. Furthermore, certain characters can be randomly replaced in the address string to simulate user input errors. For example, the probability p of each character being replaced can be set as follows:

[0070] (3)

[0071] Where A represents the original address, This represents the address after the character replacement.

[0072] According to embodiments of this disclosure, by replacing characters in the address text, more abnormal addresses can be obtained. These abnormal addresses can simulate situations such as incorrect address writing, missing characters in the address, reversed context, address errors caused by misoperation, or other factors that cause address anomalies, thereby enhancing the diversity of abnormal address samples.

[0073] According to embodiments of this disclosure, the input error simulation methods include various forms, such as text input error simulation and number input error simulation. Text input error simulation includes spelling error simulation, grammatical error simulation, and semantic error simulation. Number input error simulation can select number replacement, number omission, or number addition, etc., and can also introduce common spelling errors, such as reversed letter order or missing words. Error simulation can be implemented through a preset error pattern library, such as preset spelling error rules, where one spelling error corresponds to one error character.

[0074] (4)

[0075] in, This represents the address after simulating an input error.

[0076] According to embodiments of this disclosure, multiple target address samples obtained by data augmentation of address text using error simulation can simulate the input habits of different users and include a wider range of target address samples with different characteristics. When the model is subsequently trained using multiple target address samples, it helps to improve the stability and robustness of the model.

[0077] According to embodiments of this disclosure, when determining the insertion point in the address text, the insertion point can be chosen to be before or after the community name, or before the street name or the words "community". Alternatively, existing insertion point features of abnormal addresses can be extracted; for example, insertion point features can be commonly used insertion point locations or commonly used abnormal word categories. Common abnormal words such as "test test", "123456", and "sdfsf" can also be inserted into the address. Abnormal word insertion can be achieved by randomly selecting an insertion point in the address, or a pre-set abnormal word dictionary can be used.

[0078] (5)

[0079] Where W represents the abnormal word lexicon, This indicates the address after the abnormal word simulation.

[0080] According to embodiments of this disclosure, by applying anomaly word simulation mode to simulate the characteristics of anomalous words in actual anomalous addresses, the habit of setting anomalous words in address text can be more purposefully reproduced, making the anomalous address samples obtained by amplification of the normal word simulation mode closer to the real anomalous addresses. Applying multiple target address samples obtained by this amplification method to train the model helps to improve the accuracy of the model in identifying anomalous addresses.

[0081] According to embodiments of this disclosure, the format conversion simulation mode can convert the address text format to GBK or ASCII, etc. Changing the address format can also convert full-width characters to half-width characters, or represent the numbers in the address with Chinese characters. Format conversion can be achieved through regular expressions and replacement rules.

[0082] (6)

[0083] in, This represents the address after the format transformation simulation.

[0084] According to embodiments of this disclosure, by simulating fake addresses through format transformation, a variety of training samples of different types can be obtained. The model trained using these training samples can improve the ability to identify different types of fake addresses.

[0085] According to embodiments of this disclosure, the expression anomaly simulation mode can translate address text expressed in various languages ​​into the target language using language translation technology. For example, it can translate address text expressed in English into Chinese, or translate address text expressed in a mixture of Japanese, English, Korean, etc., into the target language.

[0086] Alternatively, the address text included in the initial address sample can be processed by combining the above methods to obtain abnormal address text, as shown in formulas (7) to (10):

[0087] (7)

[0088] (8)

[0089] (9)

[0090] (10)

[0091] in, It can represent the abnormal address text obtained after processing.

[0092] According to embodiments of this disclosure, the multiple target address sample data obtained by applying the expression anomaly simulation mode can characterize the situation where address text is described using multiple languages. By using the obtained multiple target address sample data to train the model, the model's ability to identify abnormal addresses represented by multiple languages ​​can be enhanced.

[0093] According to embodiments of this disclosure, the abnormal modes include character replacement mode, input error simulation mode, abnormal word simulation mode, format transformation simulation mode, and expression abnormality simulation mode. When performing data augmentation on the address text included in the initial address sample, one or a combination of multiple data augmentation methods included in the abnormal modes can be selected to finally obtain multiple target address samples.

[0094] According to embodiments of this disclosure, word embedding processing is performed on a target address sample to obtain a feature code for the target address sample, including: word segmentation processing of the text included in the target address sample to obtain multiple words; for each word, vector encoding of the word to obtain a first word code; obtaining a position code based on the position of the word in the text included in the target address sample; embedding the position code into the first word code to obtain a second word code; and combining the second word codes of the multiple words to obtain a feature code for the target address sample.

[0095] According to embodiments of this disclosure, before performing domain fine-tuning, the dataset needs to be converted into an input representation for a large language model. For example, the dataset can be converted into an input representation for a large language model through text segmentation, vectorization, or positional embedding. The word segmentation method can be the WordPiece segmentation method of GLM4.

[0096] (11)

[0097] Where T represents the word sequence after word segmentation.

[0098] Vectorization can be achieved using the model's embedding layer.

[0099] (12)

[0100] Where V represents the vector representation of a word.

[0101] Positional embeddings can use Rotary Embedding to add positional information to the embedding vector, which is an important step when processing sequence data.

[0102] According to embodiments of this disclosure, text segmentation can segment the text in a dialogue dataset, ensuring that each word can be recognized by the model. Vectorization representation converts the input word IDs into embedding vectors using a pre-trained pedestal large language model (GLM4), obtaining a vector representation for each word. Positional embedding representation allows the model to understand the position of words in the sequence. By converting the dataset into an input representation for the large language model, the recognition accuracy of the large language model can be improved.

[0103] According to embodiments of this disclosure, an initial low-rank adaptive model is trained using multiple target address samples to obtain a target low-rank adaptive model. This includes: performing word embedding processing on the target address samples to obtain feature codes of the target address samples; adding the initial low-rank adaptive model as a branch of the self-attention layer of a large language model to the large language model to obtain a large language model to be adjusted; and inputting the feature codes of the target address samples into the large language model to be adjusted, so as to adjust the model parameters of the initial low-rank adaptive model based on the forward propagation results of the feature codes of the target address samples in the large language model to be adjusted, thereby obtaining the target low-rank adaptive model.

[0104] According to embodiments of this disclosure, after data augmentation, the data needs to be converted into a data format suitable for fine-tuning a large language model. A fine-tuning training instruction format is designed based on the address, address discrimination results, and address anomaly words. For example, the address can be used as input, and the discrimination results and anomaly words as output. The instruction format can be represented as:

[0105] (13)

[0106] Where I represents the instruction set, R represents the discrimination result set, and W represents the abnormal word set.

[0107] The enhanced data is then converted according to the designed instruction format to form a dataset with a fine-tuned dialogue format.

[0108] According to embodiments of this disclosure, LoRA technology can be applied to fine-tuning of a large language model on multiple target address samples obtained after data augmentation. The fine-tuning process of the large language model can be carried out as follows:

[0109] First, initialize the LoRA matrix. Inject a trainable low-rank decomposition matrix into the self-attention mechanism layer of the GLM Block to obtain the fitting matrix, and then randomly initialize the fitting matrix.

[0110] (14)

[0111] Where B and A are low-rank matrices, A represents the first low-rank matrix and B represents the second low-rank matrix, W0 is the frozen pre-trained weight matrix, the rank r of the matrix is ​​determined, and the adaptation matrix is ​​randomly initialized:

[0112] (15)

[0113] Then, the vector representation of the input data target address sample is forward-propagated through the large language model to be adjusted, and the output of the large language model to be adjusted is calculated. Forward propagation can be implemented through the forward computation graph of the large language model to be adjusted.

[0114] (16)

[0115] Where O represents the output of the model.

[0116] Next, the loss function is calculated based on the output of the large language model to be adjusted and the true labels corresponding to the target address samples. The loss calculation formula is:

[0117] (17)

[0118] Where Y represents the true label.

[0119] Then, the gradient is calculated using the backpropagation algorithm, and the first low-rank matrix A and the second low-rank matrix B in the LoRA matrix are updated. Backpropagation is implemented using an automatic fine-tuning tool.

[0120] (18)

[0121] Finally, optimize the LoRA matrix. Repeat the forward propagation, loss calculation, and backpropagation process until the large language model to be adjusted converges. Use the Adam optimizer, as shown in the formula:

[0122] (19)

[0123] Where η represents the learning rate.

[0124] According to embodiments of this disclosure, the weights in the self-attention mechanism layer of the GLM4 large language model can be adjusted through fine-tuning, making it better suited for spoofing scenarios. For example, after fine-tuning training, the LoRA fine-tuning parameters are fused with the GLM4 model parameters. The fusion process includes parameter merging and model saving. Specifically, parameter merging involves merging the fine-tuned LoRA matrix parameters B and A with the weight matrix W0 of the pre-trained model to obtain a new weight matrix. The formula is:

[0125] (20)

[0126] According to embodiments of this disclosure, model saving involves saving the fused model parameters as a new model file, forming a large language model that possesses both general knowledge of large language models and domain knowledge of fake address discrimination, for subsequent use in fake address discrimination.

[0127] According to embodiments of this disclosure, the LoRA technique is used to fine-tune a pre-trained GLM4 model. By introducing a first low-rank matrix and a second low-rank matrix, while keeping the pre-trained weights unchanged, only a small number of parameters are fine-tuned using a single A100 GPU to adapt to a specific task. This reduces the number of training parameters while maintaining model performance. The fine-tuned model exhibits a 17% reduction in perplexity when dealing with fake address scenarios. This enables the model to quickly adapt to new fake address detection tasks, improving the understanding of complex address semantics and contextual prediction capabilities.

[0128] According to embodiments of this disclosure, adjusting the model parameters of the self-attention layer of a large language model based on the model parameters of the target low-rank adaptive model to obtain a fine-tuned large language model includes: performing vector multiplication on the first low-rank matrix and the second low-rank matrix included in the target low-rank adaptive model to obtain a parameter matrix; and adjusting the model parameters of the self-attention layer of the large language model based on the parameter matrix to obtain the fine-tuned large language model.

[0129] According to embodiments of this disclosure, after fine-tuning training, the LoRA fine-tuning parameters are fused with the GLM4 model parameters. The fusion process can be performed as follows: First, the first and second low-rank matrices of the fine-tuned LoRA matrix parameters are multiplied by a vector to obtain a parameter matrix. The parameter matrix is ​​then merged with the weight matrix of the pre-trained model to obtain a new weight matrix. The fused new weight matrix is ​​used as the model parameters of the fine-tuned large language model, resulting in a large language model that possesses both general knowledge of large language models and domain knowledge of fake address detection.

[0130] Figure 3 A flowchart illustrating a spurious address identification method based on a large language model according to an embodiment of the present disclosure is shown.

[0131] like Figure 3 As shown, the method includes operations S301 to S303.

[0132] In operation S301, in response to receiving a prompt text, if it is determined that the prompt words included in the prompt text match the preset prompt words, the address text to be identified is extracted from the prompt text.

[0133] In operation S302, based on the preset thought chain and the address text to be identified, multiple rounds of question-and-answer are conducted with the fine-tuned large language model to obtain the target answer text output by the fine-tuned large language model.

[0134] In operation S303, based on the target answer text, the false address identification result of the address text to be identified is determined; wherein, the fine-tuned large language model is obtained by fine-tuning the large language model using the fine-tuning method of the large language model described above.

[0135] According to embodiments of this disclosure, when identifying fake addresses, a trained GLM4 model can be combined with prompt word engineering. This involves designing appropriate fake address prompt words, such as "Is this a valid address?" or "Please check if the following address is valid." The prompt words can be designed based on business requirements and the model's understanding capabilities. A thought chain-based guidance method can also be used to help the model understand the task, guiding the model to judge fake addresses and identify abnormal words through progressively guiding prompt words. Based on the prompt word engineering, the finely tuned GLM4 model infers the delivery address.

[0136] For example, the delivery address is used as the address text to be identified, and the delivery address and prompt words are used as input to the model. The delivery address and prompt words are preprocessed and vectorized. The processed delivery address and prompt words are then used to infer from a fine-tuned GLM4 model to obtain the fake address identification result. The inference process can be implemented through the model's forward propagation. The output includes the fake address identification result, semantic logical structure explanation, and anomalous words to help users understand the model's judgment basis. The output may include the specific content of the anomalous words of the fake address or the "normal" judgment.

[0137] According to embodiments of this disclosure, by designing specific prompts and thought chains, the model can gain a deeper understanding of task requirements and effectively identify fake addresses, achieving an address recognition accuracy of 99%. During the inference phase, the model combines the preprocessed address and prompts to make judgments, outputting fake address identification results and abnormal words, achieving a fake address and abnormal word accuracy of 98%. This method, utilizing natural language processing technology, guides the model's decision-making process towards specific task objectives, thereby improving the insight accuracy and generalization ability of the fake address detection model, as well as its accuracy in identifying complex and concealed fake addresses.

[0138] Figure 4 The flowchart illustrating the training process of a finely tuned large language model according to an embodiment of the present disclosure is shown.

[0139] like Figure 4 As shown, the method includes operations S401 to S407.

[0140] In operation S401, a preset number of raw data for order addresses are retrieved from the CPS Alliance Business Data database.

[0141] In operation S402, the raw data of a preset number of order addresses is labeled with positive and negative samples to obtain multiple initial address samples. The initial address samples include positive samples with normal address labels and negative samples consisting of fake addresses containing abnormal words.

[0142] In operation S403, various data augmentation methods are applied to augment the initial address sample; these methods include character replacement, input error simulation, anomalous word simulation, format change, and expression anomalous simulation.

[0143] In operation S404, the format conversion of the dataset is fine-tuned for the initial address sample after data augmentation.

[0144] In operation S405, the initial address sample after format conversion is vectorized to obtain multiple target address samples.

[0145] In operation S406, an initial low-rank adaptive model is trained using multiple target address samples to obtain a target low-rank adaptive model. The initial low-rank adaptive model is obtained by initializing the self-attention layer of the large language model.

[0146] In operation S407, the model parameters of the self-attention layer of the large language model are adjusted based on the model parameters of the target low-rank adaptive model to obtain the fine-tuned large language model.

[0147] Figure 5 A flowchart illustrating the application method of a finely tuned large language model according to an embodiment of the present disclosure is shown.

[0148] like Figure 5 As shown, the method includes operations S501~S503.

[0149] When operating S501, design reasonable fake address prompts.

[0150] When operating the S502, the mind chain guidance method helps the fine-tuned GLM4 model understand the task.

[0151] According to an embodiment of this disclosure, taking an input address as an example, in the design of prompts and the guidance of thought chains, the fine-tuned GLM4 model considers the following two steps based on contextual understanding: 1. Address standardization: Based on the input address, consider a standardized delivery address, ensuring that the address includes the province, city, and district levels and does not contain internal codes or anomalous words that may be coded. 2. Anomalous code identification: Check the difference between the original address and the standardized delivery address. If the difference is a meaningless letter or anomalous word, it represents an anomalous code. Through the above two steps, please determine whether the address contains an anomalous code. If yes, please return "the specific content of the anomalous code"; if not, please return "normal".

[0152] When operating S503, the finely tuned GLM4 model is used to infer the delivery address and output the result of identifying a fake address.

[0153] Figure 6 A block diagram of a fine-tuning device for a large language model according to an embodiment of the present disclosure is shown schematically.

[0154] like Figure 6 As shown, the fine-tuning device 600 for the large language model includes an initial address generation module 601, a first address generation module 602, an initial model training module 603, and a model parameter adjustment module 604.

[0155] The initial address generation module 601 is used to generate multiple initial address samples based on the real addresses included in the order database. The initial address samples include address text and initial tags for the address text.

[0156] The first address generation module 602 is used to perform data augmentation processing on multiple initial address samples based on multiple abnormal modes to obtain multiple target address samples.

[0157] The initial model training module 603 is used to train an initial low-rank adaptive model using multiple target address samples to obtain a target low-rank adaptive model. The initial low-rank adaptive model is obtained by initializing the self-attention layer of the large language model.

[0158] The model parameter adjustment module 604 is used to adjust the model parameters of the self-attention layer of the large language model based on the model parameters of the target low-rank adaptive model, so as to obtain the fine-tuned large language model.

[0159] According to embodiments of this disclosure, the first address generation module 602 includes a data augmentation submodule, a label determination submodule, a text format conversion submodule, and a sample determination submodule.

[0160] The data augmentation submodule is used to augment the address text included in each of the multiple initial address samples for each anomaly pattern, based on the processing strategy corresponding to the anomaly pattern, to obtain multiple anomaly address texts corresponding to the anomaly pattern.

[0161] The label determination submodule is used to determine the target label for each of the multiple abnormal address texts based on the initial labels and abnormal patterns included in the multiple initial address samples.

[0162] The text format conversion submodule is used to perform format conversion based on multiple abnormal address texts and their respective target tags to obtain multiple target address samples corresponding to the abnormal patterns.

[0163] According to embodiments of this disclosure, the text format conversion submodule includes an address text embedding unit, a target tag embedding unit, and a text combination unit.

[0164] The address text embedding unit is used to embed the abnormal address text into the first instruction template for each abnormal address text to obtain the problem text.

[0165] The target tag embedding unit is used to embed the target tag of the exception address text into the second instruction template to obtain the answer text.

[0166] The text combination unit is used to combine the question text and the answer text to obtain the target address sample.

[0167] According to embodiments of this disclosure, the data augmentation submodule includes a character replacement unit, a replacement error simulation unit, a deletion error simulation unit, an abnormal word simulation unit, a format transformation simulation unit, and a representation error simulation unit.

[0168] The character replacement unit is used to determine the first target character from multiple characters included in the address text, and to replace the first target character with a random character to obtain the abnormal address text corresponding to the character replacement pattern.

[0169] The error replacement simulation unit is used to determine the second target character from multiple characters included in the address text, simulate input errors based on the second target character to obtain the error character, and use the error character to replace the second target character to obtain the abnormal address text corresponding to the input error simulation mode.

[0170] The deletion error simulation unit is used to determine a third target character from multiple characters included in the address text, delete the third target character, and obtain the abnormal address text corresponding to the input error simulation mode.

[0171] The abnormal word simulation unit is used to determine the insertion point in the address text and insert abnormal words at the insertion point to obtain abnormal address text corresponding to the abnormal word simulation mode. The format transformation simulation unit is used to determine the fourth target character with multiple expression formats from the multiple characters included in the address text, and perform format transformation on the fourth target character to obtain abnormal address text corresponding to the format transformation simulation mode.

[0172] The expression anomaly simulation unit is used to determine the text translation link by using the language of the address text as the initial language and the target language, and to perform multiple translation processes on the address text based on the text translation link to obtain the abnormal address text corresponding to the expression anomaly simulation mode.

[0173] According to embodiments of this disclosure, the initial model training module 603 includes a feature encoding determination submodule, a model addition submodule, and a model parameter adjustment submodule.

[0174] The feature encoding determination submodule is used to perform word embedding processing on the target address sample to obtain the feature encoding of the target address sample.

[0175] A submodule is added to the model to take the initial low-rank adaptive model as a branch of the self-attention layer of the large language model and add it to the large language model to be adjusted.

[0176] The model parameter adjustment submodule is used to input the feature encoding of the target address sample into the large language model to be adjusted, and to adjust the model parameters of the initial low-rank adaptive model based on the forward propagation result of the feature encoding of the target address sample in the large language model to be adjusted, so as to obtain the target low-rank adaptive model.

[0177] According to embodiments of this disclosure, the feature encoding determination submodule includes a text segmentation unit, a vector encoding unit, a position encoding unit, a position information embedding unit, and a first address encoding unit.

[0178] The text segmentation unit is used to segment the text included in the target address sample into multiple words.

[0179] The vector coding unit is used to perform vector coding on each word to obtain the first word code.

[0180] The location encoding unit is used to obtain the location code based on the position of the word in the text included in the target address sample.

[0181] The location information embedding unit is used to embed the location code into the first word code to obtain the target word code.

[0182] The first address encoding unit is used to combine the second word encodings of multiple words to obtain the feature encoding of the target address sample.

[0183] According to embodiments of this disclosure, the model parameter adjustment module 604 includes a parameter matrix determination submodule and a parameter adjustment submodule.

[0184] The parameter matrix determination submodule is used to perform vector multiplication on the first and second low-rank matrices of the target low-rank adaptive model to obtain the parameter matrix.

[0185] The parameter tuning submodule is used to adjust the model parameters of the self-attention layer of the large language model based on the parameter matrix to obtain a fine-tuned large language model.

[0186] Figure 7 A block diagram of a spurious address identification device based on a large language model according to an embodiment of the present disclosure is shown schematically.

[0187] like Figure 7 As shown, the fake address recognition device 700 based on a large language model includes an address text extraction module 701, an answer text output module 702, and a fake address recognition module 703.

[0188] Address text extraction module 701 is used to extract the address text to be identified from the prompt text in response to receiving prompt text, provided that the prompt words included in the prompt text match the preset prompt words.

[0189] The answer text output module 702 is used to perform multiple rounds of question-and-answer with the fine-tuned large language model based on the preset thought chain and the address text to be identified, so as to obtain the target answer text output by the fine-tuned large language model.

[0190] The fake address identification module 703 is used to determine the fake address identification result of the address text to be identified based on the target answer text; wherein, the fine-tuned large language model is obtained by fine-tuning the large language model using the fine-tuning method of the large language model described above.

[0191] Any one or more of the modules, submodules, units, and subunits according to embodiments of the present disclosure, or at least part of the functions of any one or more of them, can be implemented in one module. Any one or more of the modules, submodules, units, and subunits according to embodiments of the present disclosure can be implemented by dividing them into multiple modules. Any one or more of the modules, submodules, units, and subunits according to embodiments of the present disclosure can be at least partially implemented as hardware circuitry, such as a Field-Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a System-on-Chip, a System-on-a-Substrate, a System-on-Package, an Application-Specific Integrated Circuit (ASIC), or implemented in hardware or firmware by any other reasonable means of integrating or packaging circuitry, or implemented in software, hardware, or firmware, or in any suitable combination of any of these three implementation methods. Alternatively, one or more of the modules, submodules, units, and subunits according to embodiments of the present disclosure can be at least partially implemented as computer program modules, which, when run, can perform corresponding functions.

[0192] For example, any and multiple modules among the initial address generation module 601, first address generation module 602, initial model training module 603, model parameter adjustment module 604, address text extraction module 701, answer text output module 702, and fake address identification module 703 can be combined into one module / unit / subunit, or any one of these modules / units / subunits can be split into multiple modules / units / subunits. Alternatively, at least some of the functions of one or more of these modules / units / subunits can be combined with at least some of the functions of other modules / units / subunits and implemented in one module / unit / subunit. According to embodiments of this disclosure, at least one of the initial address generation module 601, the first address generation module 602, the initial model training module 603, the model parameter adjustment module 604, the address text extraction module 701, the answer text output module 702, and the fake address identification module 703 can be at least partially implemented as hardware circuits, such as field-programmable gate arrays (FPGAs), programmable logic arrays (PLAs), systems-on-a-chip, systems-on-a-substrate, systems-on-package, application-specific integrated circuits (ASICs), or any other reasonable means of integrating or packaging circuits, or implemented in software, hardware, or firmware, or in any appropriate combination of any of these three implementation methods. Alternatively, at least one of the initial address generation module 601, the first address generation module 602, the initial model training module 603, the model parameter adjustment module 604, the address text extraction module 701, the answer text output module 702, and the fake address identification module 703 can be at least partially implemented as computer program modules, which can perform corresponding functions when the computer program module is run.

[0193] It should be noted that the data processing system part in the embodiments of this disclosure corresponds to the data processing method part in the embodiments of this disclosure. The specific description of the data processing system part is referred to in the data processing method part, and will not be repeated here.

[0194] Figure 8 A block diagram of an electronic device suitable for implementing a fine-tuning method for a large language model and a spurious address identification method according to embodiments of the present disclosure is illustrated. Figure 8 The electronic device shown is merely an example and should not be construed as limiting the functionality and scope of the embodiments disclosed herein.

[0195] like Figure 8 As shown, an electronic device 800 according to an embodiment of this disclosure includes a processor 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage portion 808 into a random access memory (RAM) 803. The processor 801 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and / or an associated chipset and / or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)), etc. The processor 801 may also include onboard memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing different actions of the method flow according to an embodiment of this disclosure.

[0196] RAM 803 stores various programs and data required for the operation of electronic device 800. Processor 801, ROM 802, and RAM 803 are interconnected via bus 804. Processor 801 performs various operations of the method flow according to embodiments of the present disclosure by executing programs in ROM 802 and / or RAM 803. It should be noted that the programs may also be stored in one or more memories other than ROM 802 and RAM 803. Processor 801 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in said one or more memories.

[0197] According to embodiments of this disclosure, the electronic device 800 may further include an input / output (I / O) interface 805, which is also connected to a bus 804. The electronic device 800 may also include one or more of the following components connected to the input / output (I / O) interface 805: an input section 806 including a keyboard, mouse, etc.; an output section 807 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 808 including a hard disk, etc.; and a communication section 809 including a network interface card such as a LAN card, modem, etc. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the input / output (I / O) interface 805 as needed. A removable medium 811, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on the drive 810 as needed so that computer programs read from it can be installed into the storage section 808 as needed.

[0198] According to embodiments of this disclosure, the method flow according to embodiments of this disclosure can be implemented as a computer software program. For example, embodiments of this disclosure include a computer program product comprising a computer program carried on a computer-readable storage medium, the computer program containing program code for performing the methods shown in the flowchart. In such embodiments, the computer program can be downloaded and installed from a network via communication section 809, and / or installed from removable medium 811. When the computer program is executed by processor 801, it performs the functions defined in the system of embodiments of this disclosure. According to embodiments of this disclosure, the systems, devices, apparatuses, modules, units, etc., described above can be implemented by computer program modules.

[0199] This disclosure also provides a computer-readable storage medium, which may be included in the device / apparatus / system described in the above embodiments; or it may exist independently and not assembled into the device / apparatus / system. The computer-readable storage medium carries one or more programs that, when executed, implement the method according to the embodiments of this disclosure.

[0200] According to embodiments of this disclosure, the computer-readable storage medium can be a non-volatile computer-readable storage medium. Examples include, but are not limited to: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this disclosure, the computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

[0201] For example, according to embodiments of this disclosure, a computer-readable storage medium may include the ROM 802 and / or RAM 803 described above and / or one or more memories other than ROM 802 and RAM 803.

[0202] Embodiments of this disclosure also include a computer program product comprising a computer program containing program code for performing the methods provided in the embodiments of this disclosure. When the computer program product is run on an electronic device, the program code enables the electronic device to implement the large language model fine-tuning method and the large language model-based spurious address identification method provided in the embodiments of this disclosure.

[0203] When the computer program is executed by the processor 801, it performs the functions defined in the system / apparatus of this disclosure embodiments. According to embodiments of this disclosure, the systems, apparatuses, modules, units, etc., described above can be implemented by computer program modules.

[0204] In one embodiment, the computer program may rely on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of signals over a network medium, and may be downloaded and installed via the communication section 809, and / or installed from a removable medium 811. The program code contained in the computer program can be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination thereof.

[0205] According to embodiments of this disclosure, program code for executing the computer programs provided in embodiments of this disclosure can be written in any combination of one or more programming languages. Specifically, these computational programs can be implemented using high-level procedural and / or object-oriented programming languages, and / or assembly / machine languages. Programming languages ​​include, but are not limited to, languages ​​such as Java, C++, Python, "C", or similar programming languages. The program code can execute entirely on a user's computing device, partially on a user's device, partially on a remote computing device, or entirely on a remote computing device or server. In cases involving remote computing devices, the remote computing device can be connected to the user's computing device via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (e.g., via the Internet using an Internet service provider).

[0206] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions. Those skilled in the art will understand that the features described in the various embodiments of the present disclosure can be combined and / or combined in various ways, even if such combinations are not explicitly described in the present disclosure. In particular, the features described in the various embodiments of this disclosure may be combined and / or combined in various ways without departing from the spirit and teachings of this disclosure. All such combinations and / or combinations fall within the scope of this disclosure.

[0207] The embodiments of this disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of this disclosure. Although various embodiments have been described above, this does not mean that the measures in the various embodiments cannot be used advantageously in combination. Various substitutions and modifications can be made by those skilled in the art without departing from the scope of this disclosure, and all such substitutions and modifications should fall within the scope of this disclosure.

Claims

1. A method for fine-tuning a large language model, comprising: Based on the real addresses included in the order database, multiple initial address samples are generated, wherein the initial address samples include address text and initial tags for the address text; Data augmentation processing is performed on the multiple initial address samples based on multiple anomaly patterns to obtain multiple target address samples; An initial low-rank adaptive model is trained using the multiple target address samples to obtain a target low-rank adaptive model. The initial low-rank adaptive model is obtained by initializing the self-attention layer of the large language model. The initial low-rank adaptive model includes a first initial low-rank matrix and a second initial low-rank matrix. The dimension of the matrix obtained by multiplying the first and second initial low-rank matrices is consistent with the dimension of the model parameters of the self-attention layer of the large language model. The model parameters of the self-attention layer of the large language model are adjusted based on the model parameters of the target low-rank adaptive model to obtain the fine-tuned large language model. This includes: performing vector multiplication on the first low-rank matrix and the second low-rank matrix included in the target low-rank adaptive model to obtain a parameter matrix; and adding the parameter matrix to the model parameters of the self-attention layer of the large language model to obtain the fine-tuned large language model.

2. The method according to claim 1, wherein, The process involves performing data augmentation on the multiple initial address samples based on multiple anomaly patterns to obtain multiple target address samples, including: For each abnormal pattern, based on the processing strategy corresponding to the abnormal pattern, data augmentation is performed on the address text included in each of the multiple initial address samples to obtain multiple abnormal address texts corresponding to the abnormal pattern. Based on the initial tags included in each of the multiple initial address samples and the anomaly pattern, the target tags of each of the multiple anomaly address texts are determined; and Based on the multiple abnormal address texts and their respective target tags, a format conversion is performed to obtain multiple target address samples corresponding to the abnormal pattern.

3. The method according to claim 2, wherein, The step of performing format conversion based on the plurality of abnormal address texts and their respective target tags to obtain a plurality of target address samples corresponding to the abnormal pattern includes: For each abnormal address text, the abnormal address text is embedded into the first instruction template to obtain the problem text; The target tag corresponding to the abnormal address text is embedded into the second instruction template to obtain the answer text; and The question text and the answer text are combined to obtain the target address sample.

4. The method according to claim 2, wherein, The multiple abnormal modes include character replacement mode, input error simulation mode, abnormal word simulation mode, format transformation simulation mode, and expression abnormality simulation mode; Specifically, based on the processing strategy corresponding to the anomaly pattern, data augmentation is performed on the address text included in each of the multiple initial address samples to obtain multiple abnormal address texts corresponding to the anomaly pattern, including: For each initial address sample, the address text included in the initial address sample is processed using at least one of the following methods to obtain the abnormal address text: A first target character is determined from the multiple characters included in the address text, and the first target character is replaced by a random character to obtain an abnormal address text corresponding to the character replacement pattern; A second target character is determined from the multiple characters included in the address text. An input error simulation is performed based on the second target character to obtain an error character. The error character is then used to replace the second target character to obtain an abnormal address text corresponding to the input error simulation mode. A third target character is determined from the multiple characters included in the address text, and the third target character is deleted to obtain the abnormal address text corresponding to the input error simulation mode; An insertion point is determined in the address text, and an abnormal word is inserted at the insertion point to obtain an abnormal address text corresponding to the abnormal word simulation pattern; From the multiple characters included in the address text, a fourth target character with multiple expression formats is determined, and the format of the fourth target character is converted to obtain the abnormal address text corresponding to the format transformation simulation mode; Using the language of the address text as the initial and target languages, a text translation link is determined, and based on the text translation link, the address text is translated multiple times to obtain the abnormal address text corresponding to the expression anomaly simulation mode.

5. The method according to claim 1, wherein, The step of training an initial low-rank adaptive model using the multiple target address samples to obtain a target low-rank adaptive model includes: Each target address sample is subjected to word embedding processing to obtain the feature code of the target address sample; The initial low-rank adaptive model is added as a branch of the self-attention layer of the large language model to obtain the large language model to be adjusted; and The feature encoding of the target address sample is input into the large language model to be adjusted. Based on the forward propagation result of the feature encoding of the target address sample in the large language model to be adjusted, the model parameters of the initial low-rank adaptive model are adjusted by feedback to obtain the target low-rank adaptive model.

6. The method according to claim 5, wherein, The step of performing word embedding processing on each target address sample to obtain the feature encoding of the target address sample includes: The text included in the target address sample is segmented into multiple words. For each word, a vector encoding is performed on the word to obtain the first word code; Based on the position of the word in the text included in the target address sample, a location code is obtained; The positional encoding is embedded into the first word encoding to obtain the second word encoding; and The second word codes of each of the multiple words are combined to obtain the feature code of the target address sample.

7. A method for identifying fake addresses based on a large language model, comprising: In response to receiving a prompt text, if it is determined that the prompt words included in the prompt text match preset prompt words, the address text to be identified is extracted from the prompt text; Based on the preset thought chain and the address text to be identified, multiple rounds of question and answer are performed with the fine-tuned large language model to obtain the target answer text output by the fine-tuned large language model; as well as Based on the target answer text, determine the false address identification result of the address text to be identified; The fine-tuned large language model is obtained by fine-tuning the large language model using the fine-tuning method for large language models according to any one of claims 1 to 6.

8. A fine-tuning device for a large language model, comprising: An initial address generation module is used to generate multiple initial address samples based on the real addresses included in the order database, wherein the initial address samples include address text and initial tags for the address text; The first address generation module is used to perform data augmentation processing on the multiple initial address samples based on multiple abnormal modes to obtain multiple target address samples; An initial model training module is used to train an initial low-rank adaptive model using the multiple target address samples to obtain a target low-rank adaptive model. The initial low-rank adaptive model is obtained by initializing the self-attention layer of the large language model. The initial low-rank adaptive model includes a first initial low-rank matrix and a second initial low-rank matrix. The dimension of the matrix obtained by multiplying the first and second initial low-rank matrices is consistent with the dimension of the model parameters of the self-attention layer of the large language model. The model parameter adjustment module is used to adjust the model parameters of the self-attention layer of the large language model based on the model parameters of the target low-rank adaptive model to obtain the fine-tuned large language model. The module includes: performing vector multiplication on the first low-rank matrix and the second low-rank matrix included in the target low-rank adaptive model to obtain a parameter matrix; and adding the parameter matrix to the model parameters of the self-attention layer of the large language model to obtain the fine-tuned large language model.

9. A device for identifying fake addresses based on a large language model, comprising: The address text extraction module is used to extract the address text to be identified from the prompt text in response to receiving prompt text, provided that the prompt words included in the prompt text match the preset prompt words. The answer text output module is used to perform multiple rounds of question-and-answer with the fine-tuned large language model based on the preset thought chain and the address text to be identified, so as to obtain the target answer text output by the fine-tuned large language model; as well as A fake address identification module is used to determine the fake address identification result of the address text to be identified based on the target answer text; The fine-tuned large language model is obtained by fine-tuning the large language model using the fine-tuning method for large language models according to any one of claims 1 to 6.

10. An electronic device, comprising: One or more processors; Memory, used to store one or more programs. Wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method of any one of claims 1 to 7.

11. A computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 7.

12. A computer program product comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1 to 7.