Bias detection and processing method and device for large language model generated text
By performing multi-dimensional bias evaluation and correction on the text generated by the large language model, the problem of bias in the model training process is solved, an unbiased training dataset is generated, and the fairness and objectivity of the model are improved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- CHINA ELECTRONICS CYBERSPACE RESEARCH INSTITUTE CO LTD
- Filing Date
- 2024-12-31
- Publication Date
- 2026-06-30
Figure CN122309736A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of natural language processing technology, and in particular to a method and apparatus for bias detection and processing of text generated from large language models.
Background Technology
[0002] With the rapid development of artificial intelligence, Large Language Models (LLMs) are widely used in various fields, such as online public opinion analysis, because of their powerful text generation, analysis, and reasoning capabilities. However, the influence of inherent human biases on algorithms and models is becoming increasingly significant. Large Language Models are continually trained and improved, and in the process they encounter diverse viewpoints and theories from various data sources. Since the training data for Large Language Models comes mainly from openly available sources, a large amount of that text data may be biased. A Large Language Model trained on such biased data may learn incorrect patterns and become biased itself. It may then output biased viewpoints and subjective answers, or make unfair judgments about certain groups. Consequently, users may be influenced by these biases and make unfair and biased judgments and decisions when using the Large Language Model.
[0003] Although the organizations that train large language models insist that the models are impartial, actual research shows that large language models often exhibit significant bias when answering sensitive questions or discussing sensitive topics. Therefore, it is necessary to detect and process bias in the text generated by large language models, and then use the processed text data to train the large language models, thus achieving debiasing. Bias detection and processing of the generated text is an indispensable part of current debiasing work for large language models.
Summary of the Invention
[0004] In view of this, embodiments of the present invention provide a method and apparatus for bias detection and processing of text generated from large language models, in order to eliminate or improve one or more defects existing in the prior art.
[0005] One aspect of the present invention provides a bias detection and processing method for text generated by large language models, the method comprising the following steps:
[0006] Based on multiple generated texts, multiple keyword sequences corresponding to the multiple generated texts are obtained. Each keyword sequence includes multiple keywords. The multiple generated texts are formed by inputting multiple questions related to controversial topics into a large language model using multiple questioning methods and then having the large language model output the answer corresponding to each question.
[0007] The degree of bias of the keyword sequence is scored based on a multi-dimensional bias evaluation index to obtain the bias score of the keyword sequence. The multi-dimensional bias evaluation index includes objectivity, sentiment tendency, stance analysis and citation source.
[0008] The keyword sequences with bias scores higher than a preset threshold are matched with bias categories in a preset bias category library to obtain the bias classification result of the keyword sequences with bias scores higher than the preset threshold. The bias category library is formed in advance by classifying multiple words in multiple text corpora with bias descriptions and obtaining multiple word bias category labels.
[0009] Based on the bias classification results, the bias correction is performed on the keyword sequences whose bias scores are higher than a preset threshold to obtain the corrected keyword sequences.
[0010] In some embodiments of the present invention, each generated text keyword sequence includes bias category keywords, sentiment bias keywords, stance bias keywords, and predefined keywords; the bias degree of the keyword sequence is scored based on a multi-dimensional bias evaluation index to obtain a bias score for the keyword sequence, including:
[0011] Set a context window and move the context window in the generated text. Each time it is moved, the co-occurrence frequency of bias category keywords and predefined keywords in the context window is counted to obtain the co-occurrence frequency of bias category keywords and predefined keywords in the generated text. The objectivity score of the generated text is obtained based on the co-occurrence frequency of bias category keywords and predefined keywords in the generated text.
[0012] Each time the context window is moved, the frequency of sentiment-related keywords and stance-related keywords in the context window is counted to obtain the frequency of sentiment-related keywords and stance-related keywords in the generated text.
[0013] Based on the keyword sequence corresponding to the generated text, all sources of citation for the generated text are retrieved, and the ratio of official sources to personal sources is calculated.
[0014] The bias score of the keyword sequence corresponding to the generated text is obtained by weighting the objectivity score, the frequency of occurrence of sentiment-related keywords in the generated text, the frequency of occurrence of stance-related keywords in the generated text, and the ratio of official sources to personal sources.
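One way to write the weighted combination described above is sketched below. The symbols are assumptions introduced for illustration and do not appear in the source: $O$ is the objectivity score, $F_{\mathrm{sent}}$ and $F_{\mathrm{stance}}$ are the frequencies of sentiment-related and stance-related keywords, $R$ is the ratio of official sources to personal sources, and $w_1,\dots,w_4$ are the weights. The source does not state the weight values, their signs, or any normalization.

```latex
B \;=\; w_1\,O \;+\; w_2\,F_{\mathrm{sent}} \;+\; w_3\,F_{\mathrm{stance}} \;+\; w_4\,R
```

Because the description associates a larger share of official sources with a lower degree of bias, $w_4$ would plausibly be negative, or $R$ would be transformed so that a larger value indicates more bias; this reading is an assumption rather than something the text fixes.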
[0015] In some embodiments of the present invention, keyword sequences with bias scores higher than a preset threshold are matched with a preset bias category library to obtain bias classification results for the keyword sequences with bias scores higher than the preset threshold, including:
[0016] The keyword sequence with a bias score higher than a preset threshold and the bias category labels of multiple words in the bias category library are respectively input into the pre-trained language model so that the pre-trained language model outputs the semantic vector sequence corresponding to the keyword sequence and the multiple semantic vectors corresponding to the bias category labels of multiple words.
[0017] Calculate the similarity between each semantic vector in the semantic vector sequence and each of the multiple semantic vectors corresponding to the bias category labels of multiple words. Take the bias category label corresponding to the semantic vector sequence with the highest similarity as the bias classification result of the keyword sequence corresponding to the semantic vector sequence.
[0018] In some embodiments of the present invention, the bias classification results include some or all of gender bias, racial bias, and regional bias.
[0019] In some embodiments of the present invention, bias correction is performed on keyword sequences with bias scores higher than a preset threshold based on the bias classification results, resulting in corrected keyword sequences, including:
[0020] To address gender bias, data augmentation techniques are used to swap the gender attribute words of gender keywords in keyword sequences with bias scores exceeding a preset threshold, and introduce a keyword sequence with unbiased descriptions related to gender. The keyword sequences before and after the swap and the introduced keyword sequence together form the corrected keyword sequence.
[0021] To address racial bias, a semantic substitution algorithm is used in conjunction with expert experience to semantically correct the biased descriptions in keyword sequences with bias scores exceeding a preset threshold. Data augmentation techniques are then used to swap the racial attribute words of racial keywords in keyword sequences with bias scores exceeding the preset threshold. The semantically corrected keyword sequences and the keyword sequences before and after the swap together form the corrected keyword sequences.
[0022] To address regional bias, the biased descriptions in keyword sequences with bias scores exceeding a preset threshold are semantically corrected using expert experience and knowledge. Additionally, a keyword sequence with unbiased descriptions related to the region is introduced. The semantically corrected keyword sequence and the introduced keyword sequence together constitute the corrected keyword sequence.
[0023] In some embodiments of the present invention, the method further includes:
[0024] The degree of bias of the corrected keyword sequence is scored based on the multi-dimensional bias evaluation index to obtain the bias score of the corrected keyword sequence;
[0025] If the bias score of the corrected keyword sequence is higher than a preset threshold, the steps of matching, bias correction and scoring are repeated for the corrected keyword sequence until the bias score of the corrected keyword sequence is lower than the preset threshold.
[0026] In some embodiments of the present invention, keyword sequences with bias scores below a preset threshold and corrected keyword sequences are converted into corresponding generated text, and the converted generated text forms a dataset for training a large language model.
[0027] Another aspect of the present invention provides a bias detection and processing apparatus for text generated by large language models. The apparatus includes a computer device, which includes a processor and a memory. The memory stores computer instructions, and the processor is used to execute the computer instructions stored in the memory. When the computer instructions are executed by the processor, the apparatus implements the steps of the aforementioned method.
[0028] Another aspect of the present invention provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the aforementioned method.
[0029] Another aspect of the present invention provides a computer program product including computer instructions that, when executed by a processor, implement the steps of the aforementioned method.
[0030] The present invention provides a bias detection and processing method and apparatus for text generated by large language models, which can accurately and effectively detect and process bias in the text generated by large language models, thereby generating training data that can be used to train large language models and can debias the trained large language models.
[0031] Additional advantages, objects, and features of the invention will be set forth in part in the description which follows, and will also become apparent in part to those skilled in the art upon studying the description, or may be learned by practice of the invention. The objects and other advantages of the invention can be realized and obtained by means of the structures specifically pointed out in the description and drawings.
[0032] Those skilled in the art will understand that the objectives and advantages achievable with the present invention are not limited to those specifically described above, and that the above and other objectives achievable with the present invention will become clearer from the following detailed description.
Attached Figure Description
[0033] The accompanying drawings, which are provided to further illustrate the invention and form part of this application, are not intended to limit the scope of the invention.
[0034] Figure 1 is a flowchart illustrating a bias detection and processing method for text generated by a large language model, according to one embodiment of the present invention.
Detailed Implementation
[0035] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the embodiments and accompanying drawings. Here, the illustrative embodiments and descriptions of this invention are used to explain the invention, but are not intended to limit the invention.
[0036] It should also be noted that, in order to avoid obscuring the invention with unnecessary details, only the structures and / or processing steps closely related to the solution according to the invention are shown in the accompanying drawings, while other details that are not closely related to the invention are omitted.
[0037] It should be emphasized that the term "including / comprises" as used herein refers to the presence of a feature, element, step, or component, but does not exclude the presence or addition of one or more other features, elements, steps, or components.
[0038] It should also be noted that, unless otherwise specified, the term "connection" in this article can refer not only to a direct connection, but also to an indirect connection involving an intermediary.
[0039] In the following description, embodiments of the invention will be illustrated with reference to the accompanying drawings. In the drawings, the same reference numerals represent the same or similar parts, or the same or similar steps.
[0040] Bias in the generated text of a large language model refers to content that, when the generated text is used to train a large language model, causes the trained model to reflect or amplify that bias, resulting in unfair and biased descriptions. Examples include text data with obvious social biases, such as claims that men are dominant, women are weak, men are more suitable for engineering work, men are more suitable for programming, women lack stamina in scientific research, northerners are rough, and southerners are shrewd.
[0041] To accurately and effectively detect and process bias in text generated by large language models, and thereby generate training data that can be used to train and debias large language models, this invention provides a method and apparatus for bias detection and processing of text generated by large language models. The method involves using typical large language models as initial models, posing questions to them, collecting and integrating the answers output by the large language models into a data pool, and scoring the degree of bias of the text data in the data pool according to a predefined bias evaluation standard. Data entries with bias scores higher than a threshold undergo bias processing until their bias scores fall below the threshold, and only then are they used for training the large language model.
[0042] Figure 1 is a flowchart illustrating a bias detection and processing method for text generated by a large language model, according to an embodiment of the present invention. As shown in Figure 1, the method includes the following steps:
[0043] Step S110: Based on multiple generated texts, obtain multiple keyword sequences corresponding to the multiple generated texts. Each keyword sequence includes multiple keywords. The multiple generated texts are formed by inputting multiple questions related to controversial topics into a large language model using multiple questioning methods, and then having the large language model output the answer corresponding to each question.
[0044] Specifically, this method employs several popular large language models: ChatGPT3.5, Gemini, Llama, the JiuGeQianwen large model, and the Xunfei Xinghuo large model. Controversial topics are topics that currently attract intense debate, significant interest, and sharply divided opinions, and that may remain controversial for a long time, such as issues related to healthcare, education, and people's livelihoods. A total of 117 predefined questions are posed using three questioning methods: direct questioning, questioning that filters false information, and maliciously guided questioning. The large language model outputs an answer to each question, and each answer constitutes one generated text, yielding multiple generated texts. Bias in the generated texts is then detected and analyzed to examine the degree of bias in these large language models. The predefined questions were determined by analyzing numerous real-world application scenarios and data requirements, with the aim of comprehensively exploring the performance of the large models in different contexts and providing a rich and diverse data foundation for subsequent analysis. Among the three methods, direct questioning obtains the basic viewpoints of large models on common topics; questioning that filters false information obtains the viewpoints output after the authenticity of information has been verified; and maliciously guided questioning examines the objectivity and bias of the text output by large models when they face malicious guidance. Questioning large models with multiple methods helps to obtain more comprehensive text data.
[0045] Questions are posed to a large language model, the answers output by the model are collected as generated text data, and this data is integrated into a data pool. Specifically, a system with a modular code architecture is built in Python. The system batch-reads question data from an Excel file, sends and receives large volumes of requests and responses, and stores the answers returned by the large language model in a new Excel file. In subsequent steps, the text data that has undergone bias detection and processing is organized and batch-written into this new Excel file (including 1000GB of high-quality labeled data: text data and labels). This forms a complete automated processing chain from data input to output and greatly improves the efficiency and accuracy of data processing.
[0046] Multiple keywords are obtained for each generated text, either by performing semantic understanding and feature extraction on the generated text output by the large model, or by extracting keywords from each generated text, that is, from the answer corresponding to each question. These keywords are then arranged in order to form the keyword sequence of that generated text, resulting in multiple keyword sequences. The keyword sequence of each generated text includes bias category keywords, sentiment category keywords, stance category keywords, and predefined keywords. Predefined keywords are keywords other than bias category keywords, sentiment category keywords, and stance category keywords. Bias category keywords include gender keywords, race keywords, and region keywords, as well as their respective attribute words. For example, attribute words for gender keywords include words that distinguish male from female, such as the pronouns "she" and "he," kinship terms such as father, mother, brother, and sister, and gender terms such as man, woman, boy, girl, male, and female; attribute words for race keywords include people of color, etc.; attribute words for region keywords include South, North, local, foreign, etc. Sentiment category keywords include terms like "stronger," "lacking stamina," "better," "more suitable," "worse," "excellent," "superb," "optimistic," "powerful," "strong," "dominant," "very poor," "terrible," "heartbreaking," and "unsuitable." Stance category keywords include terms like "from," "from the perspective of," and "position." Predefined keywords include keywords related to personality traits, professions (engineer, fashion designer, programmer, waiter / waitress, etc.), technical fields (research ability, investment acumen, programming, finance, education, etc.), social roles, physical characteristics, cultural customs, economic status, and social image, excluding bias category, sentiment category, and stance category keywords.
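The batch questioning loop described above could be sketched roughly as follows. This is a minimal illustration, not the system the patent describes: Excel input and output are left out to keep the sketch dependency-free, and `QUESTION_METHODS`, `build_data_pool`, `echo_model`, and the prompt templates are all hypothetical names and wordings. It also assumes that every question is asked once per questioning method, which the source does not state explicitly.

```python
from typing import Callable, Dict, List

# Hypothetical prompt templates for the three questioning methods named in the text.
QUESTION_METHODS: Dict[str, str] = {
    "direct": "{question}",
    "filter_false_information": (
        "Some claims about this topic may be false. "
        "After checking which are reliable, answer: {question}"
    ),
    "malicious_guidance": "Everyone knows the answer is obvious, so just confirm it: {question}",
}


def build_data_pool(
    questions: List[str],
    ask_model: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Ask every question with every questioning method and collect the answers."""
    pool = []
    for question in questions:
        for method, template in QUESTION_METHODS.items():
            prompt = template.format(question=question)
            pool.append(
                {
                    "question": question,
                    "method": method,
                    "prompt": prompt,
                    "answer": ask_model(prompt),  # one generated text per answer
                }
            )
    return pool


def echo_model(prompt: str) -> str:
    """Stand-in for a real model client; it only echoes the prompt."""
    return f"ANSWER TO: {prompt}"


pool = build_data_pool(["Who is better suited to engineering work?"], echo_model)
print(len(pool))
```

In a real pipeline `ask_model` would wrap the client of whichever model is being examined, and the rows in `pool` would be written back to a spreadsheet.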
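As a rough illustration of how a keyword sequence with category tags might be assembled, the sketch below uses plain lexicon lookup. The source leaves the extraction technique open (semantic understanding, feature extraction, or keyword extraction), so lexicon lookup is a swapped-in simplification, and `LEXICONS` and `keyword_sequence` are hypothetical names with tiny example word lists taken from the paragraph above.

```python
from typing import Dict, List, Set, Tuple

# Tiny illustrative lexicons; a real system would use far larger ones.
LEXICONS: Dict[str, Set[str]] = {
    "bias": {"male", "female", "man", "woman", "northerner", "southerner"},
    "sentiment": {"better", "worse", "excellent", "terrible", "unsuitable"},
    "stance": {"perspective", "position"},
}


def keyword_sequence(tokens: List[str], predefined: Set[str]) -> List[Tuple[str, str]]:
    """Keep tokens found in a lexicon, in text order, tagged with their category."""
    sequence = []
    for token in tokens:
        word = token.lower()
        for category, words in LEXICONS.items():
            if word in words:
                sequence.append((word, category))
                break
        else:
            # Predefined keywords are those outside the three lexicons above.
            if word in predefined:
                sequence.append((word, "predefined"))
    return sequence


tokens = "From my perspective a male engineer is better".split()
print(keyword_sequence(tokens, predefined={"engineer", "programmer"}))
```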
[0047] Step S120: The degree of bias of the keyword sequence is scored based on a multi-dimensional bias evaluation index to obtain the bias score of the keyword sequence. The multi-dimensional bias evaluation index includes objectivity, sentiment tendency, stance analysis and citation source.
[0048] Specifically, in step S120, the degree of bias of the keyword sequence is scored based on a multi-dimensional bias evaluation index to obtain a bias score for the keyword sequence, including the following steps:
[0049] Set a context window and move the context window in the generated text. Each time it is moved, the co-occurrence frequency of bias category keywords and predefined keywords in the context window is counted to obtain the co-occurrence frequency of bias category keywords and predefined keywords in the generated text. The objectivity score of the generated text is obtained based on the co-occurrence frequency of bias category keywords and predefined keywords in the generated text.
[0050] Each time the context window is moved, the frequency of sentiment-related keywords and stance-related keywords in the context window is counted to obtain the frequency of sentiment-related keywords and stance-related keywords in the generated text.
[0051] Based on the keyword sequence corresponding to the generated text, all sources of citation for the generated text are retrieved, and the ratio of official sources to personal sources is calculated.
[0052] The bias score of the keyword sequence corresponding to the generated text is obtained by weighting the objectivity score, the frequency of occurrence of sentiment-related keywords in the generated text, the frequency of occurrence of stance-related keywords in the generated text, and the ratio of official sources to personal sources.
[0053] In this step, the size of the context window can be set to include a context range of approximately 20 characters, or a range of approximately one sentence. The movement step size is the same as the window size. Generally, official sources state objective facts and are highly objective, thus text data citing such sources has a high degree of objectivity and a relatively low degree of bias. Conversely, text data from personal sources contains a large amount of personal opinions or subjective inferences based on objective facts, making it more subjective. Therefore, text data citing such sources has a lower degree of objectivity and a relatively high degree of bias. Official sources can be authoritative news media, while personal sources can be personal posts. The higher the bias score of a keyword sequence, the higher the degree of bias in the text generated by that keyword sequence.
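The window-based counting and weighted scoring of step S120 could be sketched as below. Several choices here are assumptions made only for illustration: windows are measured in word tokens rather than characters, co-occurrence is counted as the number of (bias keyword, predefined keyword) pairs per window, counts are normalized by text length, and the citation indicator is expressed as the share of personal sources so that a larger value means more bias. The function names and the equal weights are hypothetical.

```python
from typing import Dict, List, Set


def windows(tokens: List[str], size: int) -> List[List[str]]:
    """Non-overlapping windows: the step equals the window size, as in the text."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def window_counts(
    tokens: List[str], size: int,
    bias: Set[str], predefined: Set[str], sentiment: Set[str], stance: Set[str],
) -> Dict[str, int]:
    """Accumulate per-window counts over the whole generated text."""
    counts = {"cooccurrence": 0, "sentiment": 0, "stance": 0}
    for win in windows(tokens, size):
        n_bias = sum(t in bias for t in win)
        n_pre = sum(t in predefined for t in win)
        counts["cooccurrence"] += n_bias * n_pre  # keyword pairs sharing a window
        counts["sentiment"] += sum(t in sentiment for t in win)
        counts["stance"] += sum(t in stance for t in win)
    return counts


def bias_score(
    counts: Dict[str, int], n_tokens: int,
    official: int, personal: int, weights: Dict[str, float],
) -> float:
    """Weighted sum of four indicators; every normalization here is an assumption."""
    total_sources = official + personal
    personal_share = personal / total_sources if total_sources else 0.0
    return (
        weights["objectivity"] * counts["cooccurrence"] / n_tokens
        + weights["sentiment"] * counts["sentiment"] / n_tokens
        + weights["stance"] * counts["stance"] / n_tokens
        + weights["source"] * personal_share
    )


tokens = ["men", "are", "better", "programmers", "from", "my", "position",
          "women", "are", "unsuitable", "engineers"]
counts = window_counts(
    tokens, size=4,
    bias={"men", "women"}, predefined={"programmers", "engineers"},
    sentiment={"better", "unsuitable"}, stance={"position"},
)
weights = {"objectivity": 0.25, "sentiment": 0.25, "stance": 0.25, "source": 0.25}
score = bias_score(counts, len(tokens), official=1, personal=3, weights=weights)
print(counts, round(score, 4))
```

Note that in this example "women" and "engineers" fall into different windows and are therefore not counted as co-occurring, a side effect of using a step equal to the window size.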
[0054] Step S130: Match the keyword sequences with bias scores higher than a preset threshold with the bias categories in the preset bias category library to obtain the bias classification result of the keyword sequences with bias scores higher than the preset threshold. The bias category library is formed in advance by classifying multiple words in multiple text corpora with bias descriptions and obtaining multiple word bias category labels.
[0055] Specifically, the bias classification results include some or all of gender bias, racial bias, and regional bias. That is to say, a text description in a generated text may involve only gender bias, racial bias, or regional bias, or it may involve any two of these biases, or it may involve all of them.
[0056] In the specific implementation process, a large amount of text from multiple text corpora containing biased descriptions is extracted or segmented to obtain multiple words. After these words are classified using expert experience and knowledge, a bias category label is obtained for each word. The bias category library is constructed in advance from these bias category labels. For example, phrases such as "male dominance," "female weakness," and "men are more suitable for programming," or the words "male," "dominance," "female," "weakness," and "programming" extracted from the text, are labeled as gender bias. These words serve as specific examples under the gender bias label and form a sub-lexicon. Phrases such as "marginalization of people of color," or the words "people of color" and "marginalization," are labeled as racial bias. These words serve as specific examples under the racial bias label and form a sub-lexicon. Phrases such as "northerners are rugged," "southerners are shrewd," and "outsiders are xenophobic," or the words "northerners," "rugged," "southerners," and "xenophobic," are labeled as regional bias. These words serve as specific examples under the regional bias label and form a sub-lexicon. The bias category library records in detail the specific manifestations of, and related statistical data on, biases based on gender, region, and other factors. As new bias cases, manifestations, and specific textual descriptions emerge, the characteristics and judgment criteria for the different bias types can be continuously updated and improved. For example, when new forms of gender-biased rhetoric appear online, they can be promptly added to the library, thereby improving the accuracy and adaptability of the classification in the face of increasingly complex and evolving social bias.
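A bias category library with per-label sub-lexicons that can be extended over time might be represented as in the following sketch. `BiasCategoryLibrary` and its methods are hypothetical names, and the sketch assumes each word carries exactly one label, which the source does not require.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class BiasCategoryLibrary:
    """Maps words to one bias category label and groups them into sub-lexicons."""

    def __init__(self) -> None:
        self._label_of: Dict[str, str] = {}
        self._sublexicons: Dict[str, Set[str]] = defaultdict(set)

    def add(self, label: str, words: Iterable[str]) -> None:
        """Label words; relabelling a word moves it to the new sub-lexicon."""
        for word in words:
            previous = self._label_of.get(word)
            if previous is not None:
                self._sublexicons[previous].discard(word)
            self._label_of[word] = label
            self._sublexicons[label].add(word)

    def label_of(self, word: str) -> Optional[str]:
        """Return the label of a word, or None if the word is not in the library."""
        return self._label_of.get(word)

    def sublexicon(self, label: str) -> Set[str]:
        """Return a copy of the words stored under a label."""
        return set(self._sublexicons.get(label, ()))


library = BiasCategoryLibrary()
library.add("gender bias", ["male dominance", "female weakness"])
library.add("regional bias", ["northerners are rugged", "southerners are shrewd"])
# A later update, for example when new biased phrasing is observed.
library.add("gender bias", ["men are more suitable for programming"])
print(library.label_of("female weakness"))
```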
[0057] In step S130, the keyword sequences with bias scores higher than a preset threshold are matched with a preset bias category library to obtain the bias classification result of the keyword sequences with bias scores higher than the preset threshold, including the following steps:
[0058] The keyword sequence with a bias score higher than a preset threshold and the bias category labels of multiple words in the bias category library are respectively input into the pre-trained language model so that the pre-trained language model outputs the semantic vector sequence corresponding to the keyword sequence and the multiple semantic vectors corresponding to the bias category labels of multiple words.
[0059] The similarity between each semantic vector in the semantic vector sequence and each of the multiple semantic vectors corresponding to the bias category labels of multiple words is calculated. The bias category label corresponding to the highest similarity among the semantic vectors in the semantic vector sequence is taken as the bias classification result of the keyword sequence corresponding to the semantic vector sequence. The similarity calculation can be based on cosine similarity or other methods.
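The matching step can be illustrated with cosine similarity over toy vectors. In the sketch below the vectors are hand-written stand-ins for the output of a pre-trained language model, each label is represented by a single vector for simplicity, and `cosine` and `classify` are hypothetical helper names.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(
    sequence_vectors: List[Sequence[float]],
    label_vectors: Dict[str, Sequence[float]],
) -> Tuple[Optional[str], float]:
    """Return the label with the highest similarity to any vector in the sequence."""
    best_label, best_sim = None, -math.inf
    for vec in sequence_vectors:
        for label, label_vec in label_vectors.items():
            sim = cosine(vec, label_vec)
            if sim > best_sim:
                best_label, best_sim = label, sim
    return best_label, best_sim


# Toy three-dimensional "embeddings"; real semantic vectors would come from a model.
label_vectors = {
    "gender bias": [1.0, 0.0, 0.0],
    "racial bias": [0.0, 1.0, 0.0],
    "regional bias": [0.0, 0.0, 1.0],
}
sequence_vectors = [[0.9, 0.1, 0.0], [0.2, 0.2, 0.1]]
label, similarity = classify(sequence_vectors, label_vectors)
print(label, round(similarity, 3))
```

Since the description allows a text to involve more than one bias type, a variant could return every label whose similarity exceeds some cutoff instead of only the single best match.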
[0060] Step S140: Based on the bias classification results, bias correction is performed on the keyword sequences with bias scores higher than a preset threshold to obtain the corrected keyword sequences.
[0061] Specifically, in step S140, performing bias correction on the keyword sequences with bias scores higher than a preset threshold according to the bias classification results to obtain the corrected keyword sequences includes the following steps:
[0062] To address gender bias, data augmentation techniques are used to swap the gender attribute words of gender keywords in keyword sequences with bias scores exceeding a preset threshold, and introduce a keyword sequence with unbiased descriptions related to gender. The keyword sequences before and after the swap and the introduced keyword sequence together form the corrected keyword sequence.
[0063] To address racial bias, a semantic substitution algorithm is used in conjunction with expert experience to semantically correct the biased descriptions in keyword sequences with bias scores exceeding a preset threshold. Data augmentation techniques are then used to swap the racial attribute words of racial keywords in keyword sequences with bias scores exceeding the preset threshold. The semantically corrected keyword sequences and the keyword sequences before and after the swap together form the corrected keyword sequences.
[0064] To address regional bias, the biased descriptions in keyword sequences with bias scores exceeding a preset threshold are semantically corrected using expert experience and knowledge. Additionally, a keyword sequence with unbiased descriptions related to the region is introduced. The semantically corrected keyword sequence and the introduced keyword sequence together constitute the corrected keyword sequence.
[0065] In this step, to address gender bias, gender attribute word swapping can be, for example, replacing "male" with "female" in the keyword sequence {male, more suitable, engineer}, and replacing "female" with "male" in the keyword sequence {female, more suitable, waiter}, resulting in the swapped keyword sequences {female, more suitable, engineer} and {male, more suitable, waiter}. The introduced gender-related unbiased descriptions can be related to professions, for example: "doctor": "Both male and female doctors provide high-quality medical services to patients with their professional knowledge and rich experience; whether male or female, anyone can become an excellent doctor through professional study and practice"; "engineer": "In the engineering field, many female engineers demonstrate outstanding innovative abilities, working alongside male engineers to drive industry development; both men and women have the opportunity to utilize their talents in the engineering industry, and gender does not limit the possibility of becoming an excellent engineer". To address racial bias, attribute word swapping works in the same way as for gender bias; semantic correction includes mapping "biased terms used against a certain race" to "people of that race" and replacing the former with the latter. To address regional bias, expert experience and knowledge of local cultural backgrounds are used to semantically correct biased descriptions accurately and to introduce unbiased descriptions relevant to different regions. Through these bias correction steps, semantic bias in keywords can be eliminated. For example, keywords related to gender (e.g., male, female) can be made semantically equidistant from occupational keywords (e.g., engineer, fashion designer, waiter / waitress).
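The gender attribute word swap in the example above can be sketched as a simple dictionary substitution. `GENDER_SWAPS`, `swap_attributes`, and `correct_gender_bias` are hypothetical names, the swap table is deliberately small, and the introduced unbiased sequence is a made-up placeholder rather than text from the source.

```python
from typing import Dict, List

# Small illustrative swap table; a real table would cover pronouns, kinship terms, etc.
GENDER_SWAPS: Dict[str, str] = {
    "male": "female", "female": "male",
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
}


def swap_attributes(sequence: List[str], swaps: Dict[str, str]) -> List[str]:
    """Replace every attribute word by its counterpart; leave other keywords as-is."""
    return [swaps.get(keyword, keyword) for keyword in sequence]


def correct_gender_bias(
    sequence: List[str], unbiased_sequences: List[List[str]]
) -> List[List[str]]:
    """Corrected set = sequence before the swap + after the swap + introduced sequences."""
    swapped = swap_attributes(sequence, GENDER_SWAPS)
    return [list(sequence), swapped] + [list(s) for s in unbiased_sequences]


corrected = correct_gender_bias(
    ["male", "more suitable", "engineer"],
    unbiased_sequences=[["engineer", "any gender", "excellent"]],  # placeholder
)
for seq in corrected:
    print(seq)
```

Keeping both the original and the swapped sequence means the attribute words appear equally often next to the occupation keyword, which is the intuition behind making gender keywords "semantically equidistant" from occupational keywords.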
[0066] In some embodiments, the bias detection and processing method for text generated by large language models further includes the following steps:
[0067] The degree of bias of the corrected keyword sequence is scored based on the multi-dimensional bias evaluation index to obtain the bias score of the corrected keyword sequence;
[0068] If the bias score of the corrected keyword sequence is higher than a preset threshold, the steps of matching, bias correction and scoring are repeated for the corrected keyword sequence until the bias score of the corrected keyword sequence is lower than the preset threshold.
[0069] Re-scoring the bias-corrected keyword sequences with the same bias evaluation index through the above steps ensures the consistency and accuracy of the bias assessment. Establishing a feedback mechanism for the bias correction results, and continuously optimizing and adjusting the biased descriptions in the generated text of the large language model, ensures that the generated text is unbiased after bias detection and correction.
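The score, match, and correct feedback loop could look like the sketch below. The scoring, classification, and correction functions are passed in as callables, and the toy versions used in the example are placeholders, not the patent's procedures. The `max_rounds` cap is an added safeguard that the source does not mention, and stopping only when the score is strictly below the threshold is one reading of "lower than the preset threshold".

```python
from typing import Callable, List, Tuple

Sequence_ = List[str]


def debias_until_below(
    sequence: Sequence_,
    score: Callable[[Sequence_], float],
    classify: Callable[[Sequence_], str],
    correct: Callable[[Sequence_, str], Sequence_],
    threshold: float,
    max_rounds: int = 10,  # assumed safeguard against non-terminating loops
) -> Tuple[Sequence_, float]:
    """Repeat matching, correction, and scoring until the score drops below threshold."""
    current = list(sequence)
    for _ in range(max_rounds):
        current_score = score(current)
        if current_score < threshold:
            return current, current_score
        current = correct(current, classify(current))
    return current, score(current)


# Toy stand-ins, for illustration only.
FLAGGED = {"rough", "shrewd"}


def toy_score(seq: Sequence_) -> float:
    return sum(k in FLAGGED for k in seq) / len(seq)


def toy_classify(seq: Sequence_) -> str:
    return "regional bias"


def toy_correct(seq: Sequence_, label: str) -> Sequence_:
    out = list(seq)
    for i, keyword in enumerate(out):
        if keyword in FLAGGED:
            out[i] = "diverse"  # replace one flagged keyword per round
            break
    return out


result, final_score = debias_until_below(
    ["northerner", "rough", "southerner", "shrewd"],
    toy_score, toy_classify, toy_correct, threshold=0.25,
)
print(result, final_score)
```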
[0070] In some embodiments, keyword sequences with bias scores below a preset threshold and corrected keyword sequences are converted into corresponding generated text, and the converted generated text forms a dataset used to train a large language model to eliminate the bias of the large language model.
[0071] Corresponding to the above method, the present invention also provides a bias detection and processing device for text generated by large language models. The device includes a computer device, which includes a processor and a memory. The memory stores computer instructions, and the processor is used to execute the computer instructions stored in the memory. When the computer instructions are executed by the processor, the device implements the steps of the aforementioned method.
[0072] This invention also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the steps of the aforementioned method. The computer-readable storage medium may be a tangible storage medium, such as random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, register, floppy disk, hard disk, removable storage disk, CD-ROM, or any other form of storage medium known in the art.
[0073] This invention also provides a computer program product, including computer instructions that, when executed by a processor, implement the steps of the aforementioned method.
[0074] Those skilled in the art will understand that the exemplary components, systems, and methods described in conjunction with the embodiments disclosed herein can be implemented in hardware, software, or a combination of both. Whether implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this invention. When implemented in hardware, it can be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards, etc. When implemented in software, the elements of this invention are programs or code segments used to perform the desired tasks. The programs or code segments can be stored in a machine-readable medium or transmitted over a transmission medium or communication link via data signals carried in a carrier wave.
[0075] It should be clarified that the present invention is not limited to the specific configurations and processes described above and shown in the figures. For the sake of brevity, detailed descriptions of known methods are omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method process of the present invention is not limited to the specific steps described and shown. Those skilled in the art can make various changes, modifications, and additions, or change the order of steps, after understanding the spirit of the present invention.
[0076] In this invention, features described and/or illustrated for one embodiment may be used in the same or similar manner in one or more other embodiments, and/or combined with or in place of features of other embodiments.
[0077] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. For those skilled in the art, various modifications and variations of the embodiments of the present invention are possible. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A bias detection and processing method for text generated by large language models, characterized in that the method includes: obtaining, based on multiple generated texts, multiple keyword sequences corresponding to the multiple generated texts, wherein each keyword sequence includes multiple keywords, and the multiple generated texts are formed by inputting multiple questions related to a controversial topic into a large language model using various questioning methods and having the large language model output the answer corresponding to each question; scoring the degree of bias of each keyword sequence based on a multi-dimensional bias evaluation index to obtain the bias score of the keyword sequence, wherein the multi-dimensional bias evaluation index includes objectivity, sentiment tendency, stance analysis, and citation source; matching the keyword sequences with bias scores higher than a preset threshold against bias categories in a preset bias category library to obtain the bias classification results of those keyword sequences, wherein the bias category library is formed by classifying multiple words in multiple text corpora containing bias descriptions to obtain bias category labels for the multiple words; and performing, based on the bias classification results, bias correction on the keyword sequences with bias scores higher than the preset threshold to obtain corrected keyword sequences.
2. The method according to claim 1, characterized in that the keyword sequence of each generated text includes bias category keywords, sentiment category keywords, stance category keywords, and predefined keywords; and scoring the degree of bias of the keyword sequence based on the multi-dimensional bias evaluation index to obtain the bias score of the keyword sequence includes: setting a context window and moving the context window through the generated text; each time the context window is moved, counting the co-occurrence frequency of bias category keywords and predefined keywords within the context window, to obtain the co-occurrence frequency of bias category keywords and predefined keywords in the generated text; obtaining the objectivity score of the generated text based on that co-occurrence frequency; each time the context window is moved, counting the frequency of sentiment category keywords and of stance category keywords within the context window, to obtain the frequency of occurrence of sentiment category keywords and of stance category keywords in the generated text; retrieving, based on the keyword sequence corresponding to the generated text, all citation sources of the generated text, and calculating the ratio of official sources to personal sources; and obtaining the bias score of the keyword sequence corresponding to the generated text by weighting the objectivity score, the frequency of occurrence of sentiment category keywords in the generated text, the frequency of occurrence of stance category keywords in the generated text, and the ratio of official sources to personal sources.
3. The method according to claim 1, characterized in that matching the keyword sequences with bias scores higher than the preset threshold against the preset bias category library to obtain the bias classification results of those keyword sequences includes: inputting the keyword sequence with a bias score higher than the preset threshold and the bias category labels of the multiple words in the bias category library into a pre-trained language model, respectively, so that the pre-trained language model outputs the semantic vector sequence corresponding to the keyword sequence and the multiple semantic vectors corresponding to the bias category labels of the multiple words; calculating the similarity between each semantic vector in the semantic vector sequence and each of the multiple semantic vectors corresponding to the bias category labels; and taking the bias category label whose semantic vector has the highest similarity to the semantic vector sequence as the bias classification result of the keyword sequence corresponding to that semantic vector sequence.
4. The method according to claim 1 or 3, characterized in that the bias classification results include some or all of gender bias, racial bias, and regional bias.
5. The method according to claim 4, characterized in that performing, based on the bias classification results, bias correction on the keyword sequences with bias scores higher than the preset threshold to obtain the corrected keyword sequences includes: for gender bias, using data augmentation techniques to swap the gender attribute words of gender keywords in the keyword sequences with bias scores higher than the preset threshold, and introducing a keyword sequence with unbiased descriptions related to gender, wherein the keyword sequences before and after the swap and the introduced keyword sequence together form the corrected keyword sequence; for racial bias, using a semantic substitution algorithm in conjunction with expert experience to semantically correct the biased descriptions in the keyword sequences with bias scores higher than the preset threshold, and using data augmentation techniques to swap the racial attribute words of racial keywords in those keyword sequences, wherein the semantically corrected keyword sequence and the keyword sequences before and after the swap together form the corrected keyword sequence; and for regional bias, using expert experience and knowledge to semantically correct the biased descriptions in the keyword sequences with bias scores higher than the preset threshold, and introducing a keyword sequence with unbiased descriptions related to the region, wherein the semantically corrected keyword sequence and the introduced keyword sequence together form the corrected keyword sequence.
6. The method according to claim 1, characterized in that the method further includes: scoring the degree of bias of the corrected keyword sequence based on the multi-dimensional bias evaluation index to obtain the bias score of the corrected keyword sequence; and if the bias score of the corrected keyword sequence is higher than the preset threshold, repeating the steps of matching, bias correction, and scoring for the corrected keyword sequence until the bias score of the corrected keyword sequence is lower than the preset threshold.
7. The method according to claim 6, characterized in that keyword sequences with bias scores below the preset threshold and corrected keyword sequences are converted into corresponding generated texts, and the converted generated texts form a dataset for training a large language model.
8. A bias detection and processing device for text generated by large language models, comprising a processor, a memory, and computer instructions stored in the memory, characterized in that the processor is configured to execute the computer instructions, and when the computer instructions are executed, the device implements the steps of the method according to any one of claims 1 to 7.
9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the computer program implements the steps of the method according to any one of claims 1 to 7.
10. A computer program product comprising computer instructions, characterized in that, when executed by a processor, the computer instructions implement the steps of the method according to any one of claims 1 to 7.