Question-answering method and related products based on unsupervised learning and customer profile fusion

By integrating customer profiles through unsupervised learning, user clusters are automatically segmented and historical Q&A information is integrated, solving the problems of high cost and low efficiency in existing Q&A methods and realizing an efficient and low-cost Q&A process.

CN115712707B · Active · Publication Date: 2026-06-12 · ZHAOLIAN CONSUMER FINANCE CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: ZHAOLIAN CONSUMER FINANCE CO LTD
Filing Date: 2022-11-14
Publication Date: 2026-06-12

Abstract

Embodiments of the present application provide a question-answering method based on unsupervised learning and customer profile fusion, and related products. The method comprises: a terminal device obtains attribute information of a user to be consulted, performs attribute processing on the attribute information, and performs cluster analysis to determine a first cluster of the user; the terminal device integrates the historical question-and-answer information of all users belonging to the same cluster into a question-and-answer library of the first cluster; the terminal device recalls n similar answers from the question-and-answer library of the first cluster, scores the n answers to obtain n scores, and determines the first answer, which corresponds to the highest score, as the answer for the user to be consulted. The present application has the advantage of low cost.

Description

Technical Field

[0001] This application relates to the fields of big data and financial technology, and in particular to a question-answering method and related products based on unsupervised learning and customer profile fusion.

Background Technology

[0002] Existing question-answering methods, such as index-based question answering, typically construct one or more frequently-asked-question databases based on task requirements, perform unified recall and ranking, and finally return the best answer. However, these methods have a drawback: building a question-answering database for a specific task requires a significant amount of time to design the relevant question-and-answer scripts and to label each user's actual category. This is time-consuming and labor-intensive, which makes the cost of question answering high.

Summary of the Invention

[0003] This application discloses a question-answering method and related products based on unsupervised learning and customer profile fusion. This method can automatically classify users and reuse historical related dialogues of similar users to find answers, thereby reducing the processing time of the question-answering database, reducing costs, and improving user experience.

[0004] In a first aspect, a question-answering method based on unsupervised learning and customer profile fusion is provided, the method comprising the following steps:

[0005] The terminal device obtains the attribute information of the user to be consulted, performs attribute processing on the attribute information, and then performs cluster analysis to determine the first cluster of the user.

[0006] The terminal device integrates the historical question and answer information of all users belonging to the same cluster into the question and answer database of the first cluster;

[0007] The terminal device recalls n similar answers from the question-and-answer database of the first cluster, sorts and evaluates the scores of the n answers to obtain n scores, and determines the first answer corresponding to the highest score among the n scores as the answer for the user to be consulted.

[0008] In a second aspect, a question-answering system based on unsupervised learning and customer profile fusion is provided, the system comprising:

[0009] The acquisition unit is used to acquire the attribute information of the user to be consulted, perform attribute processing on the attribute information, and then perform cluster analysis to determine the first cluster of the user.

[0010] The processing unit is used to integrate the historical question and answer information of all users belonging to the same cluster into a question and answer database for the first cluster, recall n similar answers in the question and answer database of the first cluster, sort and evaluate the scores of the n answers to obtain n scores, and determine the first answer corresponding to the highest score among the n scores as the answer for the user to be consulted.

[0011] In a third aspect, an electronic device is provided, including a processor, a memory, a communication interface, and one or more programs, said one or more programs being stored in the memory and configured to be executed by the processor, said programs including instructions for performing the steps of the method described in the first aspect.

[0012] In a fourth aspect, a computer-readable storage medium is provided for storing a computer program for electronic data interchange, wherein the computer program causes a computer to perform the method described in the first aspect.

[0013] In a fifth aspect, a computer program product is provided, comprising a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps described in the first aspect of the embodiments of this application. The computer program product may be a software installation package.

[0014] In the technical solution provided in this application, the terminal device acquires the attribute information of the user to be consulted, performs attribute processing on this attribute information, and then performs cluster analysis to determine the first cluster of the user. The terminal device integrates the historical question-and-answer information of all users belonging to the same cluster into the question-and-answer database of the first cluster. The terminal device recalls n similar answers from the question-and-answer database of the first cluster, ranks and evaluates the scores of the n answers to obtain n scores, and determines the answer corresponding to the highest score among the n scores as the answer for the user to be consulted. In this way, the question-and-answer process is optimized based on the user profile. Because the solution does not require manual processing, it reduces labor costs and improves the user experience.

Description of the Drawings

[0015] The accompanying drawings used in the embodiments of this application are described below.

[0016] Figure 1 is a schematic diagram of the structure of a terminal device according to this application;

[0017] Figure 2 is a flowchart of the index-based question-answering method based on unsupervised clustering proposed in this application;

[0018] Figure 3 is a flowchart of the question-answering method based on unsupervised learning and customer profile fusion provided in this application;

[0019] Figure 4 is a schematic diagram of the structure of a question-answering system based on unsupervised learning and customer profile fusion provided in one embodiment of this application;

[0020] Figure 5 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.

Detailed Description

[0021] The embodiments of this application are described below with reference to the accompanying drawings.

[0022] In this application, the term "and / or" is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent three cases: A existing alone, A and B existing simultaneously, and B existing alone. Additionally, the character " / " in this document indicates that the preceding and following related objects have an "or" relationship.

[0023] In this application's embodiments, "multiple" refers to two or more. The use of terms like "first," "second," etc., in this application's embodiments is merely illustrative and for distinguishing the described objects; it has no order and does not indicate a specific limitation on the number of devices in this application's embodiments, nor does it constitute any limitation on the embodiments of this application. The term "connection" in this application's embodiments refers to various connection methods, such as direct or indirect connections, to achieve communication between devices; this application's embodiments do not impose any limitations on this.

[0024] BERT transforms text into a numerical matrix through data transformation or mapping. During the modeling process, this numerical matrix is used as the text representation and participates in training and computation.

[0025] BM25 is an algorithm used to evaluate the relevance between search terms and documents. It is an algorithm based on a probabilistic retrieval model. To describe the BM25 algorithm in simple terms: We have a query and a batch of documents Ds, and now we want to calculate the relevance score between the query and each document D.
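By way of illustration only, the relevance calculation described above can be sketched in plain Python. The function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, the smoothed IDF form, and the sample documents are assumptions made for this sketch and are not taken from this application.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Return one BM25 relevance score per tokenized document."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Smoothed IDF (assumed variant) keeps every weight positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturated by k1 and normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical sample data: three documents D and one query.
docs = [
    "how do i repay my loan early".split(),
    "what is the interest rate on my loan".split(),
    "how to change my phone number".split(),
]
query = "repay loan early".split()
scores = bm25_scores(query, docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

In this sketch the first document shares all three query terms and therefore receives the highest score, while the third shares none and scores zero.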

[0026] NER, short for Named Entity Recognition, aims to identify entities of interest in text, such as location, organization, and time. The identified entities can be used in various downstream applications, such as systems for identifying and extracting information from patient records, or as features in machine learning systems for other natural language processing tasks.

[0027] K-means clustering is an iterative clustering algorithm. Its steps are as follows: divide the data into K groups, randomly select K objects as initial cluster centers, calculate the distance between each object and each seed cluster center, and assign each object to the nearest cluster center. The cluster centers and the objects assigned to them represent a cluster. Each time a sample is assigned, the cluster centers are recalculated based on the existing objects in the cluster. This process is repeated until a certain termination condition is met.
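As a minimal sketch of the iteration described above, the following plain-Python function implements the common batch form of k-means, in which the centers are recalculated after all objects have been assigned rather than after each individual assignment. The function name `kmeans`, the fixed random seed, and the sample points are assumptions for illustration.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Batch k-means on a list of equal-length float tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # K random objects as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:  # an empty cluster keeps its previous center
                new_centers.append(centers[c])
                continue
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
        if new_centers == centers:  # termination: centers stopped moving
            break
        centers = new_centers
    return labels, centers

# Hypothetical sample data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centers = kmeans(points, k=2)
```

Here the termination condition is that the centers no longer change; a cap on the number of iterations is included as a second stopping rule.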

[0028] Referring to Figure 1, Figure 1 provides a schematic diagram of the structure of a terminal device. As shown in Figure 1, the terminal device may include a processor, memory, a communication unit, and a bus. Depending on the function, the processor may be equipped with hardware structures such as a microphone and mobile devices. In practical applications, the terminal device can be implemented on different hardware, and may also be integrated into other hardware devices, such as smartphones, servers, and computer devices.

[0029] This application provides an index-based question-answering method based on unsupervised clustering. As shown in Figure 2, it includes the following steps:

[0030] Extracting user attribute information: This mainly involves extracting information such as user income, age, marital status, gender, occupation, etc.

[0031] User attribute processing: One-hot encoding is performed on some discrete attribute information of users, such as "marital status, gender, age, etc.", and sklearn.preprocessing is used for standardization.
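The attribute processing can be sketched as follows. To keep the example dependency-free, the two helpers below are plain-Python stand-ins for the one-hot encoding and standardization that this application performs with sklearn.preprocessing; the helper names, the field names, and the sample users are assumptions for illustration.

```python
import statistics

def one_hot(values):
    """Encode a list of discrete values as one-hot vectors."""
    categories = sorted(set(values))
    vectors = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return vectors, categories

def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std if std else 0.0 for v in values]

# Hypothetical users with one continuous and one discrete attribute.
users = [
    {"income": 3000.0, "marital_status": "single"},
    {"income": 5000.0, "marital_status": "married"},
    {"income": 7000.0, "marital_status": "single"},
]
marital_vectors, marital_categories = one_hot([u["marital_status"] for u in users])
income_z = standardize([u["income"] for u in users])
# One feature row per user: standardized income followed by the one-hot columns.
features = [[z] + vec for z, vec in zip(income_z, marital_vectors)]
```

Each row of `features` can then serve as the input vector of one user for clustering.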

[0032] User clustering: The standardized user features are input into k-means and clustered multiple times with different k values. The elbow method is used to select the optimal k value, with SSE (sum of squared errors) as the evaluation index.

[0033] First, set a small k value and then gradually increase it, plotting the relationship between SSE and the number of clusters k. While k is smaller than the actual number of clusters, each increase in k substantially raises the degree of aggregation within each cluster, so SSE falls sharply. Once k reaches the actual number of clusters, further increases in k yield rapidly diminishing gains in aggregation, so the decrease in SSE shrinks abruptly and the curve then levels off as k continues to increase. In other words, the curve of SSE against k is shaped like an elbow, and the k value at the elbow is the actual number of clusters in the data, which is also the optimal k value being sought.
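The elbow is usually read off a plot, but it can also be located numerically. The sketch below uses one simple heuristic, chosen here as an assumption rather than taken from this application: pick the k at which the drop in SSE before k most exceeds the drop after k. The function name `elbow_k` and the SSE values are hypothetical, and consecutive k values are assumed.

```python
def elbow_k(sse_by_k):
    """Return the k with the largest bend in the SSE curve."""
    ks = sorted(sse_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        drop_before = sse_by_k[prev_k] - sse_by_k[k]
        drop_after = sse_by_k[k] - sse_by_k[next_k]
        bend = drop_before - drop_after  # discrete second difference
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k

# Hypothetical SSE values from running k-means for k = 1..6.
sse_by_k = {1: 1000.0, 2: 600.0, 3: 150.0, 4: 120.0, 5: 100.0, 6: 90.0}
chosen_k = elbow_k(sse_by_k)
```

With these values SSE falls steeply up to k = 3 and only slowly afterwards, so the heuristic selects 3.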

[0034] Recall library construction: A question-and-answer library is built based on the clustering results. The questions and answers of all users belonging to the same cluster are integrated into one category of the question-and-answer library. This process is repeated for every cluster, so that the library contains as many categories as there are clusters.
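A minimal sketch of this grouping step is given below. The record layout `(user_id, question, answer)`, the function name `build_recall_libraries`, and the sample data are assumptions for illustration.

```python
from collections import defaultdict

def build_recall_libraries(history, user_cluster):
    """Group historical (question, answer) pairs by the asking user's cluster."""
    libraries = defaultdict(list)
    for user_id, question, answer in history:
        libraries[user_cluster[user_id]].append((question, answer))
    return dict(libraries)

# Hypothetical clustering result and question-and-answer history.
user_cluster = {"u1": 0, "u2": 1, "u3": 0}
history = [
    ("u1", "How do I repay early?", "Use the repayment page in the app."),
    ("u2", "How do I raise my credit limit?", "Submit proof of income."),
    ("u3", "Is there an early repayment fee?", "No fee is charged."),
]
libraries = build_recall_libraries(history, user_cluster)
```

The result holds one list of question-and-answer pairs per cluster label, which corresponds to one category of the library per cluster.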

[0035] New user information integration: For a new user, discrete attributes are one-hot encoded, while other attributes such as "income" are standardized using sklearn.preprocessing. The processed attributes are then fed into the clustering model for prediction to determine which category the user belongs to. The recall library corresponding to that category is then located, and a NER model is used to identify existing user attribute entities in the recall library and replace them with the new user's attribute information.

[0036] Recall: For a new user's query, BM25 is used to perform recall in the recall library of the user's category, returning the 30 most similar results.

[0037] Ranking: The 30 similar queries recalled from the corresponding recall library are matched against the actual query using the ranking model SBERT, and the answer to the highest-scoring question is returned as the optimal answer.
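The re-ranking step can be sketched as a cosine-similarity comparison between the query and each recalled question. In the sketch below, `embed` is a bag-of-words stand-in used only so that the example runs without a model; it is not SBERT, and in practice a sentence encoder would take its place. All names and sample data are assumptions for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in for a sentence encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rerank(query, candidates):
    """Return (score, answer) for the recalled question closest to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(question)), answer) for question, answer in candidates]
    return max(scored, key=lambda pair: pair[0])

# Hypothetical recalled (question, answer) pairs.
candidates = [
    ("how do i repay my loan early", "Use the repayment page in the app."),
    ("what is the interest rate on my loan", "The rate is shown in your contract."),
]
best_score, best_answer = rerank("can i repay my loan early", candidates)
```

Only the answer attached to the highest-scoring recalled question is returned, which mirrors the selection of the optimal answer described above.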

[0038] Referring to Figure 3, Figure 3 is a flowchart of the question-answering method based on unsupervised learning and customer profile fusion provided in this application. As shown in Figure 3, the method includes the following steps:

[0039] Step S301: The terminal device obtains the attribute information of the user to be consulted, performs attribute processing on the attribute information, and then performs cluster analysis to determine the first cluster of the user.

[0040] The user's attribute information includes, but is not limited to: marital status, gender, age, income, occupation, etc.

[0041] For example, a specific implementation scheme for step S301 above may include:

[0042] The attribute information is standardized using sklearn.preprocessing to obtain standardized user features. The standardized user features are then input into k-means and clustered multiple times with different k values. The elbow method is used to select the optimal k value, with SSE (sum of squared errors) as the evaluation index. Under the optimal k value, the cluster to which the user is assigned is determined as the first cluster.

[0043] Step S302: The terminal device integrates the historical question and answer information of all users belonging to the same cluster as the question and answer database of the first cluster;

[0044] Step S303: The terminal device recalls n similar answers from the question-and-answer database of the first cluster, sorts and evaluates the scores of the n answers to obtain n scores, and determines the first answer corresponding to the highest score among the n scores as the answer for the user to be consulted.

[0045] The above n is a relatively large value, such as n>15, n>20, n>30, etc.

[0046] In the technical solution provided in this application, the terminal device acquires the attribute information of the user to be consulted, performs attribute processing on this attribute information, and then performs cluster analysis to determine the first cluster of the user. The terminal device integrates the historical question-and-answer information of all users belonging to the same cluster into the question-and-answer database of the first cluster. The terminal device recalls n similar answers from the question-and-answer database of the first cluster, ranks and evaluates the scores of the n answers to obtain n scores, and determines the answer corresponding to the highest score among the n scores as the answer for the user to be consulted. In this way, the question-and-answer process is optimized based on the user profile. Because the solution does not require manual processing, it reduces labor costs and improves the user experience.

[0047] For example, the aforementioned terminal device may integrate the historical question-and-answer information of all users belonging to the same cluster into a question-and-answer database for the first cluster, specifically including:

[0048] The terminal device looks up the cluster corresponding to each entry in the historical questions and answers of all users, and determines the historical answers whose cluster is the same as the first cluster as the pre-selected question and answer library. It extracts the user profile corresponding to each question and answer in the pre-selected question and answer library and merges similar user profiles to obtain m merged user profiles. It then compares the m user profiles with the first user profile of the user to be consulted to determine m similarities, selects from the m similarities the m1 similarities greater than the similarity threshold, and retains in the pre-selected question and answer library the historical answers of the users corresponding to the m1 user profiles, thereby obtaining the question and answer library of the first cluster.

[0049] For example, the above process of extracting the user profile corresponding to each question and answer in the pre-selected question and answer database, merging similar user profiles to obtain m merged user profiles, and comparing the m user profiles with the first user profile of the user to be consulted to determine the m similarities can specifically include:

[0050] Extract the user profile corresponding to each question and answer in the pre-selected question and answer database, and construct the vector corresponding to each user profile to obtain multiple vectors. Calculate the difference between each pair of the multiple vectors, determine the vectors whose differences are less than a preset threshold as a group of vectors, and determine all vectors in a group as similar user profiles. Calculate the average of a group of vectors to obtain an average vector, and determine the average vector as the merged user profile; traverse all groups of vectors to obtain the m user profiles. Extract the first vector of the first user profile, calculate the difference between the first vector and each of the m average vectors of the m user profiles to obtain m difference vectors, and calculate the average of the element values of each of the m difference vectors to obtain the m similarities.
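The merging and comparison steps can be sketched as follows, starting from groups that have already been formed. The sketch averages the absolute element differences, which is an assumption made here so that positive and negative differences do not cancel; the text above does not state whether absolute values are used. The helper names and sample vectors are likewise hypothetical.

```python
def mean_vector(vectors):
    """Element-wise average of a group of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def mean_abs_difference(a, b):
    """Average absolute element difference between two vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical groups of similar user-profile vectors.
groups = [
    [[1.0, 0.0, 0.2], [1.0, 0.0, 0.4]],
    [[0.0, 1.0, 0.9]],
]
# One merged profile (average vector) per group, giving m = 2 profiles.
merged_profiles = [mean_vector(g) for g in groups]

# Profile vector of the user to be consulted, compared with each merged profile.
first_profile = [1.0, 0.0, 0.3]
differences = [mean_abs_difference(first_profile, p) for p in merged_profiles]
```

Because the value computed here is a difference, a smaller value indicates a closer profile; how this value is mapped onto the similarity threshold is not specified in the text.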

[0051] For example, the above calculation of the differences between each pair of multiple vector values, and the determination of vectors whose differences are less than a preset threshold as a group of vectors, can specifically include:

[0052] From the multiple vectors, select a first vector. Perform a subtraction between the first vector and each of the remaining vectors to obtain α difference vectors, and calculate the average of the elements of each of the α difference vectors to obtain α means. Among the α means, the vectors corresponding to the means less than a preset threshold are determined to be in the same group as the first vector. Then select the β means among the α means that are greater than or equal to the preset threshold, arrange the β means in descending order to obtain a first sequence, and calculate the difference between every two adjacent values in the first sequence to obtain β-1 difference values. The β-1 difference values are then examined in order. If the first difference is less than a preset threshold, the first and second values in the first sequence are grouped together. If the second difference is less than a preset threshold, the third value in the first sequence is placed in the same group as the first and second values. If the second difference is greater than or equal to a preset threshold, it is determined whether the third difference is less than a preset threshold: if it is, the third and fourth values are grouped together; if it is greater than or equal to the preset threshold, the third value forms a group on its own. The β means are traversed in this way to determine the values belonging to each group, and the vectors corresponding to the values of each group are determined as a vector group.
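One possible reading of the grouping procedure above is sketched below. The function name `group_vectors`, the use of absolute element differences, the use of a single threshold for both comparisons, and the sample vectors are assumptions for illustration; groups are returned as lists of vector indices.

```python
def group_vectors(vectors, threshold):
    """Group vector indices by their mean absolute difference to vectors[0]."""
    ref = vectors[0]
    # One mean per remaining vector (the alpha means).
    means = {
        i: sum(abs(x - y) for x, y in zip(ref, v)) / len(ref)
        for i, v in enumerate(vectors[1:], start=1)
    }
    # Vectors close to the reference join the reference's group.
    first_group = [0] + [i for i, m in means.items() if m < threshold]
    # The remaining (beta) means are sorted in descending order.
    rest = sorted((i for i, m in means.items() if m >= threshold),
                  key=lambda i: means[i], reverse=True)
    groups = [first_group]
    current = []
    for i in rest:
        # Adjacent means that differ by less than the threshold share a group.
        if current and means[current[-1]] - means[i] < threshold:
            current.append(i)
        else:
            if current:
                groups.append(current)
            current = [i]
    if current:
        groups.append(current)
    return groups

# Hypothetical profile vectors.
vectors = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [1.1, 1.1], [3.0, 3.0]]
groups = group_vectors(vectors, threshold=0.5)
```

In this sketch, vectors 0 and 1 form the first group, vector 4 stands alone, and vectors 3 and 2 are grouped because their means differ by less than the threshold. Note that this procedure compares each vector only with the reference vector, so two vectors at a similar distance from the reference are grouped even if they differ from each other.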

[0053] This method reduces computation compared with the usual pairwise difference calculation. Take 5 vectors with 100 elements each as an example. The usual calculation requires 4 + 3 + 2 + 1 = 10 vector differences, which for 100 elements amounts to 10 × 100 = 10^3 difference operations. The method above instead computes 4 vector differences, then 4 mean operations, and then the differences between the means, that is, 4*100 + 4*100 operations plus the subsequent mean-difference operations. Since the subsequent mean-difference operations are negligible, only about 800 operations are needed. This saves about 20% of the computation and improves the speed of computation.
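The same count can be written for N vectors of d elements each. The symbols N and d and the generalization itself are introduced here for illustration, using the counting convention of the example above (one operation per element for both the differences and the means, with the mean-to-mean differences neglected):

```latex
\begin{aligned}
C_{\text{pairwise}} &= \binom{N}{2}\, d = \frac{N(N-1)}{2}\, d, \\
C_{\text{reference}} &\approx \underbrace{(N-1)\, d}_{\text{differences}} + \underbrace{(N-1)\, d}_{\text{means}} = 2(N-1)\, d, \\
1 - \frac{C_{\text{reference}}}{C_{\text{pairwise}}} &= 1 - \frac{4}{N}.
\end{aligned}
```

For N = 5 and d = 100 this gives 1000 and 800 operations and a saving of 1 - 4/5 = 20%, matching the figures above. Under this convention the saving grows as N increases.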

[0054] For example, the above methods may also include:

[0055] The terminal device processes the attribute information of the user seeking consultation according to the rules to obtain the first user information, and then integrates the first user information into the question and answer information database.

[0056] For example, the aforementioned terminal device recalling n similar answers from the question-and-answer database of the first cluster may specifically include:

[0057] The terminal device extracts the text information of the user to be consulted, and performs word segmentation and central sentence processing on the text information to obtain x1 keywords and the corresponding x1 central sentences. Based on the x1 central sentences, it queries the question-and-answer database of the first cluster for the x2 expression patterns that have the same meaning as each central sentence, and arbitrarily selects one expression pattern from the x2 expression patterns corresponding to each central sentence to obtain x1 expression sentences. The x1 expression sentences are arranged in the order of the x1 keywords to obtain one piece of voice data, and multiple pieces of voice data are obtained in the same way. Fluency matching is performed on the multiple pieces of voice data to obtain the first voice data with the highest fluency, and the first voice data is determined as the playback voice data of the text information.

[0058] Referring to Figure 4, Figure 4 is a schematic diagram of the structure of a question-answering system based on unsupervised learning and customer profile fusion provided in this application, the system comprising:

[0059] The acquisition unit 401 is used to acquire the attribute information of the user to be consulted, perform attribute processing on the attribute information, and then perform cluster analysis to determine the first cluster of the user.

[0060] The processing unit 402 is used to integrate the historical question and answer information of all users belonging to the same cluster into a question and answer library for the first cluster, recall n similar answers in the question and answer library of the first cluster, sort and evaluate the scores of the n answers to obtain n scores, and determine the first answer corresponding to the highest score among the n scores as the answer for the user to be consulted.

[0061] For example,

[0062] The processing unit 402 is specifically used to look up the cluster corresponding to each entry in the historical questions and answers of all users, determine the historical answers whose cluster is the same as the first cluster as the pre-selected question and answer library, extract the user profile corresponding to each question and answer in the pre-selected question and answer library, merge similar user profiles to obtain m merged user profiles, compare the m user profiles with the first user profile of the user to be consulted to determine m similarities, select from the m similarities the m1 similarities greater than the similarity threshold, and retain in the pre-selected question and answer library the historical answers of the users corresponding to the m1 user profiles to obtain the question and answer library of the first cluster.

[0063] For example,

[0064] Processing unit 402 is specifically used to extract user profiles corresponding to each question and answer in the pre-selected question and answer database, construct multiple vector values corresponding to each user profile, calculate the differences between each pair of multiple vector values, determine the vectors with differences less than a preset threshold as a group of vectors, determine all vectors in a group of vectors as similar user profiles, calculate the average value of a group of vectors to obtain an average vector, determine the average vector as the merged user profile, traverse all groups of vectors to obtain m user profiles, extract the first vector of the first user profile, calculate the difference between the first vector and the m average vectors of the m user profiles to obtain m difference vectors, and calculate the average value of the element values of each difference vector in the m difference vectors to obtain m similarities.

[0065] For example,

[0066] The processing unit 402 is also used to process the attribute information of the user to be consulted according to the rules to obtain the first user information, and to integrate the first user information into the question and answer information database.

[0067] For example,

[0068] The processing unit 402 is specifically used to extract the text information of the user to be consulted, perform word segmentation and central sentence processing on the text information to obtain x1 keywords and corresponding x1 central sentences, query x2 expression patterns with the same meaning for each central sentence from the question-and-answer database of the first cluster based on the x1 central sentences, arbitrarily select an expression pattern from the x2 expression patterns corresponding to each central sentence to obtain x1 expression patterns, obtain a voice data by sorting the x1 expression patterns according to the order of x1 keywords, and similarly obtain multiple voice data, perform fluency matching on the multiple voice data to obtain the first voice data with the highest fluency, and determine the first voice data as the playback voice data of the text information.

[0069] It is understood that, in order to achieve the aforementioned functions, the above-described apparatus includes hardware and / or software modules corresponding to the execution of each function. Based on the algorithmic steps of the various examples described in conjunction with the embodiments disclosed herein, this application can be implemented in hardware or a combination of hardware and computer software. Whether a function is executed in hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application in conjunction with the embodiments, but such implementation should not be considered beyond the scope of this application.

[0070] This embodiment can divide the electronic device into functional modules according to the above method example. For example, each function can be divided into its own functional modules, or two or more functions can be integrated into one processing module. The integrated modules can be implemented in hardware. It should be noted that the module division in this embodiment is illustrative and only represents one logical functional division. In actual implementation, there may be other division methods.

[0071] It should be noted that all relevant content of each step involved in the above method embodiments can be referenced from the functional description of the corresponding functional module, and will not be repeated here.

[0072] When using integrated units, the user equipment may include a processing module and a storage module. The processing module can be used to control and manage the actions of the user equipment; for example, it can support the electronic device in executing the steps performed by the acquisition unit, communication unit, and processing unit described above. The storage module can support the electronic device in executing stored program code and data.

[0073] The processing module can be a processor or a controller. It can implement or execute various exemplary logic blocks, modules, and circuits described in conjunction with the disclosure of this application. The processor can also be a combination of functions that implement computing capabilities, such as a combination of one or more microprocessors, a combination of digital signal processing (DSP) and a microprocessor, etc. The storage module can be a memory. The communication module can specifically be a radio frequency circuit, a Bluetooth chip, a Wi-Fi chip, or other devices that interact with other electronic devices.

[0074] It is understood that the interface connection relationships between the modules illustrated in the embodiments of this application are merely illustrative and do not constitute a structural limitation on the user equipment. In other embodiments of this application, the user equipment may also employ different interface connection methods or combinations of multiple interface connection methods as described in the above embodiments.

[0075] Referring to Figure 5, Figure 5 shows an electronic device 50 provided in this application, which includes a processor 501, a memory 502, a communication interface 503, and a display screen 504. The processor 501, memory 502, and communication interface 503 are interconnected via a bus. The display screen powers the electronic device. The electronic device may further include:

[0076] The memory 502 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM), and is used to store related computer programs and data. The communication interface 503 is used for receiving and sending data.

[0077] Processor 501 can be one or more central processing units (CPUs). If processor 501 is a CPU, the CPU can be a single-core CPU or a multi-core CPU.

[0078] Processor 501 may include one or more processing units, such as application processors (APs), modem processors, graphics processing units (GPUs), image signal processors (ISPs), controllers, video codecs, digital signal processors (DSPs), baseband processors, and / or neural network processing units (NPUs). Different processing units may be independent components or integrated into one or more processors. In some embodiments, the user equipment may also include one or more processing units. The controller can generate operation control signals based on instruction opcodes and timing signals to control instruction fetching and execution. In other embodiments, the processing unit may also include a memory for storing instructions and data. For example, the memory in the processing unit may be a cache memory. This memory can store instructions or data that the processing unit has just used or is repeatedly used. If the processing unit needs to reuse the instruction or data, it can directly retrieve it from the memory. This avoids repeated access, reduces the waiting time of the processing unit, and thus improves the efficiency of the user equipment in processing data or executing instructions.

[0079] In some embodiments, the processor 501 may include one or more interfaces. These interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver / transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input / output (GPIO) interface, a SIM card interface, and / or a USB interface, etc. The USB interface is a USB standard-compliant interface, specifically a Mini USB interface, a Micro USB interface, a USB Type-C interface, etc. The USB interface can be used to connect a charger to charge the user device, and can also be used for data transfer between the user device and peripheral devices. The USB interface can also be used to connect headphones for audio playback.

[0080] If the electronic device 50 is a user device or terminal device, such as a smartphone, computer device, or server, the processor 501 in the electronic device 50 is used to read the computer program code stored in the memory 502 and perform the following operations:

[0081] Obtain the attribute information of the user to be consulted, perform attribute processing on the attribute information, and then perform cluster analysis to determine the first cluster to which the user belongs; integrate the historical question-and-answer information of all users belonging to the same cluster into the question-and-answer database of the first cluster; recall n similar answers from the question-and-answer database of the first cluster, sort and score the n answers to obtain n scores, and determine the first answer, which corresponds to the highest of the n scores, as the answer for the user to be consulted.
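The operations in [0081] can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `QARecord` layout, the use of precomputed cluster centroids with Euclidean distance in place of the cluster analysis, and the toy `token_overlap` scorer, which merely stands in for the scoring model.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class QARecord:
    user_vector: tuple  # processed attribute vector of the asking user (hypothetical layout)
    question: str
    answer: str


def nearest_cluster(vector, centroids):
    """Index of the closest centroid; a stand-in for the cluster analysis step."""
    return min(range(len(centroids)), key=lambda i: dist(vector, centroids[i]))


def token_overlap(a, b):
    """Toy relevance score: Jaccard overlap of lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def answer_question(user_vector, question, history, centroids, n=3, score=token_overlap):
    # Step 1: determine the first cluster of the user to be consulted.
    first_cluster = nearest_cluster(user_vector, centroids)
    # Step 2: the cluster's Q&A database is the history of users in the same cluster.
    library = [r for r in history
               if nearest_cluster(r.user_vector, centroids) == first_cluster]
    # Step 3: recall the n entries whose historical question is most similar.
    recalled = sorted(library, key=lambda r: token_overlap(question, r.question),
                      reverse=True)[:n]
    if not recalled:
        return None
    # Step 4: score the n recalled answers and return the highest-scoring one.
    return max(recalled, key=lambda r: score(question, r.answer)).answer
```

Passing a different `score` callable is the intended extension point, for example a sentence-embedding similarity in place of the word-overlap stand-in.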

[0082] For example, the terminal device integrating the historical question-and-answer information of all users belonging to the same cluster into a question-and-answer database for the first cluster specifically includes:

[0083] The terminal device queries the cluster corresponding to each entry in all users' historical questions and answers, and takes the historical answers whose cluster is the same as the first cluster as the pre-selected question-and-answer database. It extracts the user profile corresponding to each question and answer in the pre-selected database and merges similar user profiles to obtain m merged user profiles. It compares the m user profiles with the first user profile of the user to be consulted to determine m similarities, and selects from them the m1 similarities that are greater than the similarity threshold, which identify m1 user profiles. The historical answers of the users corresponding to the m1 user profiles are retained in the pre-selected database to obtain the question-and-answer database of the first cluster.
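The threshold step in [0083] can be sketched as follows. The function name, the `(profile_id, answer)` pair layout, and the similarity dictionary are hypothetical; the sketch assumes the m similarities have already been computed and that a larger value means a closer profile.

```python
def build_cluster_library(preselected, similarity_by_profile, threshold):
    """Keep only answers whose merged user profile is similar enough.

    preselected: list of (profile_id, answer) pairs from the pre-selected database.
    similarity_by_profile: maps each of the m merged profile ids to its similarity
        with the first user profile (assumed precomputed).
    """
    # The m1 profiles are those whose similarity exceeds the threshold.
    kept = {pid for pid, sim in similarity_by_profile.items() if sim > threshold}
    # Retain the historical answers of users belonging to those m1 profiles.
    return [answer for pid, answer in preselected if pid in kept]
```

For instance, with similarities `{"p1": 0.9, "p2": 0.4}` and a threshold of 0.6, only the answers attached to `p1` remain in the first cluster's database.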

[0084] For example, the step of extracting the user profile corresponding to each question and answer in the pre-selected question and answer database, merging similar user profiles to obtain m merged user profiles, and comparing the m user profiles with the first user profile of the user to be consulted to determine the m similarities specifically includes:

[0085] Extract the user profile corresponding to each question and answer in the pre-selected question-and-answer database, and construct a vector for each user profile. Calculate the difference between each pair of vectors, and determine the vectors whose differences are less than a preset threshold as one group of vectors; all vectors in a group are treated as similar user profiles. Calculate the average of the vectors in a group to obtain the average vector, and take the average vector as the merged user profile. Traverse all groups of vectors to obtain m user profiles. Extract the first vector of the first user profile, calculate the difference between the first vector and each of the m average vectors of the m user profiles to obtain m difference vectors, and calculate the average of the element values of each of the m difference vectors to obtain the m similarities.
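A minimal sketch of the grouping and averaging in [0085] follows. Several choices are assumptions not fixed by the text: the difference between two vectors is taken as the mean absolute element-wise difference, grouping is greedy against the first vector of each group, and absolute values are used when averaging the difference vector. All function names are hypothetical.

```python
from statistics import fmean


def vector_difference(a, b):
    """Mean absolute element-wise difference (an assumed difference measure)."""
    return fmean(abs(x - y) for x, y in zip(a, b))


def group_vectors(vectors, threshold):
    """Greedy grouping: a vector joins the first group whose seed vector
    differs from it by less than the preset threshold."""
    groups = []
    for v in vectors:
        for g in groups:
            if vector_difference(v, g[0]) < threshold:
                g.append(v)
                break
        else:
            groups.append([v])
    return groups


def merge_profiles(vectors, threshold):
    """Average each group column-wise; each average vector is one merged profile."""
    return [tuple(fmean(col) for col in zip(*g))
            for g in group_vectors(vectors, threshold)]


def profile_scores(first_vector, merged):
    """Average the element values of each difference vector (absolute values assumed)."""
    return [vector_difference(first_vector, m) for m in merged]
```

Under these assumptions a smaller value from `profile_scores` means a closer profile, so an implementation would need to decide how this value maps onto the similarity threshold comparison; the text does not specify that mapping.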

[0086] For example, the terminal device processes the attribute information of the user to be consulted according to the rules to obtain the first user information, and then integrates the first user information into the question-and-answer information database.

[0087] For example, the terminal device recalling n similar answers from the question-and-answer database of the first cluster specifically includes:

[0088] The terminal device extracts the text information of the user to be consulted, and performs word segmentation and central sentence processing on the text information to obtain x1 keywords and the corresponding x1 central sentences. Based on the x1 central sentences, it queries the question-and-answer database of the first cluster for the x2 expression patterns that have the same meaning as each central sentence. It randomly selects one expression pattern from the x2 expression patterns corresponding to each central sentence to obtain x1 expression sentences. The x1 expression sentences are arranged in the order of the x1 keywords to obtain one piece of voice data. Multiple pieces of voice data are obtained in the same way. The multiple pieces of voice data are evaluated for fluency to obtain the first voice data, which has the highest fluency. The first voice data is determined as the playback voice data for the text information.
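The candidate assembly and fluency selection in [0088] can be sketched on text alone. The names, the dictionary of expression patterns, and the `fluency` callable are hypothetical; the fluency measure is not specified in the text, and speech synthesis is outside the scope of this sketch.

```python
import random


def build_candidate(central_sentences, patterns, rng):
    """Pick one random expression pattern per central sentence, keeping the
    keyword order in which the central sentences are given."""
    return " ".join(rng.choice(patterns[s]) for s in central_sentences)


def pick_most_fluent(central_sentences, patterns, fluency, attempts=5, seed=0):
    """Generate several candidates and return the one with the highest fluency.

    patterns: maps each central sentence to its list of equivalent expressions.
    fluency: any callable that returns a higher number for a more fluent sentence.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    candidates = [build_candidate(central_sentences, patterns, rng)
                  for _ in range(attempts)]
    return max(candidates, key=fluency)
```

A language-model score would be a natural choice for `fluency`; any such scorer can be passed in without changing the assembly logic.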

[0089] All relevant content in each scenario involved in the above method embodiments can be referenced from the functional description of the corresponding functional module, and will not be repeated here.

[0090] This application also provides a computer-readable storage medium storing a computer program that, when run on a network device, implements the method flow shown in Figure 5.

[0091] This application also provides a computer program product that, when run on a terminal, implements the method flow shown in Figure 5.

[0092] The above primarily describes the solutions of the embodiments of this application from the perspective of the method execution process. It is understood that, in order to achieve the above functions, the electronic device includes the corresponding hardware structures and/or software modules for executing each function. Those skilled in the art should readily recognize that, in conjunction with the units and algorithm steps of the various examples described in the embodiments provided herein, this application can be implemented in hardware or a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0093] In the embodiments of this application, the electronic device can be divided into functional units according to the above method examples. For example, each function can be assigned to a separate functional unit, or two or more functions can be integrated into one processing unit. The integrated unit can be implemented in hardware or as a software functional unit. It should be noted that the unit division in the embodiments of this application is illustrative and represents only one logical functional division. In actual implementation, there may be other division methods.

[0094] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that this application is not limited to the described order of actions, as some steps may be performed in other orders or simultaneously according to this application. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to this application.

[0095] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.

[0096] In the several embodiments provided in this application, it should be understood that the disclosed apparatus can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the units described above is only a logical functional division, and in actual implementation there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some interfaces, and may be electrical or take other forms.

[0097] The units described above as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0098] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0099] If the integrated units described above are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable memory. Based on this understanding, the technical solution of this application, in essence, or the part that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned memory includes various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.

[0100] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be implemented by a program instructing related hardware. The program can be stored in a computer-readable storage medium, which may include: flash drive, read-only memory (ROM), random access memory (RAM), disk or optical disk, etc.

Claims

1. A question-answering method based on unsupervised learning and fusion of customer profiles, characterized in that the method includes the following steps: the terminal device obtains the attribute information of the user to be consulted, performs attribute processing on the attribute information, and then performs cluster analysis to determine the first cluster of the user; the terminal device integrates the historical question-and-answer information of all users belonging to the same cluster into the question-and-answer database of the first cluster; the terminal device recalls n similar answers from the question-and-answer database of the first cluster, sorts and scores the n answers to obtain n scores, and determines the first answer corresponding to the highest score among the n scores as the answer for the user to be consulted; wherein the terminal device integrating the historical question-and-answer information of all users belonging to the same cluster into the question-and-answer database of the first cluster specifically includes: the terminal device queries the corresponding cluster from all users' historical questions and answers, determines the historical answers whose cluster is the same as the first cluster as the pre-selected question-and-answer database, extracts the user profile corresponding to each question and answer in the pre-selected question-and-answer database, merges similar user profiles to obtain m merged user profiles, compares the m user profiles with the first user profile of the user to be consulted to determine m similarities, selects from the m similarities the m1 similarities greater than the similarity threshold, which correspond to m1 user profiles, and retains the historical answers of the users corresponding to the m1 user profiles in the pre-selected question-and-answer database to obtain the question-and-answer database of the first cluster;
wherein the extracting, merging, and comparing specifically includes: extracting the user profile corresponding to each question and answer in the pre-selected question-and-answer database, constructing a vector for each user profile, calculating the difference between each pair of vectors, determining the vectors with differences less than a preset threshold as a group of vectors, determining all vectors in a group of vectors as similar user profiles, calculating the average of a group of vectors to obtain the average vector, determining the average vector as the merged user profile, traversing all groups of vectors to obtain m user profiles, extracting the first vector of the first user profile, calculating the difference between the first vector and the m average vectors of the m user profiles to obtain m difference vectors, and calculating the average of the element values of each of the m difference vectors to obtain the m similarities.

2. The method according to claim 1, characterized in that the method further includes: the terminal device processes the attribute information of the user to be consulted according to the rules to obtain the first user information, and then integrates the first user information into the question-and-answer information database.

3. The method according to claim 1, characterized in that the terminal device recalling n similar answers from the question-and-answer database of the first cluster specifically includes: the terminal device extracts the text information of the user to be consulted, and performs word segmentation and central sentence processing on the text information to obtain x1 keywords and the corresponding x1 central sentences; based on the x1 central sentences, it queries the question-and-answer database of the first cluster for the x2 expression patterns that have the same meaning as each central sentence; it randomly selects one expression pattern from the x2 expression patterns corresponding to each central sentence to obtain x1 expression sentences; the x1 expression sentences are arranged in the order of the x1 keywords to obtain one piece of voice data; multiple pieces of voice data are obtained in the same way; the multiple pieces of voice data are evaluated for fluency to obtain the first voice data, which has the highest fluency; and the first voice data is determined as the playback voice data for the text information.

4. The method according to claim 1, characterized in that sorting and scoring the n answers to obtain n scores specifically includes: using SBERT to match each of the n answers against the actual question to obtain the n scores.

5. A question-answering system based on unsupervised learning and fusion of customer profiles, characterized in that the system includes: an acquisition unit, used to acquire the attribute information of the user to be consulted, perform attribute processing on the attribute information, and then perform cluster analysis to determine the first cluster of the user; and a processing unit, used to integrate the historical question-and-answer information of all users belonging to the same cluster into the question-and-answer database of the first cluster, recall n similar answers from the question-and-answer database of the first cluster, sort and score the n answers to obtain n scores, and determine the first answer corresponding to the highest score among the n scores as the answer for the user to be consulted; wherein the processing unit is specifically used to query the corresponding cluster from the historical questions and answers of all users, determine the historical answers whose cluster is the same as the first cluster as the pre-selected question-and-answer database, extract the user profile corresponding to each question and answer in the pre-selected question-and-answer database, merge similar user profiles to obtain m merged user profiles, compare the m user profiles with the first user profile of the user to be consulted to determine m similarities, select from the m similarities the m1 similarities greater than the similarity threshold, which correspond to m1 user profiles, and retain the historical answers of the users corresponding to the m1 user profiles in the pre-selected question-and-answer database to obtain the question-and-answer database of the first cluster;
wherein the extracting, merging, and comparing specifically includes: extracting the user profile corresponding to each question and answer in the pre-selected question-and-answer database, constructing a vector for each user profile, calculating the difference between each pair of vectors, determining the vectors with differences less than a preset threshold as a group of vectors, determining all vectors in a group of vectors as similar user profiles, calculating the average of a group of vectors to obtain the average vector, determining the average vector as the merged user profile, traversing all groups of vectors to obtain m user profiles, extracting the first vector of the first user profile, calculating the difference between the first vector and the m average vectors of the m user profiles to obtain m difference vectors, and calculating the average of the element values of each of the m difference vectors to obtain the m similarities.

6. An electronic device comprising a processor, a memory, a communication interface, and one or more programs, said one or more programs being stored in the memory and configured to be executed by the processor, said programs comprising instructions for performing the steps of the method as claimed in any one of claims 1-4.

7. A computer-readable storage medium storing a computer program that, when run on a user device, performs the method as described in any one of claims 1-4.