Intelligent card generation method based on data matching and related device
By filtering and evaluating the semantic and preference matching scores of candidate fields in a dialogue system, smart cards are generated, which solves the problem of inaccurate field selection in financial data queries, improves the matching degree between data content and user needs, and enhances the user experience.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHENZHEN XISHIMA DATA TECH CO LTD
- Filing Date
- 2026-04-23
- Publication Date
- 2026-06-23
Abstract
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology, specifically to a smart card generation method and related apparatus based on data matching.
Background Technology
[0002] Currently, traditional search engines can no longer meet users' diverse data query needs, and intelligent dialogue systems are gradually becoming the mainstream query tools. In highly specialized fields such as financial data queries, some users struggle to accurately describe their query requirements. While existing dialogue systems have mechanisms to guide users to correct their questions through interactive smart cards, their field selection still lacks accuracy. This results in the data content of the output smart cards not matching the user's needs, failing to properly guide the user to correct their question and negatively impacting the user experience.
Summary of the Invention
[0003] This application provides a smart card generation method and related apparatus based on data matching, aiming to improve the accuracy of field selection, enhance the matching degree between the data content carried by the smart card and user needs, and optimize the user experience.
[0004] In a first aspect, embodiments of this application provide a smart card generation method based on data matching, including:
[0005] In response to the target user's target question, a first vector is obtained, which is used to characterize the semantic information of the target indicator in the target question;
[0006] Multiple candidate fields are selected, each candidate field corresponds to a second vector, and the similarity between the second vector and the first vector is greater than a first preset threshold.
[0007] Calculate the semantic matching score and preference matching score for each candidate field. The semantic matching score is used to characterize the degree of matching between the candidate field and the user's needs at the level of objective semantics of the text, and the preference matching score is used to characterize the degree of matching between the candidate field and the user's needs at the level of subjective preferences of the user.
[0008] Based on the semantic matching score and the preference matching score, a target field is selected from the plurality of candidate fields;
[0009] Generate a smart card, which is used to carry the associated data of the target field.
[0010] In a second aspect, embodiments of this application provide a smart card generation device based on data matching, comprising:
[0011] The acquisition unit is configured to acquire a first vector in response to a target user's target question, wherein the first vector is used to characterize the semantic information of the target indicator in the target question;
[0012] A filtering unit is used to filter out multiple candidate fields, wherein the candidate fields correspond to a second vector, and the similarity between the second vector and the first vector is greater than a first preset threshold.
[0013] The calculation unit is used to calculate the semantic matching score and preference matching score of each candidate field. The semantic matching score is used to characterize the degree of matching between the candidate field and the user's needs at the level of objective semantics of the text, and the preference matching score is used to characterize the degree of matching between the candidate field and the user's needs at the level of subjective preferences of the user.
[0014] A selection unit is used to select a target field from the plurality of candidate fields based on the semantic matching score and the preference matching score;
[0015] A generation unit is used to generate smart cards, which are used to carry the associated data of the target field.
[0016] In a third aspect, embodiments of this application provide an electronic device including a processor, a memory, and one or more programs, the one or more programs being stored in the memory and configured to be executed by the processor, the programs including instructions for performing steps in the method as described in the first aspect of embodiments of this application.
[0017] In a fourth aspect, embodiments of this application provide a computer-readable storage medium having a computer program or instructions stored thereon, wherein the computer program or instructions, when executed by a processor, implement the steps of the method described in the first aspect of embodiments of this application.
[0018] As can be seen, in this embodiment, candidate fields are selected through vector similarity matching. After calculating the semantic matching score and preference matching score for each candidate field, the target field is selected from the candidate fields based on these scores, thereby generating a smart card to carry the associated data of the target field. The semantic matching score characterizes the degree of matching between the candidate field and user needs at the objective semantic level, while the preference matching score characterizes the degree of matching between the candidate field and user needs at the subjective preference level. Thus, by quantitatively evaluating the degree of matching between candidate fields and user needs in both objective semantic and subjective dimensions, the target field is comprehensively selected to generate the smart card, improving the accuracy of field selection and the degree of matching between the data content carried by the smart card and user needs, thereby optimizing the user experience.
Attached Figure Description
[0019] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0020] Figure 1 is a structural block diagram of a dialogue system provided in an embodiment of this application;
[0021] Figure 2 is a structural block diagram of another dialogue system provided in an embodiment of this application;
[0022] Figure 3 is a flowchart illustrating a smart card generation method based on data matching provided in an embodiment of this application;
[0023] Figure 4 is a structural block diagram of a smart card generation device based on data matching provided in an embodiment of this application;
[0024] Figure 5 is a structural block diagram of another smart card generation device based on data matching provided in an embodiment of this application;
[0025] Figure 6 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0026] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
[0027] The terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish different objects, not to describe a specific order. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or apparatus that includes a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to these processes, methods, products, or apparatuses.
[0028] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.
[0029] In highly specialized fields such as financial data queries, different users possess varying levels of expertise, leading to significant differences in the ambiguity of their input questions. For example, regarding queries targeting the China Economic and Financial Research Database (CSMAR), some users struggle to accurately describe their needs. Existing dialogue systems select fields based solely on semantic similarity. However, many fields with similar names exist across different database tables, but their meanings differ across tables. Consequently, existing dialogue systems lack accuracy in field selection, resulting in a mismatch between the data content of the interactive smart cards and user requirements. This not only fails to guide users in revising their queries but may also mislead them, negatively impacting the user experience.
[0030] To address the aforementioned issues, embodiments of this application provide a smart card generation method and related apparatus based on data matching.
[0031] The system architecture involved in the embodiments of this application is described below.
[0032] In one embodiment, as shown in Figure 1, the dialogue system 10 includes a first electronic device 11 and a second electronic device 12, which are communicatively connected. The first electronic device 11 is used to collect a target question input by a target user and send the target question to the second electronic device 12, which then executes the smart card generation method based on data matching as described in the embodiments of this application.
[0033] In another embodiment, as shown in Figure 2, the dialogue system 10 includes only a first electronic device 11, which is used to collect the target question input by the target user and to execute the smart card generation method based on data matching as described in the embodiments of this application.
[0034] The following describes a smart card generation method based on data matching provided by an embodiment of this application.
[0035] Please see Figure 3, which is a flowchart illustrating a smart card generation method based on data matching provided in an embodiment of this application. As shown in Figure 3, the method includes:
[0036] S301, in response to the target user's target question, obtain the first vector.
[0037] The first vector is used to represent the semantic information of the target indicator in the target question. For example, if the target question is "Help me find the ROE of a listed company", then the target indicator is ROE, i.e., return on equity; if the target question is "Help me find the profitability of a listed company", then the target indicator is profitability. After extracting the target indicator, a pre-trained semantic encoding model is called to encode the text content of the target indicator into a semantic feature vector of a preset dimension, i.e., the first vector.
[0038] S302, filter out multiple candidate fields.
[0039] Wherein, the candidate field corresponds to the second vector, the second vector is used to represent the semantic information of the corresponding candidate field, and the similarity between the second vector and the first vector is greater than a first preset threshold, that is, fields with a semantic similarity to the target indicator greater than the first preset threshold are selected from the database as candidate fields.
[0040] Furthermore, the associated data for each selected candidate field is obtained, including field source, data frequency, etc., which together constitute the information body of that candidate field. Taking the target question "Help me find the ROE of listed companies" as an example, the information body of the selected candidate fields includes: Financial Indicator Analysis - ROE - Quarterly Frequency, US Stock Financial Indicator Analysis - ROE - Quarterly Frequency, Accounting Information Quality - ROE - Annual Frequency. The first item is the field source of the candidate field, the second item is the name of the candidate field in the field source, and the third item is the data frequency of the candidate field in the field source.
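The filtering in S302 can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes cosine similarity as the similarity measure (the text only says "similarity"), and the `Field` class, the toy two-dimensional vectors, and the threshold value are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Field:
    """Information body of a field plus its semantic vector (hypothetical layout)."""
    source: str          # field source, i.e. the data table
    name: str            # field name within the source
    frequency: str       # data frequency within the source
    vector: list[float]  # the "second vector" of the field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity; assumed here as the similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_candidates(first_vector: list[float], fields: list[Field],
                      threshold: float) -> list[Field]:
    """Keep fields whose similarity to the first vector exceeds the threshold."""
    return [f for f in fields
            if cosine_similarity(first_vector, f.vector) > threshold]


# Toy data: the vectors are made up for illustration only.
fields = [
    Field("Financial Indicator Analysis", "ROE", "quarterly", [1.0, 0.1]),
    Field("Accounting Information Quality", "ROE", "annual", [0.9, 0.3]),
    Field("Stock Trading", "Closing Price", "daily", [0.0, 1.0]),
]
candidates = filter_candidates([1.0, 0.0], fields, threshold=0.8)
```

With these toy vectors, the two ROE fields pass the threshold and the unrelated field is dropped.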
[0041] S303, calculate the semantic matching score and preference matching score for each candidate field.
[0042] The semantic matching score characterizes the degree to which the candidate field matches the user's needs at the objective semantic level of the text, while the preference matching score characterizes the degree to which the candidate field matches the user's needs at the subjective preference level. The semantic matching score is relatively objective, unaffected by individual user preferences; different users asking the same question will receive the same semantic matching score. The preference matching score, however, is relatively subjective, influenced by individual user preferences, and focuses more on whether the candidate field aligns with the user's subjective preferences within the context of the target question.
[0043] S304, Based on the semantic matching score and the preference matching score, select the target field from the plurality of candidate fields.
[0044] In some embodiments, a comprehensive matching score can be obtained by weighting the semantic matching score and the preference matching score, and the target field can be selected according to the comprehensive matching score. In other embodiments, it can also be directly determined whether the semantic matching score and preference matching score of the candidate field meet a preset score threshold, thereby filtering the target field.
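Both selection strategies described for S304 can be sketched briefly. The weight `w_semantic`, the threshold values, and the function names are assumptions made for illustration; the text does not fix any of them.

```python
def composite_score(semantic: float, preference: float,
                    w_semantic: float = 0.6) -> float:
    """Weighted combination of the two scores (weight value is an assumption)."""
    return w_semantic * semantic + (1.0 - w_semantic) * preference


def select_by_composite(scores: dict[str, tuple[float, float]],
                        w_semantic: float = 0.6) -> str:
    """Return the candidate with the highest comprehensive matching score."""
    return max(scores, key=lambda k: composite_score(*scores[k], w_semantic))


def select_by_thresholds(scores: dict[str, tuple[float, float]],
                         min_semantic: float,
                         min_preference: float) -> list[str]:
    """Return candidates whose two scores both meet the preset thresholds."""
    return [k for k, (s, p) in scores.items()
            if s >= min_semantic and p >= min_preference]


# (semantic matching score, preference matching score) per candidate; toy values.
scores = {"field_a": (0.9, 0.5), "field_b": (0.7, 0.9)}
best = select_by_composite(scores)
passing = select_by_thresholds(scores, min_semantic=0.8, min_preference=0.4)
```

The two strategies can disagree: in this toy data the weighted score favors `field_b`, while the threshold test keeps only `field_a`.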
[0045] S305, generate a smart card.
[0046] The smart card is used to carry the associated data of the target field. The associated data includes basic data such as the field name, field value, name of the database table to which it belongs, set of data entities to which it belongs, and data frequency of the target field, as well as interactive functional controls such as data download and data preview.
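One possible in-memory shape for the smart card payload is sketched below. The class name, attribute names, and control identifiers are hypothetical; only the listed kinds of data come from the paragraph above.

```python
from dataclasses import dataclass, field


@dataclass
class SmartCard:
    """Associated data of the target field (hypothetical attribute names)."""
    field_name: str
    field_value: str
    table_name: str
    entity_set: str
    data_frequency: str
    # Interactive functional controls offered on the card.
    controls: list[str] = field(
        default_factory=lambda: ["data_download", "data_preview"])


card = SmartCard(
    field_name="ROE",
    field_value="<value>",  # placeholder, not real data
    table_name="Financial Indicator Analysis",
    entity_set="listed companies",
    data_frequency="quarterly",
)
```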
[0047] As can be seen, in this embodiment, candidate fields are selected through vector similarity matching. After calculating the semantic matching score and preference matching score for each candidate field, the target field is selected from the candidate fields based on these scores, thereby generating a smart card to carry the associated data of the target field. The semantic matching score characterizes the degree of matching between the candidate field and user needs at the objective semantic level, while the preference matching score characterizes the degree of matching between the candidate field and user needs at the subjective preference level. Thus, by quantitatively evaluating the degree of matching between candidate fields and user needs in both objective semantic and subjective dimensions, the target field is comprehensively selected to generate the smart card, improving the accuracy of field selection and the degree of matching between the data content carried by the smart card and user needs, thereby optimizing the user experience.
[0048] In one possible example, calculating the semantic matching score and preference matching score for each candidate field includes: for each candidate field, performing the following operations sequentially: calculating a first topic matching score and a second topic matching score for the candidate field, wherein the first topic matching score characterizes the degree of matching between the research topic of the data table to which the candidate field belongs and the research topic pointed to by the text semantics of the target question, and the second topic matching score characterizes the degree of matching between the research topic of the data table to which the candidate field belongs and the preference topic of the target user; calculating a first frequency matching score and a second frequency matching score for the candidate field, wherein the first frequency matching score characterizes the degree of matching between the data frequency of the candidate field in its data table and the data frequency pointed to by the text semantics of the target question, and the second frequency matching score characterizes the degree of matching between the data frequency of the candidate field in its data table and the preference frequency of the target user; calculating a first entity matching score and a second entity matching score for the candidate field, wherein the first entity matching score characterizes the degree of matching between the data entity set to which the candidate field belongs and the data entity set pointed to by the text semantics of the target question, and the second entity matching score characterizes the degree of matching between the data entity set to which the candidate field belongs and the preference entity set of the target user; calculating the semantic matching score of the candidate field based on the first topic matching score, the first frequency matching score, and the first entity matching score; and calculating the preference matching score of the candidate field based on the second topic matching score, the second frequency matching score, and the second entity matching score.
[0049] In the field of financial data research, research topics can anchor the user's research direction and lock in the business affiliation of target indicators. For example, research topics include corporate profitability analysis, macroeconomic trend judgment, and industry debt repayment capacity comparison. By evaluating the topic matching score of candidate fields, fields that fit the user's core research direction can be selected, ensuring the basic adaptability of the final selected fields to the user's needs in the core research direction.
[0050] In the field of financial data research, data frequency can distinguish user data usage scenarios and match the timeliness of target indicators, such as daily market data, monthly operating data, quarterly financial data, and annual audit data. By evaluating the data frequency score of candidate fields, it is possible to accurately match users' actual research needs for data statistical cycles and timeliness of use, and avoid the inability of selected fields to be suitable for the research scenario due to frequency mismatch.
[0051] In the field of financial data research, the scope of data entities can locate the research object and its coverage area that the user query is pointing to, such as a specific industry, a specific sector, a specific market, or a specific company. By evaluating the entity matching score of the candidate fields, it can be ensured that the scope of data entities corresponding to the candidate fields is consistent with the boundary of the research object that the user is interested in.
[0052] In this embodiment, the semantic matching score and preference matching score of candidate fields are quantitatively evaluated through three dimensions: research topic, data frequency, and data entity scope. Specifically, for the research topic dimension, a first topic matching score is obtained by quantifying the degree of matching between the research topic of the data table to which the candidate field belongs and the research topic pointed to by the textual semantics of the target question; a second topic matching score is obtained by quantifying the degree of matching between the research topic of the data table to which the candidate field belongs and the preference topic of the target user. For the data frequency dimension, a first frequency matching score is obtained by quantifying the degree of matching between the data frequency of the candidate field in its data table and the data frequency pointed to by the textual semantics of the target question; a second frequency matching score is obtained by quantifying the degree of matching between the data frequency of the candidate field in its data table and the preference frequency of the target user. For the data entity scope dimension, a first entity matching score is obtained by quantifying the degree of matching between the set of data entities to which the candidate field belongs and the set of data entities pointed to by the textual semantics of the target question; a second entity matching score is obtained by quantifying the degree of matching between the set of data entities to which the candidate field belongs and the set of preference entities of the target user. Finally, the semantic matching score of the candidate field is obtained by weighted averaging the first topic matching score, the first frequency matching score, and the first entity matching score. 
Similarly, the preference matching score of the candidate field is obtained by weighted averaging the second topic matching score, the second frequency matching score, and the second entity matching score. When performing the weighted averaging calculation, the weight coefficients for each dimension can be preset values or flexibly set according to actual conditions.
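The weighted averaging over the three dimensions can be illustrated with a short sketch. The weights `(0.5, 0.3, 0.2)` and the input scores are made-up values; the text leaves the weight coefficients open.

```python
def weighted_average(scores: tuple[float, ...],
                     weights: tuple[float, ...]) -> float:
    """Weighted average; weights are normalized by their sum."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


def matching_scores(topic: tuple[float, float],
                    frequency: tuple[float, float],
                    entity: tuple[float, float],
                    weights: tuple[float, float, float] = (0.5, 0.3, 0.2),
                    ) -> tuple[float, float]:
    """Each argument is a (first score, second score) pair for one dimension.

    The first scores combine into the semantic matching score and the
    second scores into the preference matching score.
    """
    semantic = weighted_average((topic[0], frequency[0], entity[0]), weights)
    preference = weighted_average((topic[1], frequency[1], entity[1]), weights)
    return semantic, preference


semantic, preference = matching_scores(
    topic=(0.8, 0.6), frequency=(1.0, 0.5), entity=(0.5, 1.0))
```

Here the same weights are reused for both scores for simplicity; nothing in the text requires that, and separate weight sets would work the same way.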
[0053] As can be seen, in this example, the semantic matching degree and preference matching degree of candidate fields are scored by three dimensions: research topic, data frequency and data entity range. The matching evaluation logic is refined, providing a refined and multi-dimensional quantitative basis for the subsequent selection of target fields, and improving the matching degree between the final selected target fields and user needs.
[0054] In one possible example, calculating the first topic matching score and the second topic matching score of the candidate field includes: obtaining the table name of the data table to which the candidate field belongs, and vectorizing it to obtain a first topic vector; calling a large model to perform topic inference on the data table to which the candidate field belongs, obtaining the research topic of the data table after inference, and vectorizing it to obtain a second topic vector; calling a large model to perform topic inference on the target question, obtaining the research topic of the target question after inference, and vectorizing it to obtain a third topic vector; obtaining the preferred topic of the target user, and vectorizing it to obtain a fourth topic vector; calculating a first similarity between the first topic vector and the third topic vector; calculating a second similarity between the second topic vector and the third topic vector; calculating the first topic matching score of the candidate field based on the first similarity and the second similarity; calculating a third similarity between the first topic vector and the fourth topic vector; calculating a fourth similarity between the second topic vector and the fourth topic vector; and calculating the second topic matching score of the candidate field based on the third similarity and the fourth similarity.
[0055] Among them, the matching of research topics essentially falls under the category of semantic matching. In the embodiments of this application, the research topics of candidate fields, the research topics to which the target question points at the objective semantic level of the text, and the research topics to which the target question points at the user's subjective preference dimension are first determined. Then, each research topic is mapped to a semantic vector of the same dimension. Finally, the vector similarity is calculated to characterize the degree of matching of research topics.
[0056] In this context, the names of database tables are typically determined by operations personnel based on the characteristics of the table data or by using prescribed topic tags, so a table name reflects the research direction of its database table to some extent. In this example, the name of the data table to which the candidate field belongs is taken as a research topic for that candidate field and is vectorized to obtain the first topic vector. For instance, taking the target question "Help me query the ROE of listed companies" as an example, if the information body of the currently processed candidate field is "Financial Indicator Analysis - ROE - Quarterly Frequency," then the table name "Financial Indicator Analysis" is determined as the corresponding research topic, and it is vectorized to obtain the first topic vector. Further, the large model is invoked to perform topic inference based on the specific information of the data table to which the candidate field belongs, and the inferred research topic is vectorized to obtain the second topic vector. Continuing the same example, the specific information of the data table "Financial Indicator Analysis" is obtained, the large model is invoked to perform topic inference on this data table, yielding an inferred research topic such as "Summary Analysis of Financial Indicators of Listed Companies," which is vectorized to obtain the second topic vector. 
In this way, the topics directly represented by the table name and the topics obtained by the large model inference are both used as the research topics of the candidate field to participate in the subsequent matching calculation, realizing two-dimensional cross-validation, avoiding the judgment bias caused by a single basis, and making the calculation results more consistent with the real matching results.
[0057] The large model is also used to perform topic inference on the target question. The inferred content is determined as the research topic that the target question points to at the objective semantic level of the text, and it is then vectorized to obtain the third topic vector. For example, taking the target question "Help me find the ROE of listed companies," the large model performs topic inference to obtain a research topic such as "Analysis of ROE, a financial indicator of listed companies," which is then vectorized to obtain the third topic vector.
[0058] The target user's preferred topics are predicted based on their behavioral characteristics. Specifically, by extracting the target user's field query and download records, topic inference is performed on the queried and downloaded fields to obtain the topics the target user has already researched. The burst intensity of each topic is calculated, and a predetermined number of topics are selected in descending order of burst intensity. The selected topics are then vectorized, and the similarity between each topic and the target question is calculated. The topic with the highest similarity is identified as the target user's preferred topic, and it is vectorized to obtain the fourth topic vector. The burst intensity of a topic refers to the magnitude of its recent sudden appearance and surge in quantity, representing the user's recent level of attention to the topic. The time range corresponding to "recent" can be flexibly set according to needs, such as within the last month; no single limitation is imposed here.
[0059] Through the above embodiments, the research topics of candidate fields have been mapped to a first topic vector and a second topic vector, the research topic of the target question has been mapped to a third topic vector, and the preferred topics of the target user have been mapped to a fourth topic vector. By calculating the first similarity between the first and third topic vectors, and the second similarity between the second and third topic vectors, and then performing a weighted average of the first and second similarities, a first topic matching score is obtained, representing the degree of matching between the research topic of the data table to which the candidate field belongs and the research topic pointed to by the textual semantics of the target question. Similarly, by calculating the third similarity between the first and fourth topic vectors, and the fourth similarity between the second and fourth topic vectors, and then performing a weighted average of the third and fourth similarities, a second topic matching score is obtained, representing the degree of matching between the research topic of the data table to which the candidate field belongs and the preferred topics of the target user. The weighting coefficients in the above weighted average calculation can be preset values or flexibly set based on actual needs; no single limitation is imposed here.
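The fusion of the four similarities into the two topic matching scores can be sketched as below. Cosine similarity and the equal weighting `w_table_name=0.5` are assumptions, and the toy vectors stand in for the outputs of the vectorization step.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity; assumed here as the vector similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_matching_scores(table_name_vec: list[float],
                          inferred_table_vec: list[float],
                          question_topic_vec: list[float],
                          preferred_topic_vec: list[float],
                          w_table_name: float = 0.5) -> tuple[float, float]:
    """Return (first topic matching score, second topic matching score).

    The arguments correspond to the first, second, third, and fourth
    topic vectors; w_table_name weights the table-name-based similarity.
    """
    sim1 = cosine_similarity(table_name_vec, question_topic_vec)
    sim2 = cosine_similarity(inferred_table_vec, question_topic_vec)
    sim3 = cosine_similarity(table_name_vec, preferred_topic_vec)
    sim4 = cosine_similarity(inferred_table_vec, preferred_topic_vec)
    first = w_table_name * sim1 + (1.0 - w_table_name) * sim2
    second = w_table_name * sim3 + (1.0 - w_table_name) * sim4
    return first, second


# Toy vectors: both table-topic vectors align with the question topic
# and are orthogonal to the user's preferred topic.
first_score, second_score = topic_matching_scores(
    [1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```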
[0060] As can be seen, in this example, the research topic of the candidate field is represented in two ways: directly by the table name, and by the topic that the large model infers for the data table. The large model also extracts the research topic of the target question, the user's preferred topic is obtained, and all of these topics are mapped to vectors of the same dimension. The topic matching scores are then obtained by fusing multiple vector similarities. This approach compensates for the matching bias caused by non-standard user questions, takes into account both the objective semantic level of the text and the subjective preference level of the user, and improves the accuracy of the topic matching score calculation.
[0061] In one possible example, calculating the first frequency matching score and the second frequency matching score of the candidate field includes: obtaining the first data frequency of the candidate field in its respective data table; determining whether there is a first keyword in the target question used to indicate the data frequency; if so, determining the data frequency indicated by the first keyword as the target data frequency; assigning values to the first data frequency and the target data frequency respectively, and calculating the target frequency matching score based on the difference between the assigned values; determining that both the first frequency matching score and the second frequency matching score are the target frequency matching scores; if not, determining the public habitual frequency corresponding to the candidate field as the second data frequency; assigning values to the first data frequency and the second data frequency respectively, and calculating the first frequency matching score based on the difference between the assigned values; and obtaining the preference frequency of the target user; assigning values to the first data frequency and the preference frequency respectively, and calculating the second frequency matching score based on the difference between the assigned values.
[0062] In the field of financial data research, data frequencies generally include milliseconds, seconds, minutes, hours, days, months, quarters, and years. In this application embodiment, the first data frequency corresponding to the candidate field and the data frequency required by the user are mapped to the same data dimension and calculated by preset assignment rules and preset calculation rules.
[0063] The preset assignment rule assigns numerical values to data frequencies in ascending order of time granularity. In this example, milliseconds, seconds, minutes, hours, days, months, quarters, and years are assigned the values 1, 2, 3, 4, 5, 6, 7, and 8 respectively.
[0064] The preset calculation rules include: calculating a frequency matching score based on the numerical difference between the data frequency of the user's demand after assignment and the first data frequency. In this example, the frequency matching score is calculated using the following formula:
[0065] $$S_1 = 1 - \frac{\left|F_0 - F_1\right|}{\mathrm{Max\_D}}$$
[0066] In the above calculation formula, S1 is the frequency matching score, with a value range of [0, 1], F0 is the data frequency required by the user, F1 is the first data frequency, and Max_D is the maximum theoretical absolute difference of the current frequency assignment system. In this example, the value of Max_D is 7. It can be seen that the closer the data frequency required by the user matches the first data frequency, the smaller the numerical difference after assignment, and the higher the frequency matching score.
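The assignment rule and the frequency matching score can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the formula S1 = 1 - |F0 - F1| / Max_D is an assumption inferred from the symbol definitions and from the value 0.857 given in this example for an annual request against a quarterly field, and the names `FREQUENCY_RANK` and `frequency_match_score` are hypothetical.

```python
# Preset assignment rule: frequencies ranked in ascending order of time granularity.
FREQUENCY_RANK = {
    "millisecond": 1,
    "second": 2,
    "minute": 3,
    "hour": 4,
    "day": 5,
    "month": 6,
    "quarter": 7,
    "year": 8,
}

# Maximum theoretical absolute difference of the assignment system (7 here).
MAX_D = max(FREQUENCY_RANK.values()) - min(FREQUENCY_RANK.values())


def frequency_match_score(required: str, field: str) -> float:
    """Score in [0, 1]; ASSUMED formula 1 - |F0 - F1| / Max_D."""
    f0 = FREQUENCY_RANK[required]  # data frequency required by the user
    f1 = FREQUENCY_RANK[field]     # first data frequency of the candidate field
    return 1.0 - abs(f0 - f1) / MAX_D


if __name__ == "__main__":
    # Annual request against a quarterly field, as in the ROE example.
    print(round(frequency_match_score("year", "quarter"), 3))
```

Under this assumed formula, identical frequencies score 1 and the two extremes of the scale (milliseconds versus years) score 0.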
[0067] For example, for the information body "Financial Indicator Analysis - ROE - Quarterly Frequency", the first data frequency of the candidate field "ROE" in its data table "Financial Indicator Analysis" is quarterly, which is assigned a value of 7 according to the preset assignment rule. The data frequency required by the user is then obtained.
[0068] In this scenario, when the target question contains keywords indicating data frequency, it signifies that the user has clearly defined the data frequency they require. For example, if the target question is "Help me find the annual ROE of a listed company," then the keyword indicating the data frequency is "annual," which is determined as the target data frequency and assigned a value of 8. Substituting this into the above calculation formula yields a target frequency matching score of 0.857. At this point, the user's required frequency is definite, and there is no need to predict the user's required frequency from an objective or subjective perspective. Therefore, both the first frequency matching score and the second frequency matching score are determined to be the target frequency matching score, i.e., 0.857.
[0069] When there are no keywords in the target question to indicate the data frequency, the public's commonly used frequency corresponding to the candidate field is determined as the second data frequency. From the relatively objective perspective of the public, the frequency needs of users in the current context are evaluated. For example, when most users of the database are studying the ROE field, they generally choose the quarterly frequency. Therefore, the second data frequency is determined to be the quarterly frequency and assigned a value of 7. Substituting it into the above calculation formula, the first frequency matching score is 1.
[0070] In this process, when no keywords indicating data frequency exist in the target question, the target user's preference frequency is obtained. Specifically, based on the preference topics determined in the above embodiments, a large model is invoked to infer the common data frequencies of the preference topics, obtaining the inferred preference topic frequency. The data frequencies used by the target user when querying fields associated with the preference topic are extracted, and the burst strength of each data frequency is calculated. A preset number of data frequencies are selected in descending order of burst strength. If a data frequency that is the same as the preference topic frequency exists, it is determined as the target user's preference frequency. If no data frequency that is the same as the preference topic frequency exists, the data frequency with the highest burst strength is determined as the target user's preference frequency. For example, if the target user's preference frequency is the annual frequency, it is assigned a value of 8 according to the preset assignment rule. Substituting this into the above calculation formula, the second frequency matching score is 0.857.
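The selection of the preference frequency described in this paragraph can be sketched as follows. The sketch assumes the burst strength of each data frequency has already been computed elsewhere and is passed in as a plain mapping; the function name `pick_preference_frequency` and the parameter `top_k` (the "preset number") are hypothetical.

```python
from typing import Dict, Optional


def pick_preference_frequency(
    burst_strength: Dict[str, float],
    topic_frequency: str,
    top_k: int = 3,
) -> Optional[str]:
    """Choose the target user's preference frequency.

    burst_strength: data frequency -> burst strength (computed elsewhere).
    topic_frequency: frequency the large model inferred for the preference topic.
    top_k: how many frequencies to keep, in descending order of burst strength.
    """
    # Keep the top_k frequencies with the highest burst strength.
    ranked = sorted(burst_strength, key=burst_strength.get, reverse=True)[:top_k]
    if not ranked:
        return None  # no query history for this topic
    # Prefer the frequency that agrees with the inferred topic frequency.
    if topic_frequency in ranked:
        return topic_frequency
    # Otherwise fall back to the frequency with the highest burst strength.
    return ranked[0]


if __name__ == "__main__":
    history = {"year": 0.9, "quarter": 0.6, "month": 0.2, "day": 0.1}
    print(pick_preference_frequency(history, "quarter", top_k=2))
    print(pick_preference_frequency(history, "day", top_k=2))
```

Returning `None` for an empty history is a design choice of this sketch; the source does not say how that case is handled.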
[0071] As can be seen in this example, by determining whether the target question contains keywords used to indicate data frequency, the frequency of the user's needs in this question context is obtained or predicted. This frequency is then assigned and calculated according to preset rules with the first data frequency of the candidate field in its respective data table to obtain the frequency matching score for that candidate field. This approach can accurately respond to the user's explicit frequency needs, and also predict the user's frequency needs from both objective and subjective dimensions when the needs are ambiguous, improving the rationality of the frequency matching score calculation and its adaptability to different scenarios.
[0072] In one possible example, calculating the first entity matching score and the second entity matching score of the candidate field includes: obtaining a first data entity set to which the candidate field belongs; determining whether there is a second keyword in the target question that indicates the data entity set; if so, determining that the data entity set indicated by the second keyword is the target data entity set; calculating the target vector similarity between each entity in the first data entity set and each entity in the target data entity set; calculating the target entity matching score based on the target vector similarity; determining that both the first entity matching score and the second entity matching score are the target entity matching scores; if not, calling a large model to perform data entity reasoning on the target question to obtain a second data entity set obtained through reasoning; calculating the first vector similarity between each entity in the first data entity set and each entity in the second data entity set; calculating the first entity matching score based on the first vector similarity; and obtaining the target user's preferred entity set; calculating the second vector similarity between each entity in the first data entity set and each entity in the preferred entity set; and calculating the second entity matching score based on the second vector similarity.
[0073] Fields with the same name in different database tables may correspond to different data entities. In this example, the ROE field in the data table "Financial Indicator Analysis" corresponds to certain listed companies in the A-share market, while the ROE field in the data table "US Stock Financial Indicator Analysis" corresponds to certain listed companies in the US stock market, so the data entity sets corresponding to the two fields are different. In this embodiment, the first data entity set to which the candidate field belongs and the data entity set required by the user are first determined. Then, the vector similarity between each entity in the first data entity set and each entity in the data entity set required by the user is calculated. Finally, the entity matching score of the candidate field is calculated based on the vector similarity.
[0074] For each candidate field, its data entities are extracted to form a list. After deduplication and sorting, the first set of data entities corresponding to that candidate field is obtained. For example, assuming the data table "Financial Indicator Analysis" describes the scope of the Science and Technology Innovation Board, then the set of data entities corresponding to the field "ROE" in this data table would be {688001, 688002, ...}. Assuming the data table "US Stock Financial Indicator Analysis" describes the scope of US stocks, then the set of data entities corresponding to the field "ROE" in this data table would be {MSFT, AAPL, NVDA, ...}. It is understood that the presentation format of the elements in the data entity set is not uniquely limited; for example, in this example, the data entity is presented as its corresponding stock code.
[0075] Specifically, when the target question contains keywords indicating the set of data entities, it indicates that the user has clearly defined the set of data entities they require. For example, if the target question is "Help me find the ROE of each listed company on the Science and Technology Innovation Board," then the keywords indicating the set of data entities—namely, the listed companies on the Science and Technology Innovation Board—can be directly identified as the target set of data entities, and the target entity matching score can then be calculated. At this point, the set of data entities required by the user is definite, and there is no need to predict the set of data entities required by the user from an objective or subjective perspective. Therefore, both the first entity matching score and the second entity matching score are determined to be the target entity matching score.
[0076] In this process, when the target question lacks keywords indicating the set of data entities, a large model is invoked to perform data entity reasoning on the target question, obtaining a second set of data entities. This second set is used to objectively assess the user's entity expectations in the current context. For example, if the target question is "Help me find the ROE of listed companies," and this question does not contain keywords such as specific markets, industries, or sectors to indicate the set of data entities, based on the research conventions of the CSMAR database, the large model is invoked to infer that the second set of data entities is all listed companies in the A-share market, and then the first entity matching score is calculated.
[0077] In addition, when no keywords indicating the data entity set exist in the target question, the target user's preferred entity set is obtained. Specifically, based on the target user's field query and download records, the data entities to which the queried and downloaded fields belong are extracted. The burst strength of each data entity is calculated, and data entities with a burst strength greater than a preset strength are selected to form the target user's preferred entity set. For example, if the burst strengths show that the user has recently and frequently queried financial data of manufacturing-related companies in the A-share market, then the target user's preferred entity set is determined to be listed manufacturing companies in the A-share market, and the second entity matching score is calculated.
[0078] In calculating the entity matching score, the vector similarity between entities in the two entity sets is first calculated. Then, matching entity pairs are selected based on the vector similarity, and the entity matching score is calculated based on the number of matching entity pairs. For example, if set A includes {A1, A2, A3, ...} and set B includes {B1, B2, B3, ...}, the vector similarity between each element in set A and each element in set B is calculated to select matching entity pairs. The entity matching score is then calculated using the following formula:
[0079]
[0080] Where S2 is the entity matching score, Q is the number of matching entity pairs, M is the number of elements in set A, N is the number of elements in set B, and W is a flexibly set weight coefficient. Assuming set A represents the set of data entities required by the user, and set B represents the first set of data entities to which the candidate field belongs, a larger W value encourages the selection of data entity sets that overlap with the user's requirements.
[0081] In some embodiments, the matching entity pair refers to two entities that belong to different sets, have a vector similarity greater than a preset threshold, and represent the maximum similarity in each other's sets. For example, if the preset threshold is 0.7, and the vector similarity between entity A1 and entity B1 is 0.9, and the similarity between entity A1 and other entities in set B is less than 0.9, and the similarity between entity B1 and other entities in set A is less than 0.9, then entity A1 and entity B1 are determined to constitute a matching entity pair.
[0082] In some embodiments, the matching entity pair refers to two entities that belong to different sets, have a vector similarity greater than a preset threshold, and represent the maximum similarity value in one of the sets. For example, if the preset threshold is 0.7, and the vector similarity between entity A1 and entity B1 is 0.9, and the similarity between entity A1 and other entities in set B is less than 0.9, or the similarity between entity B1 and other entities in set A is less than 0.9, then entity A1 and entity B1 are determined to constitute a matching entity pair.
[0083] In some embodiments, the matching entity pair refers to two entities that belong to different sets and whose vector similarity to each other is greater than a preset threshold. For example, if the preset threshold is 0.7, and the vector similarity between entity A1 and entity B1 is 0.9, then entity A1 and entity B1 are determined to constitute a matching entity pair.
[0084] It is understood that the specific implementation methods for filtering matching entity pairs based on vector similarity described above can be flexibly adjusted and selected according to the actual application scenario, and this application does not limit them to a single method.
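The three ways of selecting matching entity pairs described above can be compared with a short Python sketch. It takes a precomputed similarity matrix as input and only returns the pairs (and therefore the count Q); it does not compute the entity matching score itself. The function name `matching_pairs` and the mode labels are hypothetical, and treating ties as maxima is a choice made here rather than something the source specifies.

```python
from typing import List, Tuple


def matching_pairs(
    sim: List[List[float]],
    threshold: float = 0.7,
    mode: str = "mutual",
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) that count as matching entity pairs.

    sim[i][j] is the vector similarity between entity i of set A and
    entity j of set B. mode selects one of the three variants:
      "mutual"    - above threshold and the maximum in both sets
      "either"    - above threshold and the maximum in at least one set
      "threshold" - above threshold only
    """
    if mode not in ("mutual", "either", "threshold"):
        raise ValueError(f"unknown mode: {mode}")
    pairs = []
    for i, row in enumerate(sim):
        for j, s in enumerate(row):
            if s <= threshold:
                continue
            best_in_b = s >= max(row)                # best partner for A_i in set B
            best_in_a = s >= max(r[j] for r in sim)  # best partner for B_j in set A
            if mode == "mutual":
                keep = best_in_b and best_in_a
            elif mode == "either":
                keep = best_in_b or best_in_a
            else:
                keep = True
            if keep:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    sim = [[0.9, 0.75, 0.1],
           [0.8, 0.72, 0.2]]
    for mode in ("mutual", "either", "threshold"):
        print(mode, matching_pairs(sim, 0.7, mode))
```

The stricter the variant, the fewer pairs are counted, so the choice directly affects Q and the resulting entity matching score.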
[0085] As can be seen in this example, by determining whether the target question contains keywords that indicate a set of data entities, the set of data entities required by the user in this question context is obtained or predicted. This set is then matched with the first set of data entities to which the candidate field belongs based on vector similarity to obtain the entity matching score for that candidate field. This approach can accurately respond to the user's explicit entity needs, and also characterize the user's entity expectations from both objective and subjective dimensions when the needs are ambiguous, thus improving the rationality of the entity matching score calculation and its adaptability to different scenarios.
[0086] In one possible example, selecting target fields from the plurality of candidate fields based on the semantic matching score and the preference matching score includes: calculating the ambiguity score of the target question, the ambiguity score being used to characterize the degree of ambiguity of the target user's question requirement; calculating a first weight corresponding to the semantic matching score and a second weight corresponding to the preference matching score based on the ambiguity score; performing a weighted average operation on the semantic matching score and preference matching score of each candidate field based on the first weight and the second weight to obtain a comprehensive matching score; and selecting a preset number of target fields from the plurality of candidate fields in descending order of the comprehensive matching scores.
[0087] In financial data query applications, the more ambiguous the user's question, the less clear the user is about their true needs, and the more they desire directional guidance. In this case, more attention should be paid to preference matching and to in-depth exploration of the user's personalized needs. Conversely, the more precise the user's question, the clearer the user is about their true needs, and the higher their requirements for semantic matching. In this case, more attention should be paid to semantic matching. Therefore, in this example, the ambiguity score of the target question is calculated first, and the first weight corresponding to the semantic matching score and the second weight corresponding to the preference matching score are then calculated from it. Finally, a weighted average is used to calculate the comprehensive matching score of each candidate field, and a preset number of target fields are selected. The higher the ambiguity score of the target question, the more ambiguous the target question, the smaller the first weight, and the larger the second weight. The preset number can be flexibly set according to the application scenario and user habits. For example, a default selection of 5 target fields balances recommendation accuracy and coverage.
[0088] As can be seen, in this example, by calculating the ambiguity score of the target question and then dynamically adjusting the weights, a differentiated emphasis on semantic matching and preference matching is achieved, which improves the rationality and accuracy of target field selection, better meets the actual needs of users, and optimizes the user experience.
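The weighting and selection step can be sketched in Python as follows. The source only states that a higher ambiguity score means a smaller first weight and a larger second weight; the specific mapping used here (second weight = a / (1 + a) for ambiguity score a) is an assumption chosen for illustration, and the names `weights_from_ambiguity` and `select_target_fields` are hypothetical.

```python
from typing import Dict, List, Tuple


def weights_from_ambiguity(ambiguity: float) -> Tuple[float, float]:
    """Return (first weight, second weight), which sum to 1.

    ASSUMPTION: the mapping a / (1 + a) is illustrative only. It is 0 for a
    fully precise question and grows towards 1 as ambiguity increases.
    """
    w_pref = ambiguity / (1.0 + ambiguity)
    return 1.0 - w_pref, w_pref


def select_target_fields(
    scores: Dict[str, Tuple[float, float]],
    ambiguity: float,
    top_n: int = 5,
) -> List[str]:
    """scores maps field name -> (semantic matching score, preference matching score)."""
    w_sem, w_pref = weights_from_ambiguity(ambiguity)
    # Weighted average of the two scores gives the comprehensive matching score.
    combined = {
        name: w_sem * sem + w_pref * pref
        for name, (sem, pref) in scores.items()
    }
    # Keep the top_n fields in descending order of comprehensive score.
    return sorted(combined, key=combined.get, reverse=True)[:top_n]


if __name__ == "__main__":
    scores = {"ROE": (0.9, 0.2), "ROA": (0.5, 0.8)}
    print(select_target_fields(scores, ambiguity=0.0))  # precise question
    print(select_target_fields(scores, ambiguity=3.0))  # ambiguous question
```

With a precise question the semantically closer field ranks first; with an ambiguous question the ranking shifts towards the field that better fits the user's preferences.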
[0089] In one possible example, calculating the ambiguity score of the target question includes: decomposing the target indicator in the target question into orthogonal indicators, wherein the orthogonal indicators are multiple indicators that do not overlap with each other; and calculating the information entropy of the target question based on the number of orthogonal indicators to obtain the ambiguity score of the target question.
[0090] In the field of financial data querying, there are many comprehensive indicators. These indicators are usually described by multiple metrics and can ultimately be broken down into multiple orthogonal indicators. Orthogonal indicators are those that do not overlap in information with each other. For example, profitability is a comprehensive indicator, composed of multiple independent indicators representing different aspects of profitability. It can be further broken down into orthogonal indicators such as gross profit margin, net profit margin, and return on assets (ROA). Each indicator represents profitability from dimensions such as product profitability, revenue profitability efficiency, and asset profitability level, with no information overlap or inclusion relationship between them. Correspondingly, return on equity (ROE) is an orthogonal indicator. Its core characteristic is that it is a single, independent, and non-overlapping single-dimensional indicator, which cannot be further broken down. Therefore, when the target indicator is ROE, the number of orthogonal indicators it can be decomposed into is 1. The comparison between profitability and return on equity shows that the more orthogonal indicators obtained after decomposing the target indicator, the less clear the target user is about the specific indicator they want, and the more vague the target question is.
[0091] Furthermore, this example calculates the information entropy of the target question from the number of orthogonal indicators, thereby quantifying the ambiguity score of the target question. Information entropy quantifies the uncertainty of a random variable. For a discrete random variable X whose possible values are X1, X2, ..., Xn with corresponding probabilities P(X1), P(X2), ..., P(Xn), the information entropy H(X) is defined as:
[0092] $$H(X) = -\sum_{i=1}^{n} P(X_i)\,\log P(X_i)$$
[0093] Applying the definition of information entropy above to this application scenario, n represents the number of orthogonal indicators, P(Xi) represents the probability of the i-th orthogonal indicator, and H(X) represents the information entropy of the target question, which characterizes the uncertainty of the target question, i.e., its degree of ambiguity. In this example, the orthogonal indicators are independent, non-overlapping single dimensions, and each indicator contributes equally to the demand. Therefore:
[0094] $$P(X_i) = \frac{1}{n}, \quad i = 1, 2, \ldots, n$$
[0095] Substituting this into the formula for information entropy and simplifying gives:
[0096] $$H(X) = \log n$$
[0097] Substituting the number of orthogonal indicators, n, into the above formula gives the information entropy of the target question, which is taken as its ambiguity score. It can be seen that the more orthogonal indicators obtained after decomposing the target indicator, the greater the information entropy of the target question, meaning the higher the ambiguity score and the more ambiguous the target question.
[0098] As can be seen, in this example, by decomposing the target indicator in the target question into orthogonal indicators, calculating the information entropy of the target question based on the number of orthogonal indicators, and obtaining the ambiguity score of the target question, the weights of the semantic matching score and the preference matching score are differentiated, which matches user needs more accurately and improves the accuracy of field selection.
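The entropy calculation can be checked with a short Python sketch. It assumes equal probabilities for the n orthogonal indicators, as stated in this example, so that the general entropy reduces to log n. The logarithm base is not stated in the source; base 2 is an assumption here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def entropy(probabilities: Sequence[float], base: float = 2.0) -> float:
    """Shannon entropy of a discrete distribution (base 2 is an ASSUMPTION)."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


def ambiguity_score(num_orthogonal_indicators: int, base: float = 2.0) -> float:
    """Entropy of n equally likely orthogonal indicators, i.e. log n."""
    if num_orthogonal_indicators < 1:
        raise ValueError("at least one orthogonal indicator is required")
    return math.log(num_orthogonal_indicators, base)


if __name__ == "__main__":
    # ROE cannot be decomposed further (n = 1), so the question is unambiguous.
    print(ambiguity_score(1))
    # Profitability decomposes into three orthogonal indicators in the example.
    print(ambiguity_score(3), entropy([1 / 3] * 3))
```

A single orthogonal indicator gives a score of 0, and the score grows with the number of indicators, which matches the behaviour described above.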
[0099] Consistent with the above embodiments, please refer to Figure 4, which is a structural block diagram of a smart card generation device based on data matching provided in an embodiment of this application. As shown in Figure 4, the smart card generation device 40 based on data matching includes: an acquisition unit 401, used to acquire a first vector in response to a target user's target question, the first vector being used to characterize the semantic information of a target indicator in the target question; a filtering unit 402, used to filter out multiple candidate fields, each candidate field corresponding to a second vector, the similarity between the second vector and the first vector being greater than a first preset threshold; a calculation unit 403, used to calculate a semantic matching score and a preference matching score for each candidate field, the semantic matching score being used to characterize the degree of matching between the candidate field and the user's needs at the objective semantic level of the text, the preference matching score being used to characterize the degree of matching between the candidate field and the user's needs at the subjective preference level of the user; a selection unit 404, used to select a target field from the multiple candidate fields based on the semantic matching score and the preference matching score; and a generation unit 405, used to generate a smart card, the smart card being used to carry the associated data of the target field.
[0100] It is understood that since the method embodiments and the device embodiments are different presentations of the same technical concept, the content of the method embodiment section in this application should be adapted to the device embodiment section in a synchronous manner, and will not be repeated here.
[0101] When integrated units are used, please refer to Figure 5, which is a structural block diagram of another smart card generation device based on data matching provided in an embodiment of this application. As shown in Figure 5, the data-matching-based smart card generation device 40 includes a processing module 42 and a communication module 41. The processing module 42 controls and manages the operations of the data-matching-based smart card generation device, for example, executing the steps of the acquisition unit 401, filtering unit 402, calculation unit 403, selection unit 404, and generation unit 405, and / or other processes for performing the techniques described herein. The communication module 41 supports interaction between the data-matching-based smart card generation device and other devices. As shown in Figure 5, the smart card generation device based on data matching may further include a storage module 43, which is used to store the program code and data of the smart card generation device based on data matching.
[0102] All relevant content of each scenario involved in the above method embodiments can be referenced in the functional descriptions of the corresponding functional modules, and will not be repeated here. The above-mentioned smart card generation device 40 based on data matching can perform the smart card generation method based on data matching shown in Figure 3.
[0103] Based on the description of the above method and device embodiments, please refer to Figure 6, which is a schematic diagram of the structure of an electronic device provided in an embodiment of this application. The electronic device shown in Figure 6 includes a memory 601, a processor 602, a communication interface 603, and a bus 604. The memory 601, processor 602, and communication interface 603 are interconnected via the bus 604. Specifically, the electronic device may refer to the first electronic device 11 or the second electronic device 12 in the above embodiments.
[0104] The memory 601 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
[0105] The memory 601 can store programs. When the programs stored in the memory 601 are executed by the processor 602, the processor 602 and the communication interface 603 are used to execute the various steps of the data matching-based smart card generation method of the present application embodiments.
[0106] The processor 602 may be a general-purpose central processing unit (CPU), microprocessor, application specific integrated circuit (ASIC), graphics processing unit (GPU), or one or more integrated circuits, used to execute related programs to achieve the functions required by the units in the electronic device of this application embodiment, or to execute the smart card generation method based on data matching of this application method embodiment.
[0107] The processor 602 can also be an integrated circuit chip with signal processing capabilities. In implementation, each step of the data-matching-based smart card generation method of this application can be completed by the integrated logic circuits in the hardware of the processor 602 or by instructions in software form. The aforementioned processor 602 can also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application can be directly embodied in the execution of a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules can be located in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. The storage medium is located in the memory 601. The processor 602 reads the information in the memory 601 and, in conjunction with its hardware, performs the functions required by the units included in the electronic device of this application embodiment, or executes the smart card generation method based on data matching of this application method embodiment.
[0108] Communication interface 603 uses transceiver devices, such as, but not limited to, transceivers, to enable communication between electronic devices and other devices or communication networks. For example, data can be acquired through communication interface 603.
[0109] Bus 604 may include a pathway for transmitting information between various components of an electronic device (e.g., memory 601, processor 602, communication interface 603).
[0110] It should be noted that, although the electronic device shown in Figure 6 only shows the memory 601, processor 602, and communication interface 603, those skilled in the art should understand that in specific implementations, the electronic device may also include other components necessary for normal operation. Furthermore, depending on specific needs, those skilled in the art should understand that the electronic device may also include hardware components for implementing other additional functions. Moreover, those skilled in the art should understand that the electronic device may include only the components necessary for implementing the embodiments of this application, and need not include all the components shown in Figure 6.
[0111] This application also provides a computer storage medium storing a computer program / instructions thereon, which, when executed by a processor, implements some or all of the steps of any of the methods described in the above method embodiments.
[0112] In the several embodiments provided in this application, it should be understood that the disclosed methods and apparatus can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division of units is merely a logical functional division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces, or indirect coupling or communication connection between devices or units, and may be electrical, mechanical, or other forms.
[0113] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0114] The above embodiments can be implemented, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented using software, they can be implemented, in whole or in part, as a computer program product. This computer program product includes one or more computer instructions. When these computer program instructions are loaded and executed on a computer, all or part of the flow or function according to the embodiments of this application is generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in or transmitted through a computer-readable storage medium. The computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave, etc.) means. The computer-readable storage medium can be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium can be a read-only memory, a random access memory, a magnetic medium such as a floppy disk, hard disk, or magnetic tape, an optical medium such as a digital versatile disc, or a semiconductor medium such as a solid-state drive.
[0115] The above description is merely a specific implementation of the embodiments of this application, but the protection scope of the embodiments of this application is not limited thereto. Any changes or substitutions within the technical scope disclosed in the embodiments of this application should be covered within the protection scope of the embodiments of this application. Therefore, the protection scope of the embodiments of this application should be determined by the protection scope of the claims.
[0116] The device embodiments described above are merely illustrative. The units and modules described as separate components may or may not be physically separate. Furthermore, some or all of the units and modules can be selected to achieve the purpose of this embodiment, depending on actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0117] While this application discloses the above information, it is not limited thereto. Any person skilled in the art can easily conceive of variations or substitutions without departing from the spirit and scope of this application, and various modifications and alterations can be made, including combinations of the different functions and implementation steps described above, as well as software and hardware implementation methods, all of which are within the protection scope of this application.
Claims
1. A smart card generation method based on data matching, characterized in that it comprises:
in response to a target question from a target user, obtaining a first vector, wherein the first vector characterizes the semantic information of the target indicator in the target question;
selecting a plurality of candidate fields, wherein each candidate field corresponds to a second vector, and the similarity between the second vector and the first vector is greater than a first preset threshold;
calculating a semantic matching score and a preference matching score for each candidate field, wherein the semantic matching score characterizes the degree of matching between the candidate field and the user's needs at the level of the objective semantics of the text, and the preference matching score characterizes the degree of matching between the candidate field and the user's needs at the level of the user's subjective preferences;
selecting a target field from the plurality of candidate fields based on the semantic matching score and the preference matching score; and
generating a smart card, wherein the smart card carries the associated data of the target field;
wherein calculating the semantic matching score and the preference matching score for each candidate field comprises, for each candidate field, performing the following operations in sequence:
calculating a first topic matching score and a second topic matching score for the candidate field, wherein the first topic matching score characterizes the degree of matching between the research topic of the data table to which the candidate field belongs and the research topic indicated by the text semantics of the target question, and the second topic matching score characterizes the degree of matching between the research topic of the data table to which the candidate field belongs and the preference topic of the target user;
calculating a first frequency matching score and a second frequency matching score for the candidate field, wherein the first frequency matching score characterizes the degree of matching between the data frequency of the candidate field in its data table and the data frequency indicated by the text semantics of the target question, and the second frequency matching score characterizes the degree of matching between the data frequency of the candidate field in its data table and the preference frequency of the target user;
calculating a first entity matching score and a second entity matching score for the candidate field, wherein the first entity matching score characterizes the degree of matching between the data entity set to which the candidate field belongs and the data entity set indicated by the text semantics of the target question, and the second entity matching score characterizes the degree of matching between the data entity set to which the candidate field belongs and the preference entity set of the target user;
calculating the semantic matching score of the candidate field based on the first topic matching score, the first frequency matching score, and the first entity matching score; and
calculating the preference matching score of the candidate field based on the second topic matching score, the second frequency matching score, and the second entity matching score;
and wherein calculating the first frequency matching score and the second frequency matching score of the candidate field comprises:
obtaining a first data frequency of the candidate field in its data table;
determining whether the target question contains a first keyword indicating a data frequency;
if so, determining the data frequency indicated by the first keyword as a target data frequency; assigning values to the first data frequency and the target data frequency respectively, and calculating a target frequency matching score based on the difference between the assigned values; and determining that both the first frequency matching score and the second frequency matching score are the target frequency matching score;
if not, determining the public habitual frequency corresponding to the candidate field as a second data frequency; assigning values to the first data frequency and the second data frequency respectively, and calculating the first frequency matching score based on the difference between the assigned values; and obtaining the preference frequency of the target user, assigning values to the first data frequency and the preference frequency respectively, and calculating the second frequency matching score based on the difference between the assigned values.
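The frequency branch of claim 1 can be illustrated with a minimal Python sketch. The claim only states that values are "assigned" to frequencies and that the score is "based on the difference"; the ordinal table `FREQ_VALUE`, the normalization in `freq_score`, and all function and parameter names below are assumptions made for illustration, not the applicant's implementation.

```python
from typing import Optional, Tuple

# Hypothetical ordinal values; the claim does not specify the assignment.
FREQ_VALUE = {"daily": 1, "weekly": 2, "monthly": 3, "quarterly": 4, "yearly": 5}
_MAX_GAP = max(FREQ_VALUE.values()) - min(FREQ_VALUE.values())


def freq_score(a: str, b: str) -> float:
    """Assumed rule: 1.0 for identical frequencies, 0.0 for the widest gap."""
    return 1.0 - abs(FREQ_VALUE[a] - FREQ_VALUE[b]) / _MAX_GAP


def frequency_matching_scores(
    field_freq: str,              # first data frequency (from the data table)
    keyword_freq: Optional[str],  # frequency named in the question, if any
    habitual_freq: str,           # public habitual frequency for the field
    preference_freq: str,         # target user's preferred frequency
) -> Tuple[float, float]:
    """Return (first, second) frequency matching scores."""
    if keyword_freq is not None:
        # An explicit keyword overrides both habit and preference.
        target = freq_score(field_freq, keyword_freq)
        return target, target
    # No keyword: semantic side uses public habit, preference side uses the user.
    return freq_score(field_freq, habitual_freq), freq_score(field_freq, preference_freq)
```

Under these assumptions, a monthly field scored against a question that says "monthly" yields the same score on both sides, while a question with no frequency keyword yields two different scores whenever the public habit and the user's preference differ.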
2. The method according to claim 1, characterized in that calculating the first topic matching score and the second topic matching score of the candidate field comprises:
obtaining the table name of the data table to which the candidate field belongs, and vectorizing it to obtain a first topic vector;
invoking a large model to perform topic reasoning on the data table to which the candidate field belongs, and vectorizing the inferred research topic of the data table to obtain a second topic vector;
invoking the large model to perform topic reasoning on the target question, and vectorizing the inferred research topic of the target question to obtain a third topic vector;
obtaining the preference topic of the target user, and vectorizing it to obtain a fourth topic vector;
calculating a first similarity between the first topic vector and the third topic vector;
calculating a second similarity between the second topic vector and the third topic vector;
calculating the first topic matching score of the candidate field based on the first similarity and the second similarity;
calculating a third similarity between the first topic vector and the fourth topic vector;
calculating a fourth similarity between the second topic vector and the fourth topic vector; and
calculating the second topic matching score of the candidate field based on the third similarity and the fourth similarity.
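As a hedged illustration of the similarity step in claim 2, the sketch below takes the four topic vectors as already computed (vectorization and large-model reasoning are out of scope here). Cosine similarity and the simple mean used to combine each pair of similarities are assumptions; the claim only says each score is "based on" the two similarities. All names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def topic_matching_scores(
    table_name_vec: Sequence[float],        # first topic vector
    table_topic_vec: Sequence[float],       # second topic vector
    question_topic_vec: Sequence[float],    # third topic vector
    preference_topic_vec: Sequence[float],  # fourth topic vector
) -> Tuple[float, float]:
    """Return (first, second) topic matching scores."""
    s1 = cosine(table_name_vec, question_topic_vec)
    s2 = cosine(table_topic_vec, question_topic_vec)
    s3 = cosine(table_name_vec, preference_topic_vec)
    s4 = cosine(table_topic_vec, preference_topic_vec)
    # Assumed combination: unweighted mean of each pair.
    return (s1 + s2) / 2, (s3 + s4) / 2
```

Using both the table name and the inferred topic means a terse or cryptic table name does not by itself sink the score, which is one plausible reading of why the claim recites two vectors per table.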
3. The method according to claim 1, characterized in that calculating the first entity matching score and the second entity matching score of the candidate field comprises:
obtaining a first data entity set to which the candidate field belongs;
determining whether the target question contains a second keyword indicating a data entity set;
if so, determining the data entity set indicated by the second keyword as a target data entity set; calculating the target vector similarity between each entity in the first data entity set and each entity in the target data entity set; calculating a target entity matching score based on the target vector similarity; and determining that both the first entity matching score and the second entity matching score are the target entity matching score;
if not, invoking a large model to perform data entity reasoning on the target question to obtain an inferred second data entity set; calculating the first vector similarity between each entity in the first data entity set and each entity in the second data entity set; calculating the first entity matching score based on the first vector similarity; and obtaining the preference entity set of the target user, calculating the second vector similarity between each entity in the first data entity set and each entity in the preference entity set, and calculating the second entity matching score based on the second vector similarity.
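The entity branch of claim 3 might be sketched as follows, with each entity represented by an embedding vector. The claim does not say how the pairwise similarities are aggregated into a score; the "best match per entity, then average" rule in `set_match_score` is an assumption, as are all names.

```python
import math
from typing import Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def set_match_score(set_a: Sequence[Vector], set_b: Sequence[Vector]) -> float:
    """Assumed aggregation: mean over set_a of the best similarity in set_b."""
    if not set_a or not set_b:
        return 0.0
    return sum(max(cosine(a, b) for b in set_b) for a in set_a) / len(set_a)


def entity_matching_scores(
    field_entities: Sequence[Vector],              # first data entity set
    keyword_entities: Optional[Sequence[Vector]],  # set named in the question
    inferred_entities: Sequence[Vector],           # set inferred by a large model
    preference_entities: Sequence[Vector],         # user's preference entity set
) -> Tuple[float, float]:
    """Return (first, second) entity matching scores."""
    if keyword_entities is not None:
        target = set_match_score(field_entities, keyword_entities)
        return target, target
    return (
        set_match_score(field_entities, inferred_entities),
        set_match_score(field_entities, preference_entities),
    )
```

The control flow mirrors the frequency branch of claim 1: an explicit keyword in the question fixes both scores, and only in its absence do the semantic and preference sides diverge.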
4. The method according to any one of claims 1-3, characterized in that selecting a target field from the plurality of candidate fields based on the semantic matching score and the preference matching score comprises:
calculating an ambiguity score of the target question, wherein the ambiguity score characterizes the degree of ambiguity of the target user's query requirement;
calculating, based on the ambiguity score, a first weight corresponding to the semantic matching score and a second weight corresponding to the preference matching score;
calculating, based on the first weight and the second weight, a weighted average of the semantic matching score and the preference matching score of each candidate field to obtain a comprehensive matching score; and
selecting a preset number of target fields from the plurality of candidate fields in descending order of the comprehensive matching score.
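A minimal sketch of the selection step in claim 4 is given below. The claim does not state how the weights are derived from the ambiguity score; here the ambiguity score is assumed to lie in [0, 1], and the mapping "second weight = ambiguity, first weight = 1 - ambiguity" is a hypothetical choice, on the reasoning that a vaguer question should lean more on user preference. Field names in the usage note are invented.

```python
from typing import List, Sequence, Tuple

# (field name, semantic matching score, preference matching score)
Candidate = Tuple[str, float, float]


def select_target_fields(
    candidates: Sequence[Candidate], ambiguity: float, k: int
) -> List[str]:
    """Rank candidates by comprehensive matching score and keep the top k."""
    ambiguity = min(1.0, max(0.0, ambiguity))  # clamp to the assumed range
    w_sem = 1.0 - ambiguity   # first weight (assumed mapping)
    w_pref = ambiguity        # second weight (assumed mapping)
    # The weights sum to 1, so the weighted average needs no further division.
    scored = [(name, w_sem * sem + w_pref * pref) for name, sem, pref in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scored[:k]]
```

For example, with hypothetical candidates `("gdp_yoy", 0.9, 0.2)`, `("gdp_qoq", 0.6, 0.8)`, and `("cpi", 0.3, 0.3)`, an unambiguous question (`ambiguity=0.0`) ranks `gdp_yoy` first, whereas a fully ambiguous one (`ambiguity=1.0`) ranks `gdp_qoq` first.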
5. The method according to claim 4, characterized in that calculating the ambiguity score of the target question comprises:
decomposing the target indicator in the target question into orthogonal indicators, wherein the orthogonal indicators are a plurality of indicators that have no information overlap with one another; and
calculating the information entropy of the target question based on the number of orthogonal indicators to obtain the ambiguity score of the target question.
6. A smart card generation device based on data matching, characterized in that it comprises:
an acquisition unit, configured to obtain a first vector in response to a target question from a target user, wherein the first vector characterizes the semantic information of the target indicator in the target question;
a filtering unit, configured to select a plurality of candidate fields, wherein each candidate field corresponds to a second vector, and the similarity between the second vector and the first vector is greater than a first preset threshold;
a calculation unit, configured to calculate a semantic matching score and a preference matching score for each candidate field, wherein the semantic matching score characterizes the degree of matching between the candidate field and the user's needs at the level of the objective semantics of the text, and the preference matching score characterizes the degree of matching between the candidate field and the user's needs at the level of the user's subjective preferences;
a selection unit, configured to select a target field from the plurality of candidate fields based on the semantic matching score and the preference matching score; and
a generation unit, configured to generate a smart card, wherein the smart card carries the associated data of the target field;
wherein calculating the semantic matching score and the preference matching score for each candidate field comprises, for each candidate field, performing the following operations in sequence:
calculating a first topic matching score and a second topic matching score for the candidate field, wherein the first topic matching score characterizes the degree of matching between the research topic of the data table to which the candidate field belongs and the research topic indicated by the text semantics of the target question, and the second topic matching score characterizes the degree of matching between the research topic of the data table to which the candidate field belongs and the preference topic of the target user;
calculating a first frequency matching score and a second frequency matching score for the candidate field, wherein the first frequency matching score characterizes the degree of matching between the data frequency of the candidate field in its data table and the data frequency indicated by the text semantics of the target question, and the second frequency matching score characterizes the degree of matching between the data frequency of the candidate field in its data table and the preference frequency of the target user;
calculating a first entity matching score and a second entity matching score for the candidate field, wherein the first entity matching score characterizes the degree of matching between the data entity set to which the candidate field belongs and the data entity set indicated by the text semantics of the target question, and the second entity matching score characterizes the degree of matching between the data entity set to which the candidate field belongs and the preference entity set of the target user;
calculating the semantic matching score of the candidate field based on the first topic matching score, the first frequency matching score, and the first entity matching score; and
calculating the preference matching score of the candidate field based on the second topic matching score, the second frequency matching score, and the second entity matching score;
and wherein calculating the first frequency matching score and the second frequency matching score of the candidate field comprises:
obtaining a first data frequency of the candidate field in its data table;
determining whether the target question contains a first keyword indicating a data frequency;
if so, determining the data frequency indicated by the first keyword as a target data frequency; assigning values to the first data frequency and the target data frequency respectively, and calculating a target frequency matching score based on the difference between the assigned values; and determining that both the first frequency matching score and the second frequency matching score are the target frequency matching score;
if not, determining the public habitual frequency corresponding to the candidate field as a second data frequency; assigning values to the first data frequency and the second data frequency respectively, and calculating the first frequency matching score based on the difference between the assigned values; and obtaining the preference frequency of the target user, assigning values to the first data frequency and the preference frequency respectively, and calculating the second frequency matching score based on the difference between the assigned values.
7. An electronic device, characterized in that it comprises a processor, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for performing the steps of the method according to any one of claims 1-5.
8. A computer-readable storage medium having a computer program or instructions stored thereon, characterized in that, when the computer program or instructions are executed by a processor, the steps of the method according to any one of claims 1-5 are implemented.