Issue reporting method, computer program therefor, and computer-readable storage medium for storing computer program

The issue reporting method addresses the limitations of conventional news summary services by filtering, clustering, and generating summaries with quantitative insights, ensuring efficient and meaningful information delivery.

WO2026135145A1 · PCT designated stage · Publication Date: 2026-06-25 · POSCO HLDG INC

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
POSCO HLDG INC
Filing Date
2025-12-16
Publication Date
2026-06-25


Abstract

This issue reporting method comprises: a data collection step of collecting and storing articles; a recommendation modeling step of using a recommendation model trained through user feedback-based labeling, so as to filter the collected articles; a deduplication step of calculating similarity between embedding vectors of the filtered articles so as to remove duplicate articles; a clustering step of performing clustering on the deduplicated articles so as to determine a plurality of clusters, and selecting a predetermined number of clusters from among the plurality of clusters; and a report generation step of using a generative artificial intelligence model through prompt engineering for each of the selected clusters, so as to generate issue reports.

Description

Issue reporting method, computer program for this purpose, and computer-readable storage medium for storing the computer program

[0001] The present invention relates to an issue reporting method, a computer program for the same, and a computer-readable storage medium for storing the computer program.

[0002] With the recent surge in the volume of news articles, the importance of article summary services is increasing to allow users to efficiently obtain the information they want.

[0003] Conventional news summary services use a method of simply extracting keywords and generating summaries based on them.

[0004] However, this method has drawbacks when processing large amounts of news data, such as the duplication of similar content in summaries or the omission of important information. Additionally, it has limitations in generating summaries that include specific quantitative information or insights.

[0005] The present invention provides an issue reporting method that can filter articles by reflecting user preferences.

[0006] The present invention provides an issue reporting method that can effectively remove duplicate articles while reflecting importance.

[0007] The present invention provides an issue reporting method that efficiently classifies major issues through clustering.

[0008] The present invention provides an issue reporting method that generates an insightful summary containing specific quantitative information.

[0009] The technical problems to be solved in this document are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art to which this invention belongs from the description below.

[0010] An issue reporting method according to an embodiment of the present invention may include: a data collection step of collecting and storing articles; a recommendation modeling step of filtering the collected articles using a recommendation model learned through user feedback-based labeling; a duplicate removal step of removing duplicate articles by calculating similarity between embedding vectors of the filtered articles; a clustering step of determining a plurality of clusters by performing clustering on the articles from which duplicates have been removed and selecting a predetermined number of clusters among the plurality of clusters; and a report generation step of generating an issue report using a generative artificial intelligence model through prompt engineering for each selected cluster.

[0011] The above data collection step may include the step of directly storing the titles of the articles in the database and encrypting and storing the body text of the articles in the database.

[0012] The above recommendation model can be trained through user feedback-based labeling of a predetermined number of articles.

[0013] The recommendation modeling step may include a step of calculating recommendation scores of the collected articles using the recommendation model.

[0014] The clustering step described above includes a step of calculating a score for ranking a plurality of clusters; and the step of calculating the score may include a step of calculating a ranking score for each of the plurality of clusters based on recommendation scores of articles included in each of the plurality of clusters.

[0015] The above duplicate removal step may include: a step of converting the filtered articles into embedding vectors; and a step of determining articles in which the similarity between the converted embedding vectors exceeds a predetermined threshold as duplicate articles.

[0016] The above duplicate removal step may further include the step of removing at least some of the duplicate articles and assigning weights to the remaining parts.

[0017] The clustering step described above includes a step of calculating a score for ranking a plurality of clusters; and the step of calculating the score may include a step of calculating a ranking score for each of the plurality of clusters such that the ranking score of the corresponding cluster increases as the number of articles to which the weight is assigned within a specific cluster among the plurality of clusters increases.

[0018] The step of calculating the above score may include the step of calculating a ranking score for each of the plurality of clusters based on the recommendation scores and weights of the articles included in each of the plurality of clusters.

[0019] The clustering step may include a step of selecting a predetermined number of clusters in order of highest ranking score among a plurality of clusters by applying ranking that considers the recommendation score and the weight.

[0020] The clustering step may include a step of determining the plurality of clusters by calculating an optimal k value, which is a clustering parameter that determines the number of clusters using a cluster evaluation index, and performing clustering according to the calculated k value.

[0021] The above report generation step may include the step of generating summaries of the selected clusters and generating the issue report that includes all of the summaries.

[0022] The above prompt engineering may involve the application of the Chain of Thought (CoT) technique so that the generative artificial intelligence model generates the summaries containing quantitative information.

[0023] The step of generating summaries of the selected clusters may include the step of using the generative artificial intelligence model through the prompt engineering to identify the major events of each of the selected clusters in a first sequence, and to extract quantitative information included in the major events in a second sequence following the first sequence.

[0024] The above quantitative information may include at least one specific figure among the number of contracts, export volume, investment amount, market share, or growth rate.

[0025] A computer-readable storage medium according to one embodiment of the present invention can record a computer program for executing the above methods on a computer.

[0026] A computer program according to one embodiment of the present invention may perform the above methods when executed by one or more processors of a computer.

[0027] According to one embodiment of the present invention, by using a user feedback-based recommendation model, articles reflecting the user's preferences are filtered preferentially among numerous articles, thereby allowing for the selection of only articles meaningful to the user while reducing data throughput.

[0028] According to one embodiment of the present invention, duplicate articles are removed by calculating the cosine similarity between embedding vectors, and by assigning weights based on the number of duplicate articles, the importance of issues that received attention during the relevant period can be reflected.

[0029] According to one embodiment of the present invention, by generating an issue report that includes specific quantitative information such as the number of contracts, export volume, investment amount, market share, and growth rate, it is possible to provide an insightful issue report rather than a simple news summary.

[0030] According to one embodiment of the present invention, key information necessary for the user can be provided quickly and effectively from a vast amount of articles.

[0031] FIG. 1 is a flowchart for explaining an issue reporting method according to one embodiment.

[0032] FIG. 2 is a diagram illustrating a data collection step according to one embodiment.

[0033] FIG. 3 is a diagram illustrating a recommendation modeling step according to one embodiment.

[0034] FIG. 4 is a diagram illustrating a duplicate removal step according to one embodiment.

[0035] FIG. 5 is a diagram illustrating a clustering step according to one embodiment.

[0036] FIG. 6 is a diagram illustrating ranking in a clustering step according to one embodiment.

[0037] FIG. 7 is a diagram illustrating a report generation step according to one embodiment.

[0038] FIG. 8 illustrates an example of an issue report generated by an issue reporting method according to one embodiment.

[0039] The embodiments described in this document and the configurations illustrated in the drawings are merely preferred examples of the disclosed invention, and various modifications that may replace the embodiments and drawings of this specification may exist at the time of filing this application.

[0040] The terms used in this document are for describing the embodiments and are not intended to limit or restrict the disclosed invention.

[0041] For example, in this specification, singular expressions may include plural expressions unless the context clearly indicates otherwise.

[0042] In this document, each of the phrases such as "A or B", "at least one of A and B", "at least one of A or B", "A, B or C", "at least one of A, B and C", and "at least one of A, B, or C" may include any one of the items listed together in the corresponding phrase, or all possible combinations thereof.

[0043] The term "and/or" includes a combination of multiple related described components or any of the multiple related described components. For example, "A and/or B" may include only "A," only "B," or both "A and B."

[0044] Additionally, terms such as “include” or “have” are intended to express the existence of the features, numbers, steps, actions, components, parts, or combinations thereof described in the specification, and do not exclude the additional existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof.

[0045] When it is said that a component is "connected," "combined," "supported," or "in contact" with another component, this includes not only cases where the components are directly connected, combined, supported, or in contact, but also cases where they are indirectly connected, combined, supported, or in contact through a third component.

[0046] When it is said that a component is located "on" another component, this includes not only cases where one component is in contact with the other, but also cases where another component exists between the two components.

[0047] Meanwhile, terms such as "front," "rear," "left," "right," "top," and "bottom" used in the following description are defined based on the drawings; however, the shape and position of each component are not limited by these terms. For example, the front side may be defined as the +X side and the rear side as the -X side. For example, based on the drawings, the right side may be defined as the +Y side and the left side as the -Y side. For example, based on the drawings, the top side may be defined as the +Z side and the bottom side as the -Z side.

[0048] In addition, terms including ordinal numbers, such as "first," "second," etc., are used to distinguish one component from another and do not limit the components.

[0049] In addition, terms such as "~part," "~unit," "~block," and "~module" may refer to a unit that processes at least one function or operation. For example, the terms may refer to at least one piece of hardware such as an FPGA (field-programmable gate array) or ASIC (application specific integrated circuit), at least one piece of software stored in memory, or at least one process processed by a processor.

[0050] Methods and functions according to one embodiment of the present invention may be realized in the form of hardware, software, or a combination thereof. When implemented in software, a program for performing the methods of the present invention may be stored on a computer-readable storage medium (or recording medium). The recording medium may include program instructions, data files, data structures, etc., either alone or in combination.

[0051] The program instructions stored on the above-mentioned recording medium may be those specifically designed and configured for the present invention, or those known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Additionally, examples of program instructions include machine code, such as that generated by a compiler, as well as high-level language code that can be executed by a computer using an interpreter, etc.

[0052] The above-mentioned hardware device (e.g., computer) may be configured to operate as at least one software module to perform the operation of the present invention, and vice versa.

[0053] An embodiment of the disclosed invention is described in detail below with reference to the attached drawings. Identical reference numbers or symbols in the attached drawings may indicate parts or components that perform substantially the same function.

[0054] The operating principle and embodiments of the present invention will be described below with reference to the attached drawings.

[0055] In the present invention, 'issue reporting' may be referred to as 'issue briefing', 'news reporting', 'news briefing', 'article reporting', 'article briefing', etc.

[0056] FIG. 1 is a flowchart for explaining an issue reporting method according to one embodiment.

[0057] Referring to FIG. 1, an issue reporting method (100) according to one embodiment of the present invention may include a data collection step (110), a recommendation modeling step (120), a duplicate removal step (130), a clustering step (140), and a report generation step (150).

[0058] In the data collection step (110), a large number of articles can be collected and stored. The large number of articles may include multilingual articles from major domestic and international media outlets.

[0059] In the recommendation modeling step (120), collected articles can be filtered using a recommendation model trained through user feedback-based labeling. At this time, the recommendation model is a model that reflects the actual user's preferences, and can perform primary filtering by calculating a recommendation score for each article.

[0060] In the duplicate removal step (130), the filtered articles are converted into embedding vectors, and duplicate articles can be removed by calculating the similarity between them (e.g., cosine similarity). Instead of simply removing identical articles, articles that are semantically similar can be detected and processed. At this time, some of the duplicate articles may be removed, and weights may be assigned to the remaining ones, taking into account that the article may be an important issue as it has been covered by multiple media outlets.

[0061] In the clustering step (140), clustering is performed on articles from which duplicates have been removed to determine multiple clusters, and a predetermined number of clusters can be selected from among them. An optimal k value can be calculated using a cluster evaluation index, and clustering can be performed accordingly. When selecting clusters, a ranking that considers both the recommendation score of each article and the weight based on duplicates can be applied, so that clusters of high importance can be selected.

[0062] The k value is a clustering parameter that determines the number of clusters and can be used to determine the boundaries of the clusters. For example, the k value determines how many clusters the data will be divided into; if k is 3, the entire data can be classified into 3 groups.

[0063] In the clustering step (140), a silhouette analysis method may be used to find the optimal k value. In the report generation step (150), an issue report may be generated using a generative AI model through prompt engineering with a Chain of Thought (CoT) technique applied to each selected cluster. The generated summary may include quantitative information (specific figures such as the number of contracts, export volume, investment amount, market share, or growth rate), thereby generating a report that provides practical insights rather than a simple news summary.

[0064] Through this series of steps, the present invention can automatically generate issue reports that are substantially meaningful to the user from a vast amount of multilingual articles, and in particular, enable effective identification of the latest trends in the field of future materials.

[0065] The specific processing steps for each stage are explained in detail with reference to the drawings described below.

[0066] FIG. 2 is a diagram illustrating a data collection step according to one embodiment.

[0067] Referring to FIG. 2, in the data collection step (110), the computer (1) can collect a plurality of articles (10). The articles (10) may include multilingual articles published by major domestic and foreign media outlets. For example, articles written in Korean, English, and Chinese may be collected.

[0068] The computer (1) can separate and process each collected article (10). Specifically, the body of each article can be encrypted and stored in a database, and metadata such as the title and date of creation can be stored directly in the database without encryption.

[0069] The encryption of the text can be performed using various encryption algorithms. This encryption process protects copyrighted content while allowing it to be decrypted and utilized in subsequent processing steps if necessary.

[0070] On the other hand, metadata such as titles and publication dates may be information frequently used for searching, filtering, and sorting articles, so it can be stored in a database without encryption to enable quick access.

[0071] The articles used to implement the issue reporting method (100) according to one embodiment of the present invention may be a plurality of articles that are classified and stored under the same date according to their metadata.

[0072] When the issue reporting method (100) is performed on multiple articles classified and stored under the same date according to their metadata, a daily issue report can be generated.

[0073] When the issue reporting method (100) is performed on multiple articles whose metadata dates fall within a predetermined period (e.g., one week, one month, one year), an issue report corresponding to that period (e.g., weekly issue report, monthly issue report, annual issue report) can be generated.
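The date-based selection described above can be sketched as a simple grouping over the unencrypted metadata. This is a minimal illustration only: the function name `group_by_period`, the dictionary layout of an article, and the period labels are assumptions made for the example and do not appear in the specification.

```python
from collections import defaultdict
from datetime import date


def group_by_period(articles, period="day"):
    """Group article dicts by the day, ISO week, month, or year of their 'date' field."""
    key_functions = {
        "day": lambda d: d.isoformat(),
        "week": lambda d: "{}-W{:02d}".format(*d.isocalendar()[:2]),
        "month": lambda d: f"{d.year}-{d.month:02d}",
        "year": lambda d: str(d.year),
    }
    key_fn = key_functions[period]
    groups = defaultdict(list)
    for article in articles:
        groups[key_fn(article["date"])].append(article)
    return dict(groups)


# Hypothetical metadata records; only the title and date are needed here.
articles = [
    {"title": "A", "date": date(2025, 12, 15)},
    {"title": "B", "date": date(2025, 12, 16)},
    {"title": "C", "date": date(2025, 12, 16)},
]
daily = group_by_period(articles, "day")    # one group per calendar day
weekly = group_by_period(articles, "week")  # one group per ISO week
```

Each group could then be passed through the remaining steps to produce a daily or weekly report, depending on the chosen period.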

[0074] This method of processing article body and metadata separately enables efficient system operation while maintaining data security. In particular, since rapid access to metadata is possible, processing speed can be improved in subsequent steps such as recommendation modeling or clustering.

[0075] Meanwhile, articles stored in the database may be automatically deleted or moved to a separate storage location after a specified period has elapsed. This can enable efficient management of the database.

[0076] FIG. 3 is a diagram illustrating a recommendation modeling step according to one embodiment.

[0077] Referring to FIG. 3, in the recommendation modeling step (120), articles (10) collected in the data collection step (110) can be filtered using a recommendation model (ML1) that reflects user preferences. For example, the recommendation model (ML1) can be trained through user feedback-based labeling data.

[0078] Inputting the articles (10) collected in the data collection step (110) into the recommendation model (ML1) may include inputting various features, such as the title, at least part of the body, or metadata of each article collected on the same date, into the recommendation model (ML1).

[0079] To train a recommendation model (ML1), user feedback on a predetermined number of articles (e.g., about 10,000) may be collected. This feedback may include user evaluations of the relevance, importance, or usefulness of the articles. The collected feedback may be converted into labeled data and used to train the recommendation model (ML1).

[0080] The trained recommendation model (ML1) can calculate a recommendation score for an article when a new article is input. The recommendation score may have a value within a predetermined range (e.g., between 0 and 1), and a high score may indicate that the article is more relevant or important to the user.

[0081] The recommendation model (ML1) can use various features as input, such as the title, body text, or metadata of each article. These features can be converted into an appropriate form through natural language processing techniques and input into the recommendation model (ML1).

[0082] Based on the calculated recommendation scores, articles with scores lower than a predetermined threshold can be filtered out and removed. This allows subsequent processing steps to handle only articles that are practically meaningful to the user.

[0083] Articles with scores lower than a predetermined threshold are filtered out and removed, and filtered articles (11) can be obtained.
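The threshold-based filtering can be illustrated with a short sketch. The scoring function below is a keyword-counting stand-in, not the trained recommendation model (ML1); the names `filter_by_score` and `toy_score`, the keyword list, and the threshold of 0.5 are all assumptions for illustration.

```python
def filter_by_score(articles, score_fn, threshold=0.5):
    """Keep articles whose recommendation score is not lower than the threshold."""
    kept = []
    for article in articles:
        score = score_fn(article)
        if score >= threshold:
            # The score is retained so that later steps (e.g., ranking) can reuse it.
            kept.append({**article, "score": score})
    return kept


def toy_score(article):
    """Placeholder for the recommendation model: fraction of keywords found in the title."""
    keywords = ("lithium", "battery")
    hits = sum(word in article["title"].lower() for word in keywords)
    return hits / len(keywords)


collected = [
    {"title": "Lithium battery plant announced"},
    {"title": "Local sports results"},
    {"title": "Battery recycling update"},
]
filtered = filter_by_score(collected, toy_score, threshold=0.5)
```

In this toy input, the first and third articles pass the threshold and the second is removed.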

[0084] In one embodiment, the calculated recommendation score may also be used to evaluate the importance of the cluster in a subsequent clustering step (140). This may contribute to improving the quality of the issue report that is finally generated.

[0085] Meanwhile, the recommendation model (ML1) can be retrained periodically by incorporating new user feedback. This allows it to continuously reflect user preferences that change over time.

[0086] FIG. 4 is a diagram illustrating a duplicate removal step according to one embodiment.

[0087] Referring to FIG. 4, in the duplicate removal step (130), the articles (11) filtered through the recommendation modeling step (120) can be converted into embedding vectors (12). The embedding vector conversion can be performed using natural language processing technology and can map the text information of each article into a high-dimensional vector space.

[0088] When embedding vectors (12) are generated, similarity between them can be calculated to identify duplicate articles. For example, cosine similarity between embedding vectors (12) can be calculated, and if the calculated cosine similarity exceeds a predetermined threshold, the articles can be determined to be duplicates.

[0089] In this case, not only are identical articles deemed duplicates, but articles dealing with semantically similar content may also be identified as duplicates. For example, articles from different media outlets that describe the same event differently may also be treated as duplicates.

[0090] Articles identified as duplicates are not all removed; instead, some may be removed while weights are assigned to the remainder. This takes into account that articles covered simultaneously by multiple media outlets are likely to be significant issues at that particular time.

[0091] Through this process, articles (13) with duplicates removed can finally be obtained. The articles (13) with duplicates removed and the weight information assigned to them can be utilized in the subsequent clustering step (140).
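As a rough sketch of the duplicate removal described above, the following greedy pass keeps the first article of each group of near-duplicates and counts how many articles were folded into it. Using that count as the weight follows the description of weights based on the number of duplicate articles; the greedy "keep the first one" policy, the function names, and the threshold of 0.9 are assumptions for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def deduplicate(vectors, threshold=0.9):
    """Return {index of kept article: weight}, where the weight counts the
    kept article plus every later article judged to duplicate it."""
    weights = {}
    for i, vec in enumerate(vectors):
        for kept in weights:
            if cosine(vec, vectors[kept]) > threshold:
                weights[kept] += 1  # duplicate: drop it, raise the weight of the kept one
                break
        else:
            weights[i] = 1  # no similar article kept so far: keep this one
    return weights


# Toy 2-D "embeddings"; real embedding vectors would have far more dimensions.
embeddings = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.98, 0.1]]
weights = deduplicate(embeddings)
```

Here articles 1 and 3 are treated as duplicates of article 0, so article 0 is kept with weight 3 and article 2 is kept with weight 1.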

[0092] Meanwhile, the embedding model used for embedding vector transformation can be selected from various natural language processing models and may be further trained to suit a specific domain as needed.

[0093] FIG. 5 is a diagram illustrating a clustering step according to one embodiment.

[0094] Referring to FIG. 5, in the clustering step (140), clustering can be performed on the deduplicated articles (13) obtained in the duplicate removal step (130).

[0095] To this end, the clustering step (140) may include a step of mapping the embedding vectors of the articles (13) from which duplicates have been removed into a two-dimensional space through a dimensionality reduction technique.

[0096] Dimensionality reduction can be used to visualize high-dimensional embedding vectors on a two-dimensional plane while preserving their relationships. This allows articles covering similar content to be located close to each other in two-dimensional space.

[0097] To perform clustering, the optimal k value can first be calculated using the silhouette coefficient. For example, the k value can be selected from values between 5 and 25, but is not limited thereto; as the parameter that determines the number of clusters, the k value can be determined by considering the distribution characteristics of the data.
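The selection of k by silhouette coefficient can be sketched in plain Python as follows. This is an illustrative sketch under several assumptions: k-means with a deterministic farthest-point initialization is used as the clustering algorithm (the specification leaves the algorithm open), Euclidean distance is used, and the toy data uses candidate values 2 to 5 rather than the 5 to 25 range mentioned above.

```python
import math


def kmeans(points, k, iters=20):
    """Tiny k-means; farthest-point initialization keeps the sketch deterministic."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; requires at least two non-empty clusters."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for name, other in clusters.items() if name != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, candidates):
    """Return the candidate k with the highest silhouette score, plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Three well-separated toy groups in two dimensions.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
k, scores = best_k(points, range(2, 6))
```

For this toy data the highest silhouette score is obtained at k = 3, matching the three visible groups.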

[0098] When clustering is performed according to the calculated k value, a plurality of clusters (A, B, C, D, E) can be formed as shown in FIG. 5. A center value can be calculated for each cluster, and the similarity between this center value and each data within the cluster (e.g., cosine similarity) can be calculated.

[0099] Data within each cluster whose cosine similarity with the center value exceeds a predetermined threshold may be selected. Additionally, the top sentences with the highest cosine similarity for each cluster may be selected. This may be intended to identify the sentences that best express the core content of the corresponding cluster.
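Selecting the data closest to a cluster's center value can be sketched as below. The function name `representatives`, the threshold of 0.8, and the `top_n` parameter are assumptions for illustration, and the center value is taken here to be the arithmetic mean of the cluster's vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def representatives(items, vectors, threshold=0.8, top_n=2):
    """Return up to top_n items whose similarity to the cluster center exceeds the threshold,
    ordered from most to least similar."""
    center = [sum(axis) / len(vectors) for axis in zip(*vectors)]
    scored = sorted(((cosine(vec, center), item) for item, vec in zip(items, vectors)),
                    reverse=True)
    return [item for similarity, item in scored if similarity > threshold][:top_n]


# Toy cluster with three sentences and 2-D stand-in embeddings.
sentences = ["s1", "s2", "s3"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
selected = representatives(sentences, vectors)
```

In this toy cluster, "s2" lies closest to the center, followed by "s1", while "s3" falls below the threshold and is not selected.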

[0100] Among the formed clusters, a predetermined number (e.g., 4) of clusters with relatively high importance may be finally selected. At this time, the importance of each cluster may be evaluated by considering the number of data included in the cluster, the recommendation scores of the articles included in the cluster, and weights based on duplication.

[0101] To this end, the clustering step (140) may include a step of selecting only a predetermined number of clusters of relatively high importance among the formed clusters, and this step will be described in detail later with reference to FIG. 6.

[0102] For the convenience of explanation below, it is assumed that the predetermined number is 4, but it goes without saying that this predetermined number may change depending on the user's settings.

[0103] The selected top 4 clusters can be used to structure the main content of the issue report during the subsequent report generation stage. In particular, the sentences with the highest cosine similarity selected for each cluster can be used to summarize the main content of that cluster.

[0104] Meanwhile, clustering algorithms can be selected from various algorithms depending on the characteristics of the data, and hyperparameters can be adjusted as needed.

[0105] FIG. 6 is a diagram illustrating ranking in a clustering step according to one embodiment.

[0106] Referring to FIG. 6, the ranking process in the clustering step (140) according to an embodiment of the present invention can be seen.

[0107] The clustering step (140) may include a step of calculating a score for ranking multiple clusters.

[0108] In the clustering step (140), ranking scores for multiple clusters (A, B, C, D, E) can be calculated. The ranking scores can be calculated by comprehensively considering the recommendation scores and weights of the articles included in each cluster, as well as the number of those articles (number of data).

[0109] For example, the step of calculating ranking scores for multiple clusters (A, B, C, D, E) may include the step of calculating ranking scores for each of the multiple clusters based on recommendation scores of articles included in each of the multiple clusters.

[0110] These recommendation scores may be assigned to each article in the recommendation modeling step (120).

[0111] As another example, the step of calculating ranking scores for multiple clusters (A, B, C, D, E) may include the step of calculating a ranking score for each of the multiple clusters based on the weights of the articles included in each of the multiple clusters. These weights may be assigned to each article in the duplicate removal step (130).

[0112] In one embodiment, the step of calculating the score may include the step of calculating the ranking score of each of the plurality of clusters such that the more articles with the weights assigned within a specific cluster among the plurality of clusters there are, the higher the ranking score of the corresponding cluster becomes.

[0113] As another example, the step of calculating ranking scores for multiple clusters (A, B, C, D, E) may include the step of calculating a ranking score for each of the multiple clusters based on the number of articles included in each of the multiple clusters.

[0114] As illustrated in FIG. 6, the ranking results can be presented in the form of a table. The table may include recommendation scores, weights, and the number of articles for each cluster.

[0115] Specifically, Cluster B, which ranked 1st, may have a recommendation score b1, a weight b2, and a number of articles b3. Cluster A, which ranked 2nd, may have a recommendation score a1, a weight a2, and a number of articles a3. Cluster D, which ranked 3rd, may have a recommendation score d1, a weight d2, and a number of articles d3. Cluster C, which ranked 4th, may have a recommendation score c1, a weight c2, and a number of articles c3, and Cluster E, which ranked 5th, may have a recommendation score e1, a weight e2, and a number of articles e3.

[0116] In this case, the ranking score of each cluster can be calculated as a combination of the average recommendation score, average weight, and total number of articles included in the cluster. For example, the ranking score can be calculated by assigning a predetermined weight to each of these three factors.

[0117] For example, the ranking score of each cluster can be determined based on the following [Formula 1].

[0118] [Formula 1]

[0119] Ranking score =

[0120] Here, K may denote the number of articles included in the cluster, R_n may denote the recommendation score of the n-th article, and W_n may denote the weight of the n-th article.

[0121] However, the method of calculating the ranking scores of each cluster is not limited to this.

[0122] In the clustering step (140), a predetermined number of top clusters (e.g., 4) (e.g., B, A, D, and C) can be selected based on the ranking scores calculated as above, and the selected clusters can be utilized in the subsequent report generation step (150).

[0123] Meanwhile, the weights of each element for calculating the ranking score can be adjusted according to user settings or data characteristics.
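The ranking and selection of clusters can be sketched as follows. The exact form of Formula 1 is not reproduced in this text, so the score used here, the sum over the K articles of a cluster of R_n multiplied by W_n, is only an assumed form that uses the same symbols and grows with the recommendation scores, the weights, and the number of articles. The function names and the sample values are likewise hypothetical.

```python
def ranking_score(articles):
    """Assumed score: sum of R_n * W_n over the (R_n, W_n) pairs of one cluster."""
    return sum(r * w for r, w in articles)


def select_top_clusters(clusters, count=4):
    """Return the names of the `count` clusters with the highest ranking scores."""
    ranked = sorted(clusters, key=lambda name: ranking_score(clusters[name]), reverse=True)
    return ranked[:count]


# Hypothetical clusters: each article is a (recommendation score, weight) pair.
clusters = {
    "A": [(0.9, 2), (0.7, 1)],
    "B": [(0.8, 3), (0.9, 1)],
    "C": [(0.6, 1), (0.5, 1)],
    "D": [(0.7, 2)],
    "E": [(0.4, 1)],
}
top = select_top_clusters(clusters, count=4)
```

With these sample values the selected order is B, A, D, C, and cluster E is left out. A different weighting of the three factors would change only `ranking_score`, not the selection logic.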

[0124] FIG. 7 is a diagram illustrating a report generation step according to one embodiment.

[0125] Referring to FIG. 7, in the report generation step (150), an issue report can be generated using a generative artificial intelligence model for each selected cluster. At this time, prompt engineering can be performed by applying a specific persona, which allows the report to be generated with a consistent perspective and style.

[0126] Step-by-step prompts (PT) can be input into the generative artificial intelligence model (ML2). These prompts utilize the Chain of Thought (CoT) technique, enabling the generation of a summary containing specific information through a sequential thought process.

[0127] In the first step, articles from the selected cluster can be input into a generative artificial intelligence model (ML2). This may be a step that provides basic data for the generative artificial intelligence model (ML2) to process.

[0128] Inputting the articles of a selected cluster into the generative artificial intelligence model (ML2) may include inputting, into the generative artificial intelligence model (ML2), the data whose cosine similarity with the centroid calculated for each cluster exceeds a predetermined threshold.
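The centroid-based selection in paragraph [0128] can be sketched as follows. This is an illustrative example only: the function names and the threshold value of 0.8 are assumptions, and the embeddings are taken to be plain lists of floats for one cluster.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of the embedding vectors of one cluster."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def articles_near_centroid(embeddings: List[Sequence[float]],
                           threshold: float = 0.8) -> List[int]:
    """Indices of articles whose similarity to the centroid exceeds threshold."""
    if not embeddings:
        return []
    center = centroid(embeddings)
    return [i for i, vec in enumerate(embeddings)
            if cosine_similarity(vec, center) > threshold]
```

Only the articles at the returned indices would then be passed to the generative artificial intelligence model (ML2).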

[0129] In the second step, a request may be made to identify key events within the input cluster. This allows the generative artificial intelligence model (ML2) to identify the key events or issues being addressed within that cluster.

[0130] In the third step, a request may be made to extract specific numerical information included in each identified event. This may be intended to generate a summary containing numerical information (or quantitative information), such as the number of contracts, investment amount, market share, growth rate, etc.

[0131] A prompt according to one embodiment of the present invention may be engineered to include a prompt requesting the extraction of specific numerical information included in each identified event, following a prompt requesting the identification of key events within an input cluster.

[0132] Through this stepwise prompt processing, the generative artificial intelligence model (ML2) can generate specific and insightful summaries. Because the generated summaries can include quantitative numerical information as well as qualitative descriptions, they can provide more practical information.
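The stepwise prompt described in paragraphs [0125] to [0131] might be assembled as in the following sketch. The persona text, the wording of each step, the final summary instruction, and the function name `build_stepwise_prompt` are hypothetical; the source specifies only the order of the steps (input the articles, identify key events, then extract numerical information). No model is called here; the function only builds the prompt text.

```python
from typing import List

# Assumed persona; the source only states that a specific persona is applied.
PERSONA = "You are an industry analyst writing a concise issue report."


def build_stepwise_prompt(articles: List[str], persona: str = PERSONA) -> str:
    """Build one prompt that walks the model through the steps in order."""
    numbered_articles = "\n\n".join(
        f"[Article {i}]\n{text}" for i, text in enumerate(articles, start=1)
    )
    steps = [
        "Step 1: Read the articles below, which all belong to one cluster.",
        "Step 2: Identify the key events or issues covered in the cluster.",
        "Step 3: For each identified event, extract the specific figures it "
        "mentions (for example, number of contracts, investment amount, "
        "market share, growth rate).",
        "Step 4: Write a summary of the cluster that includes those figures.",
    ]
    return "\n\n".join([persona, "\n".join(steps), numbered_articles])
```

Placing the figure-extraction request after the event-identification request mirrors the ordering in paragraph [0131].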

[0133] Meanwhile, the specific details of the prompt (PT) may be adjusted according to the operational purpose of the system or the characteristics of the data, and additional steps may be included as necessary.

[0134] When generating issues for each cluster, only articles deemed to have high importance can be selected and extracted as relevant articles, rather than including all articles within the cluster. In this case, importance can be determined by comprehensively considering the article's recommendation score, weight, and other factors.

[0135] The association between generated issues and related articles can be verified through cosine similarity. Specifically, the cosine similarity between the final generated issue and each related article can be calculated, and articles with a similarity below a predetermined threshold can be excluded from the list of related articles.

[0136] Subsequently, the related articles can be sorted in descending order of cosine similarity to the issue. This allows the final issue report to reference the most relevant articles first.
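The verification and ordering of related articles in paragraphs [0135] and [0136] can be illustrated with the sketch below. It assumes that the generated issue and each related article have already been converted to embedding vectors; the function names, the article identifiers, and the threshold value of 0.5 are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_related_articles(issue_vec: Sequence[float],
                          article_vecs: Dict[str, Sequence[float]],
                          threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Drop articles below the threshold, then sort by similarity, highest first."""
    scored = [(article_id, cosine_similarity(issue_vec, vec))
              for article_id, vec in article_vecs.items()]
    kept = [(article_id, sim) for article_id, sim in scored if sim >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept
```

For example, with an issue vector `[1.0, 0.0]`, an article embedded at `[0.0, 1.0]` has similarity 0 and is excluded, while an article embedded at `[1.0, 0.0]` is listed first.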

[0137] FIG. 8 illustrates an example of an issue report generated by an issue reporting method according to one embodiment.

[0138] Referring to FIG. 8, the issue report (RT) may include multiple summaries (CR1, CR2) generated per cluster. The summaries (CR1, CR2) may include key contents extracted from each cluster.

[0139] For example, the summary (CR1) covers news regarding the financing for the expansion of a specific company (Company L)'s lithium production facility, and may include specific numerical information (g1) of $2.26 billion.

[0140] The summary (CR2) covers news of the discovery of lithium reserves in a specific country (Country S) and may include specific numerical information (g2) that the discovered reserves can meet more than nine times the global demand for lithium.

[0141] In this way, by providing the key details of each cluster along with specific numerical information (g1, g2), the issue report (RT) allows the scale or impact of each issue to be assessed quantitatively.

[0142] Meanwhile, these specific figures can be extracted through prompt engineering applying the CoT technique described earlier and included in the summary.

[0143] In one embodiment, the issue reporting method may be provided in various categories distinguished according to metadata.

[0144] For example, issue reporting methods may include methods for reporting issues regarding articles published domestically, methods for reporting issues regarding articles published overseas (e.g., the United States), and methods for reporting issues regarding articles published in specific groups of countries (e.g., all countries, Asia, Europe, etc.).

[0145] In one embodiment, the issue reporting method of the present invention may be a daily issue reporting method performed based on articles collected during the day.

[0146] In one embodiment, the issue reporting method of the present invention may be an issue reporting method corresponding to a predetermined period (e.g., weekly issue reporting method, monthly issue reporting method, annual issue reporting method) performed based on articles collected during a predetermined period.

[0147] According to various embodiments, the present invention may further include the step of generating an issue report corresponding to a predetermined period (e.g., a weekly issue report or a monthly issue report) by performing steps 110 to 150 based on daily issue reports (RTs) generated daily during a predetermined period (e.g., one week or one month).

[0148] According to various embodiments, the present invention may further include the step of generating a monthly issue report by performing steps 110 to 150 based on the weekly issue reports (RT) generated each week during a month.

[0149] According to various embodiments, the present invention may further include the step of generating an annual issue report by performing steps 110 to 150 based on the monthly issue reports (RT) generated each month during a year.

[0150] Meanwhile, the disclosed embodiments may be implemented in the form of a recording medium that stores instructions executable by a computer. The instructions may be stored in the form of program code and, when executed by a processor, may generate a program module to perform the operation of the disclosed embodiments. The recording medium may be implemented as a computer-readable recording medium.

[0151] Computer-readable recording media include all types of recording media that store instructions that can be decoded by a computer. Examples include ROM (read-only memory), RAM (random access memory), magnetic tape, magnetic disk, flash memory, optical data storage devices, etc.

[0152] Additionally, computer-readable recording media may be provided in the form of non-transitory storage media. Here, 'non-transitory storage media' simply means that it is a tangible device and does not contain a signal (e.g., electromagnetic waves), and this term does not distinguish between cases where data is stored semi-permanently and cases where it is stored temporarily. For example, 'non-transitory storage media' may include a buffer in which data is stored temporarily.

[0153] According to one embodiment, the method according to the various embodiments disclosed herein may be provided as included in a computer program product. The computer program product may be traded between a seller and a buyer as a product. The computer program product may be distributed in the form of a device-readable recording medium (e.g., compact disc read-only memory (CD-ROM)), or distributed online (e.g., download or upload) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product (e.g., downloadable app) may be temporarily stored or temporarily created on a device-readable recording medium, such as the memory of a manufacturer's server, an application store's server, or a relay server.

[0154] As described above, the disclosed embodiments have been explained with reference to the attached drawings. Those skilled in the art will understand that the present invention may be practiced in forms different from the disclosed embodiments without changing the technical spirit or essential features of the invention. The disclosed embodiments are illustrative and should not be interpreted restrictively.

Claims

1. An issue reporting method comprising: a data collection step of collecting and storing articles; a recommendation modeling step of filtering the collected articles using a recommendation model trained through user feedback-based labeling; a duplicate removal step of removing duplicate articles by calculating the similarity between embedding vectors of the filtered articles; a clustering step of determining a plurality of clusters by performing clustering on the articles from which duplicates have been removed, and selecting a predetermined number of clusters from among the plurality of clusters; and a report generation step of generating an issue report using a generative artificial intelligence model through prompt engineering for each of the selected clusters.

2. The issue reporting method of claim 1, wherein the data collection step comprises storing the titles of the articles directly in a database, and encrypting the bodies of the articles and storing them in the database.

3. The issue reporting method of claim 1, wherein the recommendation model is trained through user feedback-based labeling of a predetermined number of articles.

4. The issue reporting method of claim 1, wherein the recommendation modeling step comprises calculating recommendation scores of the collected articles using the recommendation model.

5. The issue reporting method of claim 4, wherein the clustering step comprises calculating a score for ranking the plurality of clusters, and wherein calculating the score comprises calculating a ranking score for each of the plurality of clusters based on the recommendation scores of the articles included in each of the plurality of clusters.

6. The issue reporting method of claim 1, wherein the duplicate removal step comprises: converting the filtered articles into embedding vectors; and determining, as duplicate articles, articles for which the similarity between the converted embedding vectors exceeds a predetermined threshold.

7. The issue reporting method of claim 6, wherein the duplicate removal step further comprises removing at least some of the duplicate articles and assigning weights to the remaining ones.

8. The issue reporting method of claim 7, wherein the clustering step comprises calculating a score for ranking the plurality of clusters, and wherein calculating the score comprises calculating a ranking score for each of the plurality of clusters such that the more articles with assigned weights a given cluster among the plurality of clusters contains, the higher the ranking score of that cluster.

9. The issue reporting method of claim 1, wherein: the recommendation modeling step comprises calculating recommendation scores of the collected articles using the recommendation model; the duplicate removal step comprises removing at least some of the duplicate articles and assigning weights to the remaining ones; the clustering step comprises calculating a score for ranking the plurality of clusters; and calculating the score comprises calculating a ranking score for each of the plurality of clusters based on the recommendation scores and weights of the articles included in each of the plurality of clusters.

10. The issue reporting method of claim 9, wherein the clustering step comprises selecting the predetermined number of clusters from among the plurality of clusters in descending order of ranking score by applying a ranking that considers the recommendation scores and the weights.

11. The issue reporting method of claim 1, wherein the clustering step comprises determining the plurality of clusters by calculating, using a cluster evaluation metric, an optimal k value, which is a clustering parameter that determines the number of clusters, and performing clustering according to the calculated k value.

12. The issue reporting method of claim 1, wherein the report generation step comprises generating summaries of the selected clusters and generating an issue report that includes all of the summaries.

13. The issue reporting method of claim 12, wherein the prompt engineering applies the Chain of Thought (CoT) technique so that the generative artificial intelligence model generates the summaries containing quantitative information.

14. The issue reporting method of claim 12, wherein generating the summaries of the selected clusters comprises using the generative artificial intelligence model through the prompt engineering to identify major events of each of the selected clusters in a first sequence, and to extract quantitative information included in the major events in a second sequence following the first sequence.

15. The issue reporting method of claim 13, wherein the quantitative information includes at least one specific figure among the number of contracts, export volume, investment amount, market share, or growth rate.