Apparatus and method for recommending news articles
The news article recommendation device addresses the issue of duplicate articles and unreliable recommendations by filtering and scoring news articles, resulting in a reliable newsletter with optimized content.
Patent Information
- Authority / Receiving Office
- WO
- Patent Type
- Applications
- Current Assignee / Owner
- POSCO HLDG INC
- Filing Date
- 2025-12-18
- Publication Date
- 2026-06-25
Description
News article recommendation device and method
[0001] The present disclosure relates to a technology for recommending news articles.
[0002] Artificial intelligence (AI) is a field in which a system learns repeatedly, in a manner similar to human intelligence, and makes judgments based on the results of that learning. Artificial intelligence is a broad concept that includes machine learning and deep learning, and machine learning in turn includes deep learning.
[0003] Machine learning is a field of artificial intelligence (AI) that develops algorithms and technologies enabling computers to learn from data. It serves as a core technology in various fields such as image processing, video recognition, speech recognition, and internet search, demonstrating outstanding performance in prediction and detection.
[0004] In particular, a large language model (LLM), a type of artificial intelligence model, is trained on a vast amount of text data and generates output in response to prompts, which are the text inputs given to the model. LLMs can generate consistent responses to a wide variety of prompts. Furthermore, LLMs can translate languages, generate text that meets specific conditions, and summarize text.
[0005] Conventional news recommendation systems suffer from the problem of frequent duplicate articles and the recommendation of news articles that do not align with pre-set categories. Furthermore, the lack of clear criteria for determining which articles to recommend leads to a decline in the reliability of the suggested news.
[0006] The present disclosure aims to provide a technology that filters collected news articles based on information produced through a pre-set artificial intelligence model and generates a newsletter containing a summary of the recommended news articles.
[0007] In one aspect, the present embodiments provide a news article recommendation device comprising: an article data acquisition unit that acquires news article information regarding a plurality of news articles collected for a predetermined field through a predetermined database; an article data refinement unit that calculates a similarity between the news articles and a recommendation score for each news article based on the news article information, deletes duplicate articles included in the plurality of news articles by comparing the similarity with a predetermined threshold, and filters the plurality of news articles based on the recommendation score; and a newsletter output unit that outputs a newsletter containing the news articles that remain after refinement based on the similarity and the recommendation score.
[0008] In another aspect, the present embodiments provide a news article recommendation method comprising the steps of: obtaining news article information regarding a plurality of news articles collected for a predetermined field through a predetermined database; calculating a similarity between the news articles and a recommendation score for each news article based on the news article information; deleting duplicate articles included in the plurality of news articles by comparing the similarity with a predetermined threshold; filtering the plurality of news articles based on the recommendation score; and outputting a newsletter containing the news articles that remain after refinement based on the similarity and the recommendation score.
[0009] The present disclosure may provide a technology for recommending news articles.
[0010] FIG. 1 is a drawing for explaining the configuration of a device for recommending news articles according to one embodiment.
[0011] FIG. 2 is a flowchart for schematically explaining the process of generating a newsletter based on a news article according to one embodiment.
[0012] FIG. 3 is a flowchart for explaining the process of recommending news articles using news article titles according to one embodiment.
[0013] FIG. 4 is a flowchart illustrating the process of generating summary information of a news article using the body of a news article according to one embodiment.
[0014] FIG. 5 is a flowchart for explaining the process of recommending news articles according to one embodiment.
[0015] FIG. 6 is a configuration diagram of a computing device including an artificial intelligence model according to one embodiment.
[0016] FIG. 7 is a configuration diagram of a computer system including a client-server that includes an artificial intelligence model according to one embodiment.
[0017] Hereinafter, some embodiments of the present disclosure will be described in detail with reference to the exemplary drawings. In assigning reference numerals to the components of each drawing, the same components are given the same reference numerals wherever possible, even if they are shown in different drawings. Furthermore, in describing the embodiments, if it is determined that a detailed description of related known components or functions may obscure the essence of the technical concept, such detailed description may be omitted. Where terms such as "comprising," "having," or "consisting of" are used in this specification, other parts may be added unless "only" is used. Where a component is expressed in the singular, it may include a plural unless otherwise specified.
[0018] Additionally, terms such as first, second, A, B, (a), (b), etc., may be used to describe the components of the present disclosure. These terms are used merely to distinguish the components from other components, and the nature, order, sequence, or number of the components are not limited by such terms.
[0019] In describing the positional relationship of components, where it is stated that two or more components are "connected," "combined," or "joined," it should be understood that while the two or more components may be directly "connected," "combined," or "joined," they may also be "connected," "combined," or "joined" with other components "intervened." Here, the other components may be included in one or more of the two or more components that are "connected," "combined," or "joined" with one another.
[0020] In describing the temporal flow relationship regarding components, methods of operation, or methods of production, for example, when the temporal or sequential relationship is described using "after," "following," "next," or "before," it may include cases where the relationship is not continuous unless "immediately" or "directly" is used.
[0021] Meanwhile, where numerical values or corresponding information regarding a component (e.g., levels, etc.) are mentioned, even without separate explicit notation, the numerical values or corresponding information may be interpreted as including a range of error that may occur due to various factors (e.g., process factors, internal or external shocks, noise, etc.).
[0022] The embodiments are described in detail below with reference to the drawings.
[0023]
[0024] FIG. 1 is a drawing for explaining the configuration of a device for recommending news articles according to one embodiment.
[0025] Referring to FIG. 1, the news article recommendation device (100) of the present disclosure includes an article data acquisition unit (110) that acquires news article information for a plurality of news articles collected for a preset field through a preset database.
[0026] The news article recommendation device (100) of the present disclosure can perform filtering through a recommendation score calculated based on the titles of a plurality of news articles collected for a preset field, and generate a summary of the article body to provide optimized newsletter content preferred by the user.
[0027] For example, the aforementioned news article information includes at least one of the news article title information, body information, media company name, recommendation frequency, and publication date, and can be collected from at least one website through a crawling technique based on a preset field or a preset period and stored in a preset database.
[0028] The aforementioned pre-set fields can be configured in various ways as needed, regardless of type or scope. For example, secondary batteries can be configured as a field, or autonomous vehicles can be configured as a single field. In addition, the aforementioned fields do not need to be configured as only one, and two or more fields may be configured.
[0029] The aforementioned preset period may refer to a period during which news articles are periodically updated, and may be set in units of days, weeks, or months. However, the aforementioned period is not fixed as a single period and may be set in various ways as needed.
[0030] The news article recommendation device (100) of the present disclosure can collect at least one news article from a website through crawling. Crawling refers to visiting web pages on the internet with a web crawler, which is a piece of software, and collecting the information contained in those pages. The news article recommendation device (100) of the present disclosure sets a URL (Uniform Resource Locator) as the starting point for crawling. If data for a pre-set field or a pre-set period exists at that URL, the device can use a pre-set library to obtain the title or body of the news article as text, extract the content contained in the text, and store it in a database.
[0031] The data collected through crawling may include the title, body, and media outlet name of each news article, its publication date, and the number of times users recommended or did not recommend it. This is only one example, and various other data may be collected as needed.
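A record for the news article information listed in [0027] and [0031], together with a page parser, can be sketched in Python. This is a minimal sketch, not the disclosed crawler: the names `NewsArticle`, `ArticlePageParser`, and `parse_article` are hypothetical, and the assumption that a page puts its headline in the first `<h1>` element and its body in `<p>` elements is illustrative only, since real news sites need per-site extraction rules. Fetching the page over HTTP is left out so the example runs offline.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional


@dataclass
class NewsArticle:
    """One item of news article information (fields follow [0027] and [0031])."""
    title: str
    body: str
    media_name: Optional[str] = None
    published: Optional[str] = None   # e.g. an ISO date string
    recommend_count: int = 0
    not_recommend_count: int = 0


class ArticlePageParser(HTMLParser):
    """Collects the first <h1> as the title and all <p> text as the body.

    The <h1>/<p> page layout is an assumption made for this sketch.
    """

    def __init__(self) -> None:
        super().__init__()
        self._current: Optional[str] = None   # tag whose text is being captured
        self.title = ""
        self.paragraphs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self.title:
            self._current = "h1"
        elif tag == "p":
            self._current = "p"
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) is appended to the enclosing h1/p.
        if self._current == "h1":
            self.title += data
        elif self._current == "p":
            self.paragraphs[-1] += data


def parse_article(html: str) -> NewsArticle:
    parser = ArticlePageParser()
    parser.feed(html)
    body = "\n".join(p.strip() for p in parser.paragraphs if p.strip())
    return NewsArticle(title=parser.title.strip(), body=body)
```

The remaining fields (media name, publication date, recommendation counts) would be filled from other parts of the page or from the site's metadata, which varies by outlet.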
[0032] The aforementioned pre-configured libraries may include HTTP libraries, but are not limited thereto, and various libraries may be used as needed.
[0033] The news article recommendation device (100) of the present disclosure can store the title and body of the news article obtained by the crawling method described above in a separate database, and can encrypt at least one of the title and body and store it in the database described above.
[0034] The news article recommendation device (100) of the present disclosure can perform preprocessing of collected data to calculate a recommendation score of a news article or to generate a summary of the body of a news article through an artificial intelligence model.
[0035] When news article information of multiple news articles is collected through crawling, the news article recommendation device (100) of the present disclosure separates the title information and body information of the news articles and can translate the separated title information and body information into at least one language.
[0036] For example, the news article recommendation device (100) of the present disclosure can be configured to translate Chinese and Japanese title text into English while leaving English and Korean text unchanged. Inputting text that has been translated into a specific language into the artificial intelligence model in this way can improve the model's performance.
[0037] In addition, as part of preprocessing the collected data, the news article recommendation device (100) of the present disclosure can remove unnecessary data that is attached to each item of news article information.
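The language routing in [0036] could look roughly like the following sketch. It guesses the script of a title from Unicode block ranges (a crude heuristic assumed here, not a real language detector) and calls a translation function only for Chinese or Japanese text. The `translate` callable is a hypothetical placeholder for whatever translation model is used, and the names `detect_script` and `prepare_for_model` are likewise illustrative.

```python
from typing import Callable


def detect_script(text: str) -> str:
    """Guess 'ja', 'ko', 'zh', or 'en' from Unicode blocks (heuristic only)."""
    has_kana = has_hangul = has_han = False
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:  # Hiragana and Katakana
            has_kana = True
        elif 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF or 0x3130 <= cp <= 0x318F:
            has_hangul = True  # Hangul syllables and jamo
        elif 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs
            has_han = True
    if has_kana:
        return "ja"   # kana appears only in Japanese text
    if has_hangul:
        return "ko"   # Korean, possibly mixed with Hanja
    if has_han:
        return "zh"   # ideographs without kana or Hangul
    return "en"       # assumption: everything else is treated as English


# Languages routed through translation; English and Korean pass through as-is.
TRANSLATE_TO_ENGLISH = {"zh", "ja"}


def prepare_for_model(text: str, translate: Callable[[str, str], str]) -> str:
    """Translate Chinese/Japanese text to English; return other text unchanged.

    `translate(text, source_lang)` is a hypothetical hook for a translation model.
    """
    lang = detect_script(text)
    return translate(text, lang) if lang in TRANSLATE_TO_ENGLISH else text
```

A Japanese title written only in kanji would be guessed as Chinese by this heuristic; since both are translated into English, the routing outcome is the same.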
[0038] For example, if the title of a news article is assumed to be 'Secondary Battery Market Outlook - Mirae Ilbo', the news article recommendation device (100) of the present disclosure can delete the '-', classify 'Secondary Battery Market Outlook' as news article title information, classify 'Mirae Ilbo' as 'media company name', and store each in a pre-set database.
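The separation in [0038] can be sketched as a split on the last " - " separator. The separator string, the function name `split_title_and_media`, and the rule that the media company name follows the last separator are assumptions for illustration; outlets format their titles differently.

```python
from typing import Optional, Tuple


def split_title_and_media(raw_title: str, separator: str = " - ") -> Tuple[str, Optional[str]]:
    """Split 'Title - Outlet' into (title, media company name).

    Splitting on the LAST separator keeps hyphenated headlines intact.
    Returns (title, None) when no separator is present.
    """
    head, sep, tail = raw_title.rpartition(separator)
    if not sep:  # separator not found: the whole string is the title
        return raw_title.strip(), None
    return head.strip(), tail.strip()


print(split_title_and_media("Secondary Battery Market Outlook - Mirae Ilbo"))
```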
[0039] For example, if the artificial intelligence model is a Large Language Model (LLM), which is one type of language model, the news article recommendation device (100) of the present disclosure may perform tokenization to split the text of collected news articles into tokens, which are the smallest meaningful units; perform normalization to convert the text into a standard form (conversion to lowercase, conversion to numbers, removal of special characters); remove meaningless words; or convert the text into a vector of numerical values. The aforementioned preprocessing method is merely an example and can be configured in various ways as needed.
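The preprocessing steps named in [0039] (normalization, tokenization, and removal of meaningless words) might be sketched as follows. This is a simplification under stated assumptions: whitespace splitting stands in for tokenization, whereas an LLM would use its own subword tokenizer, and the `STOP_WORDS` set is a hypothetical, English-only example.

```python
import re
from typing import List

# Hypothetical stop-word list; a real system would use a curated, per-language list.
STOP_WORDS = {"a", "an", "the", "of", "and", "in", "on", "for", "to", "is"}


def normalize(text: str) -> str:
    """Lowercase, replace special characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware, so Hangul/CJK survive
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> List[str]:
    """Whitespace tokenization with stop-word removal (stand-in for an LLM tokenizer)."""
    return [tok for tok in normalize(text).split() if tok not in STOP_WORDS]


print(tokenize("The Outlook of the Secondary-Battery Market!"))
```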
[0040] The news article recommendation device (100) of the present disclosure includes an article data refinement unit (120) that calculates a similarity between each news article and a recommendation score for each news article based on news article information, deletes duplicate articles included in a plurality of news articles by comparing the similarity with a preset threshold, and filters a plurality of news articles based on the recommendation score.
[0041] The news article recommendation device (100) of the present disclosure can calculate the similarity between each news article to remove articles with duplicate content among a plurality of news articles, and can calculate a recommendation score through a preset artificial intelligence model for filtering the collected news articles.
[0042] For example, the article data refinement unit (120) of the present disclosure can convert the title information of each of a plurality of news articles into a vector based on an embedding technique, and calculate the similarity between two news articles based on a vector and a cosine similarity judgment technique.
[0043] Specifically, the article data refinement unit (120) of the present disclosure can convert at least one word included in the title information of each of a plurality of news articles into a vector based on an embedding technique, and calculate the similarity between two news articles based on a vector and a cosine similarity judgment technique.
[0044] The present disclosure converts title information of collected news articles into a high-dimensional vector containing numbers using an embedding technique to ensure that duplicate articles are not included in recommended news provided through a newsletter, and can calculate the similarity between two news articles based on the converted vector and a cosine similarity judgment technique.
[0045] The embedding technique used for similarity calculation is a method that converts each data point into a high-dimensional vector containing numbers while preserving the meaning of the original data. For example, the word 'secondary battery' can be converted into a high-dimensional vector such as [0.52, -0.04, 0.16, ... 0.27] through the embedding technique, and the word 'car' can be converted into a high-dimensional vector such as [0.42, -0.17, 0.18, ... 0.25] through the embedding technique. Furthermore, the embedding technique can convert not only a single word into a vector but also an entire sentence or text into a single vector. For example, the title of a specific news article can be converted into a high-dimensional vector containing numbers such as [0.2, -0.01, 0.36, ... 0.87].
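To make the embedding idea concrete without a trained model, the sketch below uses feature hashing: each word is hashed to one of `dim` positions and the resulting count vector is L2-normalized. This is a swapped-in technique, assumed only for illustration; unlike the learned embeddings described in [0045], a hashed vector does not place related words such as 'secondary battery' and 'car' near each other. The function name `embed` and the dimension of 64 are hypothetical.

```python
import hashlib
import math
import re
from typing import List


def embed(text: str, dim: int = 64) -> List[float]:
    """Map text to an L2-normalized hashed bag-of-words vector.

    Illustrative stand-in for a learned embedding: it captures shared words,
    not meaning.
    """
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        # blake2b gives a stable hash across runs (Python's hash() is randomized).
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        index = int.from_bytes(digest[:4], "little") % dim  # bucket for this word
        sign = 1.0 if digest[4] % 2 == 0 else -1.0          # signed hashing reduces collision bias
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```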
[0046] The article data refinement unit (120) of the present disclosure can calculate the similarity between two news articles by comparing the converted vectors based on an embedding technique.
[0047] The present disclosure proposes using a cosine similarity judgment technique for the aforementioned similarity calculation. The cosine similarity judgment technique measures how similar two data items are by comparing the directions of their vectors using the cosine function. Similarity can be calculated by Equation 1, where θ is the angle between the two vectors, X and Y are the two vectors to be compared, X·Y is the dot product of the two vectors, and |X| and |Y| are the magnitudes of the respective vectors.
[0048] Equation 1:
$$\cos\theta = \frac{X \cdot Y}{|X|\,|Y|}$$
[0049] For example, if X is the vector [2, 1] and Y is the vector [1, 2], then X·Y = 2×1 + 1×2 = 4, |X| = √(2² + 1²) = √5, and |Y| = √(1² + 2²) = √5. Therefore, the similarity cos θ = 4 / (√5 × √5) = 4/5 = 0.8.
[0050] Since cos θ takes values only between -1 and 1, the similarity also lies between -1 and 1, and the closer it is to 1, the more similar the titles of the two news articles are judged to be.
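Equation 1 and the worked example in [0049] can be checked with a short function. This is a plain-Python sketch; a production system would more likely use a numerical library, and rejecting zero vectors is a design choice assumed here because the cosine is undefined for them.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """cos(theta) = (X . Y) / (|X| |Y|), as in Equation 1."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Worked example from [0049]; rounded because floating-point arithmetic
# returns a value within rounding error of 0.8.
print(round(cosine_similarity([2, 1], [1, 2]), 6))  # 0.8
```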
[0051] As another example, the article data refinement unit (120) of the present disclosure may delete one of the two news articles when there are two news articles in which the similarity calculated through the cosine similarity judgment technique is greater than or equal to a preset threshold.
[0052] The aforementioned preset threshold is a real number and can be set in various ways as needed.
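The duplicate removal in [0051] can be sketched as a greedy pass: an article is kept only if its similarity to every article already kept is below the threshold, so of any near-duplicate pair the later one is dropped. Which article of a pair to delete is not specified in the description; keeping the earlier one (for example, after sorting by recommendation score) is an assumption, as is the default threshold of 0.9 and the function name `remove_duplicates`.

```python
import math
from typing import List, Sequence


def _cosine(x: Sequence[float], y: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0  # treat zero vectors as dissimilar


def remove_duplicates(vectors: List[Sequence[float]], threshold: float = 0.9) -> List[int]:
    """Return indices of articles to keep.

    A later article whose similarity to any kept article is >= threshold
    is treated as a duplicate and dropped.
    """
    kept: List[int] = []
    for i, vec in enumerate(vectors):
        if all(_cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return kept
```

Each new article is compared only against articles already kept, so the pass needs at most n(n-1)/2 comparisons; for large collections an approximate nearest-neighbor index would be a typical replacement.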
[0053] As another example, the article data refinement unit (120) of the present disclosure may calculate a recommendation score for each news article based on news article information for a plurality of news articles. The recommendation score is calculated through a preset artificial intelligence model including a Large Language Model (LLM) that includes at least one transformer, and the data input to the LLM may be at least one of title information and body information translated into a preset language.
[0054] The LLM can be trained using at least one of pretraining, which learns the patterns and structures of text from a prepared large-scale training dataset, and fine-tuning, which trains the LLM for its intended use by labeling a portion of the large-scale training dataset. As needed, a few-shot method that reflects examples in the training data can also be used.
[0055] The news article recommendation device (100) of the present disclosure can obtain the text of a news article's title information through one of the aforementioned preprocessing methods, input that text into the LLM, which is the preset artificial intelligence model, and output a recommendation score for each news article.
[0056] As another example, the article data refinement unit (120) of the present disclosure may input title information into a preset artificial intelligence model, and together input at least one piece of information among the media company name, publication date, and recommendation frequency corresponding to the title information into the preset artificial intelligence model to calculate a recommendation score. The recommendation score may be calculated by assigning a weight according to a preset standard to at least one piece of information among the media company name, publication date, and recommendation frequency that is input together with the title information.
[0057] For example, a higher weight may be assigned to news articles with a high recommendation frequency so that a higher recommendation score is calculated. Alternatively, a high weight may be assigned to the name of a specific media outlet, or a higher weight may be assigned the more recent the publication date is. Furthermore, the recommendation score may be calculated using two or more weighting criteria instead of just one.
[0058] The weight-based recommendation score calculation method is just one example, and it can be calculated in various ways as needed.
[0059] The aforementioned recommendation score can be calculated as a real number greater than or equal to 0, and a higher recommendation score indicates a higher priority for the recommended news included in the newsletter. For example, if the weight is set to be calculated as a real number between 0 and 1, the closer it is to 1, the higher the priority of the recommended news can be set.
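One way to realize the weighting of [0056] to [0059] is a weighted sum of normalized features. In the sketch below, `model_score` stands for the relevance estimate produced by the artificial intelligence model and is assumed to lie between 0 and 1; the weights, the 30-day recency window, the popularity cap, the `media_preference` table, and all names are hypothetical. Because the weights sum to 1 and each feature is assumed or clamped to lie in [0, 1], the score also falls between 0 and 1, matching the example range in [0059].

```python
from dataclasses import dataclass
from datetime import date
from typing import Mapping, Optional


@dataclass(frozen=True)
class ScoringConfig:
    # Hypothetical weights; they sum to 1 so the score stays in [0, 1].
    w_model: float = 0.5        # model's relevance estimate for the title
    w_media: float = 0.2        # preference for the media outlet
    w_recency: float = 0.15     # how recent the publication date is
    w_popularity: float = 0.15  # how often users recommended the article
    recency_window_days: int = 30
    popularity_cap: int = 1000


def recommendation_score(
    model_score: float,
    media_name: str,
    published: date,
    recommend_count: int,
    today: date,
    media_preference: Mapping[str, float],
    cfg: Optional[ScoringConfig] = None,
) -> float:
    cfg = cfg or ScoringConfig()
    media = media_preference.get(media_name, 0.0)  # unknown outlets get 0
    age_days = (today - published).days
    # Linear decay: 1.0 for today, 0.0 at or beyond the recency window.
    recency = min(1.0, max(0.0, 1.0 - age_days / cfg.recency_window_days))
    popularity = min(max(recommend_count, 0), cfg.popularity_cap) / cfg.popularity_cap
    return (
        cfg.w_model * model_score
        + cfg.w_media * media
        + cfg.w_recency * recency
        + cfg.w_popularity * popularity
    )
```

Keeping the weights in a separate configuration object makes it easy to switch criteria, for example weighting only recommendation frequency, without changing the scoring function.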
[0060] As another example, the news article recommendation device (100) of the present disclosure may perform first filtering based on the media company name on a plurality of news articles from which duplicate articles have been removed, and perform second filtering by assigning priority in order of high recommendation score.
[0061] The news article recommendation device (100) of the present disclosure may pre-set a predetermined number of media outlets among media outlets preferred by the user or media outlets that publish many news articles, and may remove news articles that are not published by the aforementioned pre-set media outlets from among a plurality of news articles from which duplicate articles have been removed.
[0062] In addition, the news article recommendation device (100) of the present disclosure can perform secondary filtering to determine a predetermined number of news articles by assigning priority in order of highest recommendation score based on the recommendation score output through the artificial intelligence model described above.
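The first and second filtering of [0060] to [0062] reduce to a membership test followed by a sort. This sketch assumes the duplicate-free articles already carry a recommendation score; `ScoredArticle`, `select_for_newsletter`, and the parameter names are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Set


class ScoredArticle(NamedTuple):
    title: str
    media_name: str
    score: float


def select_for_newsletter(
    articles: Iterable[ScoredArticle], allowed_media: Set[str], top_n: int
) -> List[ScoredArticle]:
    # First filtering: keep only articles from the pre-set media outlets.
    from_allowed = [a for a in articles if a.media_name in allowed_media]
    # Second filtering: prioritize by recommendation score, highest first.
    by_score = sorted(from_allowed, key=lambda a: a.score, reverse=True)
    return by_score[:top_n]
```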
[0063] The news article recommendation device (100) of the present disclosure includes a newsletter output unit (130) that outputs a newsletter containing a plurality of refined and remaining news articles based on similarity and recommendation scores.
[0064] For example, the newsletter output unit (130) inputs body information corresponding to the title information of multiple news articles that have been refined based on similarity and recommendation scores into a preset artificial intelligence model to output summary information, and the newsletter may include the title information of multiple news articles that have been refined based on similarity and recommendation scores and summary information.
[0065] The news article recommendation device (100) of the present disclosure can determine news articles included in a newsletter through a pre-set media company and a recommendation score. The newsletter may include title information and body information of the determined news article, or it may include the title information and summary information by inputting body information into the aforementioned pre-set artificial intelligence model to receive output summary information.
[0066] The text input to the artificial intelligence model for outputting summary information may be text translated into a pre-set language. For example, among the collected text information as described above, Chinese and Japanese are set to be translated into English, while English and Korean are left as they are, and text translated into English or Korean can be input into the artificial intelligence model to generate summary information.
[0067] The generation of summary information through a pre-configured artificial intelligence model may be performed after determining the news articles included in the newsletter based on pre-configured media outlets and recommendation scores, or it may be performed in advance on all news articles collected through crawling prior to the determination of the news articles.
[0068]
[0069] The present disclosure has the advantage of preventing duplicate recommendations of the same article when providing recommended news to a user, and of providing highly reliable news articles in a specific field by recommending news articles based on a recommendation score calculated based on weights assigned to specific conditions.
[0070]
[0071] Below, the overall process of recommending news articles is explained in more detail with reference to the drawings.
[0072]
[0073] FIG. 2 is a flowchart for schematically explaining the process of recommending news articles according to one embodiment.
[0074] Referring to FIG. 2, the news article recommendation device of the present disclosure can acquire article data and generate a newsletter containing recommended articles that have been filtered.
[0075] Specifically, the news article recommendation device of the present disclosure can obtain article data for a plurality of news articles from a preset database and, if necessary, can perform preprocessing of the article data (S200).
[0076] Article data for multiple news articles stored in a pre-configured database can be collected through a web crawling method for pre-configured fields.
[0077] Collected article data can be stored in a pre-configured database, and the title and body can be stored separately; additionally, acquired information other than the title and body can also be stored in the database. The separation of the title and body can be performed during the database storage process or during the preprocessing process.
[0078] The present disclosure may refer to information about news articles collected through crawling as article data or news article information.
[0079] The news article recommendation device of the present disclosure can obtain news article information regarding a plurality of news articles in a pre-set field through a pre-set database. The obtained article information may include at least one of the following: the article title, body information, the name of the media outlet where the article was published, the publication date, the recommendation frequency, and the non-recommendation frequency. Once the article data is obtained, the news article recommendation device of the present disclosure can perform preprocessing on the news article information.
[0080] For example, the news article recommendation device of the present disclosure can perform preprocessing to separate the title and body of news article information.
[0081] As another example, the news article recommendation device of the present disclosure can perform preprocessing to remove unnecessary items included in the news article information.
[0082] As another example, the news article recommendation device of the present disclosure can transform the content of news article information into a state suitable for inputting into an artificial intelligence model.
[0083] Since news article information is in text form, preprocessing can be performed in at least one of the following methods: tokenization, normalization, removal of meaningless words and special characters, and conversion of text into a vector containing numbers.
[0084] In addition, the news article recommendation device of the present disclosure may translate news article information into various languages as needed. The aforementioned preprocessing method is merely an example and can be configured in various ways as necessary.
[0085] When news article information is acquired and preprocessed, the news article recommendation device of the present disclosure can filter news articles by inputting news article title information into an artificial intelligence model to generate a recommendation score, or input body information into an artificial intelligence model to generate summary information of the news article body.
[0086] The generation of recommendation scores based on the aforementioned title information and the generation of summary information based on body information can be performed simultaneously or in parallel.
[0087] Specifically, when news article information for multiple news articles is acquired and preprocessed, the news article recommendation device of the present disclosure calculates a recommendation score for each news article through a preset artificial intelligence model (S210).
[0088] The news article recommendation device of the present disclosure obtains news article information necessary for calculating a recommendation score for a pre-set artificial intelligence model through a pre-set database, inputs all or part of the news article information into the pre-set artificial intelligence model, and outputs a recommendation score through the aforementioned artificial intelligence model.
[0089] Because article data consists of text, the present disclosure proposes a Large Language Model (LLM), one type of language model, as the pre-configured artificial intelligence model.
[0090] The LLM can be trained through at least one of pretraining, which learns text patterns and structures from a prepared large-scale training dataset, and fine-tuning, which tailors the LLM to its intended use by labeling a portion of the large-scale training dataset.
[0091] For example, a recommendation score can be calculated based on weights assigned to each piece of input news article information. The news article information may include at least one of the media outlet name, publication date, recommendation frequency, and non-recommendation frequency as described above, and a recommendation score for each news article can be calculated by assigning weights to this input information.
[0092] As another example, the recommendation score may be calculated by assigning a higher weight to news articles that have a large number of duplicate news articles determined based on the similarity described later. Therefore, the calculation of the recommendation score may be performed before the similarity calculation described later, but depending on the method of calculating the recommendation score, the similarity calculation may be performed first.
[0093] When a recommendation score is generated for each news article, the news article recommendation device of the present disclosure removes duplicate news articles based on the similarity between each news article (S220).
[0094] As described above, the news article recommendation device of the present disclosure can convert the title of a news article into a high-dimensional vector containing numbers based on an embedding technique, and calculate the similarity between each news article by calculating the cosine value between two vectors based on a cosine similarity judgment technique.
[0095] The similarity takes a value between -1 and 1, and the closer it is to 1, the more similar the news articles are.
[0096] The news article recommendation device of the present disclosure may determine two news articles as duplicate articles and delete one of the two news articles if there are news articles having a similarity level greater than or equal to a preset threshold.
[0097] When duplicate articles are removed, the news article recommendation device of the present disclosure performs news article filtering through recommendation scores and media company priority (S230).
[0098] For example, the news article recommendation device of the present disclosure may arrange news articles with duplicate articles removed in order of recommendation score, remove news articles published by media outlets other than those set in advance, and determine a set number of news articles.
[0099] As another example, the news article recommendation device of the present disclosure may remove, from the news articles from which duplicates have been removed, those published by media outlets other than the pre-set ones, assign priority in order of highest recommendation score, and determine a pre-set number of news articles in order of highest priority.
[0100] As another example, priorities can also be assigned to each of the aforementioned pre-set media outlets. For instance, if the pre-set media outlets are Newspaper A and Newspaper B, and a news article published in Newspaper A is set to have a higher priority than a news article published in Newspaper B, then in the event that there are news articles with the same recommendation score, the news article published in Newspaper A may be given a higher priority.
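The tie-breaking rule of [0100] can be expressed as a two-part sort key: descending recommendation score, then ascending outlet rank. The sketch below assumes `media_rank` maps each pre-set outlet to a rank where a smaller number means higher priority; articles from outlets not in that mapping are dropped, as in [0099]. All names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class ScoredArticle(NamedTuple):
    title: str
    media_name: str
    score: float


def rank_articles(
    articles: List[ScoredArticle], media_rank: Dict[str, int], top_n: int
) -> List[ScoredArticle]:
    # Keep only articles from pre-set outlets (those that have a rank).
    candidates = [a for a in articles if a.media_name in media_rank]
    # Higher score first; on equal scores, the outlet with the smaller rank wins.
    candidates.sort(key=lambda a: (-a.score, media_rank[a.media_name]))
    return candidates[:top_n]
```

With `media_rank = {"Newspaper A": 0, "Newspaper B": 1}`, two articles that share the same recommendation score are ordered with the Newspaper A article first.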
[0101] The news article recommendation device of the present disclosure can translate the text information of a news article into a pre-set language separately from the news article filtering process described above (S240).
[0102] The text information is translated in order to improve summarization performance when an LLM, which is a type of language model, is used.
[0103] For example, translation of the body of a news article can be performed through a Neural Machine Translation (NMT) model that includes an encoder for input text, a hidden state where translation is performed through computation, and a decoder that outputs the translated text.
[0104] However, the aforementioned translation model is merely an example, and translation may be performed using various models or algorithms as needed. Furthermore, the aforementioned translation can be applied equally to the translation of titles as well as to the translation of the main text.
[0105] When the text is translated into a pre-set language, the news article recommendation device of the present disclosure generates summary information of the text through the pre-set artificial intelligence model described above (S250).
[0106] The present disclosure proposes using an LLM model capable of accurately generating summary information through learning.
[0107] For example, a summary of the body information can be generated by extracting keywords from the body information and creating summary information based on the extracted keywords.
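One simple way to realize keyword-based summarization is sketched below under stated assumptions: the keywords are taken to be the most frequent non-stop-words in the body, and the summary is the sentence (or sentences) containing the most keyword occurrences. The stop-word list and function names are hypothetical, and an LLM-based system would more likely ask the model itself to produce the summary.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for", "its", "it"}


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keyword_summary(body, num_keywords=3, num_sentences=1):
    words = [w for w in re.findall(r"[a-z0-9]+", body.lower()) if w not in STOPWORDS]
    # most_common orders equal counts by first occurrence, so ties are deterministic.
    keywords = {w for w, _ in Counter(words).most_common(num_keywords)}
    sentences = split_sentences(body)
    # Score each sentence by how many keyword occurrences it contains.
    scores = [sum(w in keywords for w in re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    best = sorted(range(len(sentences)), key=lambda i: -scores[i])[:num_sentences]
    # Emit the chosen sentences in their original order.
    return " ".join(sentences[i] for i in sorted(best))


body = ("The battery plant will open in Ohio. "
        "The plant will make battery cells for electric cars. "
        "Officials welcomed the news.")
print(keyword_summary(body))
```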
[0108] As another example, the summary of text information can be generated by extracting important sentences from each sentence included in the text and creating summary information based on the extracted important sentences. Alternatively, the similarity between each sentence included in the text can be calculated to determine sentences with a high similarity as important sentences, and summary information can be generated based on these determined important sentences.
[0109] The similarity between each of the aforementioned sentences is different from the similarity used for removing duplicate articles; in this disclosure, the similarity used for removing duplicate articles may be referred to as the first similarity, and the similarity used for generating summary information may be referred to as the second similarity. Both the aforementioned first similarity and second similarity may utilize embedding techniques and cosine similarity calculation techniques.
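The sentence-level (second) similarity can be illustrated with a small extractive sketch: each sentence is scored by its average cosine similarity to the other sentences, and the most central sentences are kept in their original order. Interpreting "high similarity" as average similarity to the rest of the text is an assumption of this sketch, and the bag-of-words `vectorize` function is a hypothetical stand-in for an embedding model.

```python
import math
import re
from collections import Counter


def vectorize(sentence):
    """Hypothetical bag-of-words stand-in for a sentence embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centrality_summary(sentences, k=1):
    """Pick the k sentences most similar, on average, to the other sentences."""
    vectors = [vectorize(s) for s in sentences]
    n = len(sentences)
    centrality = []
    for i in range(n):
        sims = [cosine(vectors[i], vectors[j]) for j in range(n) if j != i]
        centrality.append(sum(sims) / len(sims) if sims else 0.0)
    top = sorted(range(n), key=lambda i: -centrality[i])[:k]
    return [sentences[i] for i in sorted(top)]


sentences = [
    "Battery prices fell this year.",
    "Lower battery prices helped electric car sales.",
    "Electric car sales rose this year.",
]
print(centrality_summary(sentences, k=1))
```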
[0110] As another example, an abstractive summarization method can be used, in which the LLM understands the meaning of the input text information and generates summary information based on that understanding.
[0111] When the summary information has been generated, the news article recommendation device of the present disclosure generates a newsletter containing the title and summary information of each news article determined to be included in the newsletter (S260).
[0112] Once generated, the newsletter can be sent to the user's terminal, to a specific server, by email, or the like.
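As an illustration only, the newsletter body could be assembled as plain text from (title, summary) pairs. The layout and the `render_newsletter` name are hypothetical, and delivery to a terminal, server, or mailbox is not shown.

```python
from datetime import date


def render_newsletter(items, issue_date, field):
    """Render (title, summary) pairs as a numbered plain-text newsletter."""
    lines = [f"{field} news digest, {issue_date.isoformat()}", ""]
    for number, (title, summary) in enumerate(items, start=1):
        lines.append(f"{number}. {title}")
        lines.append(f"   {summary}")
    return "\n".join(lines)


text = render_newsletter(
    [("Battery maker expands Ohio plant", "The plant will make cells for electric cars.")],
    date(2025, 1, 6),
    "Secondary batteries",
)
print(text)
```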
[0113]
[0114] FIG. 3 is a flowchart for explaining the process of recommending news articles using news article titles according to one embodiment.
[0115] Referring to FIG. 3, the news article recommendation device of the present disclosure can determine recommended news to be included in a newsletter through a recommendation score calculated based on title information among collected article data.
[0116] Specifically, the news article recommendation device of the present disclosure can obtain news article information for a preset field from among news articles published on websites, and can perform preprocessing on the collected data (S300).
[0117] The news article recommendation device of the present disclosure can obtain news article information regarding a plurality of news articles collected for a preset field through a preset database.
[0118] The news article information stored in the aforementioned database includes at least one of the news article title information, body information, media company name, recommendation frequency, and publication date, and can be collected through a crawling technique from at least one website based on a preset field or a preset period.
[0119] When the collected news article information has been acquired, the news article recommendation device of the present disclosure can perform preprocessing on it.
[0120] For example, the title information and body information of a news article can be separated, and the separated title information can be translated into at least one language. For instance, the news article recommendation device of the present disclosure can be configured to translate the title information into English, and to translate body information written in Chinese or Japanese into English while leaving English and Korean body text as it is.
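The body-text rule in this example (Chinese and Japanese translated into English, English and Korean left unchanged) can be sketched as a small routing function. The ISO 639-1 language codes and the `translate` callable are assumptions; `translate` stands in for whatever NMT model is used and is passed in rather than invented as a library API.

```python
# Per the example above: these source languages are translated into English ...
TRANSLATE_TO_ENGLISH = {"zh", "ja"}
# ... and these are passed through unchanged. ISO 639-1 codes are an assumption.
KEEP_AS_IS = {"en", "ko"}


def route_for_translation(text, lang, translate):
    """Return text in a language the summarization model accepts.

    `translate` is a hypothetical callable, e.g. a wrapper around an NMT model,
    with the signature translate(text, source_lang, target_lang) -> str.
    """
    if lang in KEEP_AS_IS:
        return text
    if lang in TRANSLATE_TO_ENGLISH:
        return translate(text, lang, "en")
    raise ValueError(f"unsupported language: {lang}")


def fake_translate(text, src, tgt):
    """Placeholder translator used only to demonstrate the routing."""
    return f"[{src}->{tgt}] {text}"


print(route_for_translation("电池工厂扩建", "zh", fake_translate))
```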
[0121] The translation of the aforementioned title information may be performed during the preprocessing stage or through a separate translation model after preprocessing. Additionally, the preprocessing method may include at least one of the processes of tokenization, normalization, removal of meaningless words and special characters, and conversion of text into a vector containing numerical values; however, this is merely an example and can be configured in various ways as needed.
[0122] When preprocessing is performed, the news article recommendation device of the present disclosure can input the training data into a preset artificial intelligence model to perform training (S310).
[0123] Since news articles are mostly in text form, the aforementioned pre-configured artificial intelligence model may include an LLM containing at least one transformer.
[0124] The LLM can be trained through at least one of a pretraining method, which learns text patterns and structures from a prepared large-scale training dataset, and a fine-tuning method, which trains the LLM for its intended use by labeling a portion of the large-scale training dataset.
[0125] In addition, the news article recommendation device of the present disclosure may perform prompt engineering, which is the process of optimizing the prompts input to the LLM so that the LLM produces the desired results, and, for this purpose, may perform few-shot prompting, one of the learning methods for LLMs. Few-shot prompting refers to a technique in which the LLM learns from a small amount of data in which a few examples are reflected.
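A few-shot prompt for scoring titles might look like the sketch below. The instruction wording, the 0 to 1 scale, and the labeled examples are all hypothetical. The call to an actual LLM is deliberately left out because the disclosure does not name a specific model or API.

```python
# Hypothetical labeled examples; in practice these would come from the
# labeled portion of the training data.
EXAMPLES = [
    ("Battery maker unveils solid-state cell roadmap", 0.9),
    ("Local bakery wins regional pastry award", 0.1),
]


def build_scoring_prompt(field, title, examples=EXAMPLES):
    """Build a few-shot prompt asking an LLM to score a title's relevance from 0 to 1."""
    parts = [
        f"Rate how relevant each news title is to the field '{field}' "
        "on a scale from 0 to 1. Answer with the number only.",
        "",
    ]
    for example_title, score in examples:
        parts.append(f"Title: {example_title}\nScore: {score}")
    # The final, unanswered entry is the title to be scored.
    parts.append(f"Title: {title}\nScore:")
    return "\n".join(parts)


print(build_scoring_prompt("secondary batteries", "New cathode material boosts range"))
```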
[0126] When learning is complete, the news article recommendation device of the present disclosure inputs the title information of the collected news articles into a preset artificial intelligence model to calculate a recommendation score and can calculate the similarity between each news article (S320).
[0127] The news article recommendation device of the present disclosure may input title information into a preset artificial intelligence model to calculate a recommendation score. Alternatively, in addition to the aforementioned title information, at least one piece of information among the media company name, publication date, and recommendation frequency corresponding to the title information may be input into the preset artificial intelligence model to calculate a recommendation score. Alternatively, the recommendation score may be calculated by assigning a weight to at least one piece of information among the aforementioned media company name, publication date, and recommendation frequency.
[0128] The news article recommendation device of the present disclosure can convert at least one word included in the title information of each of a plurality of news articles into a vector based on an embedding technique, and calculate the similarity between two news articles based on a vector and a cosine similarity judgment technique.
[0129] When similarity is calculated, the news article recommendation device of the present disclosure can remove duplicate articles from among the collected news articles based on the calculation result (S330).
[0130] The news article recommendation device of the present disclosure may delete one of two news articles when there are two news articles whose similarity calculated through a cosine similarity judgment technique is greater than or equal to a preset threshold.
[0131] When duplicate articles are removed, the news article recommendation device of the present disclosure can perform filtering of news articles based on the calculated recommendation score and the media company name (S340).
[0132] When a preset number of news articles is determined through filtering, a newsletter can be generated and output based on the news titles of the determined news articles (S350).
[0133] The news article recommendation device of the present disclosure can determine news articles included in a newsletter through a preset media company and a recommendation score. The newsletter may include title information of the determined news article and body information corresponding to the aforementioned title information, or it may include the aforementioned title information and summary information by inputting body information into the aforementioned preset artificial intelligence model to receive output summary information.
[0134]
[0135] FIG. 4 is a flowchart illustrating the process of generating summary information of a news article using the body of a news article according to one embodiment.
[0136] Referring to FIG. 4, the news article recommendation device of the present disclosure can generate summary information to be included in a newsletter based on text information among collected article data.
[0137] Specifically, the news article recommendation device of the present disclosure can obtain news article information for a pre-set field and can perform preprocessing on the obtained news article information (S400).
[0138] When preprocessing is performed, the news article recommendation device of the present disclosure can input the training data into a preset artificial intelligence model to perform training (S410).
[0139] When learning is complete, the news article recommendation device of the present disclosure can translate the text information into a preset language (S420).
[0140] The news article recommendation device of the present disclosure can perform translation into a pre-set language for generating a summary of the body of a collected news article.
[0141] As mentioned above, translation of the body of a news article can be performed through a Neural Machine Translation (NMT) model comprising an encoder into which text is input, a hidden state in which translation is performed through computation, and a decoder in which the translated text is output.
[0142] As mentioned above, translation of the news article title or body text may be performed during the preprocessing stage.
[0143] When the training and translation of the artificial intelligence model are completed, the news article recommendation device of the present disclosure can input the body information of the collected news articles into a preset artificial intelligence model to output summary information (S430).
[0144] The present disclosure proposes using an LLM model, which is one of the language models, for summarizing the body information of news articles.
[0145] There are two methods for summarizing using LLM based on input text information: extractive summarization and abstractive summarization.
[0146] The extractive summarization method is a technique that generates summary information by directly extracting key sentences from the input text and combining all or part of those sentences. As mentioned above, this method includes generating summary information by extracting keywords, or by calculating sentence similarity to identify important sentences and generating summary information based on the extracted important sentences. The abstractive summarization method is a technique that understands the meaning of the input text and generates summary information based on the results of that understanding.
[0147] The news article recommendation device of the present disclosure can summarize text information using an LLM that incorporates any one of the summarization methods described above.
[0148] When summary information is output, a newsletter can be created and output based on the summary information corresponding to the title of the news article included in the newsletter (S440).
[0149] The news article recommendation device of the present disclosure can generate and output a newsletter containing the title information of each news article determined through a preset media outlet and a recommendation score, together with summary information summarizing the corresponding body information.
[0150]
[0151] FIG. 5 is a flowchart for explaining the process of recommending news articles according to one embodiment.
[0152] Referring to FIG. 5, the news article recommendation method of the present disclosure includes a step of acquiring news article data by obtaining news article information regarding a plurality of news articles collected for a predetermined field through a predetermined database (S500).
[0153] The news article recommendation device of the present disclosure can perform filtering through a recommendation score calculated based on the titles of a plurality of news articles collected for a preset field, and generate a summary of the article body to provide optimized newsletter content preferred by the user.
[0154] For example, the aforementioned news article information includes at least one of the news article title information, body information, media company name, recommendation frequency, and publication date, and can be collected from at least one website through a crawling technique based on a preset field or a preset period and stored in a preset database.
[0155] The aforementioned pre-set fields can be configured in various ways as needed, regardless of type or scope. For example, secondary batteries can be configured as a field, or autonomous vehicles can be configured as a single field. In addition, the aforementioned fields do not need to be configured as only one, and two or more fields may be configured.
[0156] The aforementioned preset period may refer to a period during which news articles are periodically updated, and may be set in units of days, weeks, or months. However, the aforementioned period is not fixed as a single period and may be set in various ways as needed.
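Only the period part of article selection is sketched here; matching articles to a preset field is not shown. Counting a month as 30 days is a simplifying assumption, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Period lengths in days; counting a month as 30 days is a simplifying assumption.
PERIOD_DAYS = {"day": 1, "week": 7, "month": 30}


def in_period(published, today, period):
    """True if `published` falls within the preset period that ends on `today`."""
    start = today - timedelta(days=PERIOD_DAYS[period])
    return start < published <= today


def select_for_period(articles, today, period):
    """Keep (title, publication_date) pairs published within the period."""
    return [(title, d) for title, d in articles if in_period(d, today, period)]


collected = [("a", date(2025, 1, 3)), ("b", date(2024, 12, 20))]
print(select_for_period(collected, date(2025, 1, 6), "week"))
```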
[0157] The news article recommendation device of the present disclosure can collect at least one news article from a website through crawling. Crawling refers to visiting various web pages on the Internet using a web crawler, which is software, to collect the information contained in those pages. The news article recommendation device of the present disclosure sets a URL (Uniform Resource Locator) serving as the starting point for crawling; if data within the preset field or preset period exists at that URL, the device can use a preset library to acquire the title or body of a news article as text, extract the content contained in the text, and store it in a database.
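The extraction part of crawling can be sketched with Python's standard library: `urllib.request` fetches a page and `html.parser.HTMLParser` pulls out text. Treating the first `<h1>` as the title and the `<p>` elements as the body is an assumption that real news sites rarely satisfy exactly, so site-specific rules would be needed in practice. The `fetch` helper is illustrative and is not exercised below.

```python
import urllib.request
from html.parser import HTMLParser


def fetch(url):
    """Download a page as text (an HTTP library call; not run in the example)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class ArticleExtractor(HTMLParser):
    """Collect the first <h1> as the title and every <p> as a body paragraph."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._current = None  # tag whose text is currently being captured
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if (tag == "h1" and not self.title) or tag == "p":
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)  # includes text of nested tags such as <b>

    def handle_endtag(self, tag):
        if tag == self._current:
            text = "".join(self._buffer).strip()
            if tag == "h1":
                self.title = text
            else:
                self.paragraphs.append(text)
            self._current = None


page = ("<html><body><h1>Battery maker expands Ohio plant</h1>"
        "<p>The plant will open in 2026.</p>"
        "<p>It will make <b>battery cells</b>.</p></body></html>")
parser = ArticleExtractor()
parser.feed(page)
print(parser.title, parser.paragraphs)
```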
[0158] The data collected through crawling may include information on the title, body, and media outlet name of news articles, the publication date of news articles, and the frequency of user recommendations or non-recommendations; this is just one example, and various data may be collected as needed.
[0159] The aforementioned pre-configured libraries may include HTTP libraries, but are not limited thereto, and various libraries may be used as needed.
[0160] The news article recommendation device of the present disclosure can store the title and body of a news article obtained by the crawling method described above in a separate database, and can encrypt at least one of the title and body and store it in the database described above.
[0161] The news article recommendation device of the present disclosure can perform preprocessing of collected data to calculate a recommendation score for news articles or to generate a summary of the body of a news article through an artificial intelligence model.
[0162] When news article information of multiple news articles is collected through crawling, the news article recommendation device of the present disclosure separates the title information and body information of the news articles and can translate the separated title information and body information into at least one language.
[0163] In addition, as part of preprocessing the collected data, the news article recommendation device of the present disclosure can remove unnecessary data that has been combined with each item of news article information.
[0164] For example, if the artificial intelligence model is a Large Language Model (LLM), which is a type of language model, the news article recommendation device of the present disclosure may perform tokenization to split the text of the collected news articles into tokens, which are the smallest meaningful units; perform normalization to convert the text into a standard form (e.g., conversion to lowercase, conversion to numbers, removal of special characters); remove meaningless words; or convert the text into a vector of numerical values. The aforementioned preprocessing method is merely an example and can be configured in various ways as needed.
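A minimal sketch of these preprocessing steps follows: Unicode normalization and lowercasing, replacement of special characters, stop-word removal, and conversion to a count vector over a fixed vocabulary. The stop-word list and vocabulary are hypothetical, and the whitespace tokenization shown here is simpler than the subword tokenizers that LLMs actually use.

```python
import re
import unicodedata

# Tiny illustrative stop-word list (an assumption; real lists are much larger).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and"}


def normalize(text):
    """NFKC-normalize, lowercase, and replace special characters with spaces."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware, so Korean text survives


def tokenize(text):
    """Whitespace tokenization of normalized text, minus stop words."""
    return [t for t in normalize(text).split() if t not in STOPWORDS]


def to_count_vector(tokens, vocabulary):
    """Map tokens onto a fixed vocabulary as a list of counts."""
    return [tokens.count(word) for word in vocabulary]


tokens = tokenize("The NEW Battery-Plant, in Ohio!")
print(tokens, to_count_vector(tokens, ["battery", "plant", "car"]))
```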
[0165] The news article recommendation method of the present disclosure includes an article data refinement step (S510) of calculating a similarity between each news article and a recommendation score for each news article based on news article information, deleting duplicate articles included in a plurality of news articles by comparing the similarity with a preset threshold, and filtering a plurality of news articles based on the recommendation score.
[0166] The news article recommendation device of the present disclosure can calculate the similarity between each news article to remove articles with duplicate content among a plurality of news articles, and can calculate a recommendation score through a preset artificial intelligence model for filtering the collected news articles.
[0167] For example, the news article recommendation device of the present disclosure can convert the title information of each of a plurality of news articles into a vector based on an embedding technique, and calculate the similarity between two news articles based on a vector and a cosine similarity judgment technique.
[0168] Specifically, the news article recommendation device of the present disclosure can convert at least one word included in the title information of each of a plurality of news articles into a vector based on an embedding technique, and calculate the similarity between two news articles based on a vector and a cosine similarity judgment technique.
[0169] The present disclosure converts title information of collected news articles into a high-dimensional vector containing numbers using an embedding technique to ensure that duplicate articles are not included in recommended news provided through a newsletter, and can calculate the similarity between two news articles based on the converted vector and a cosine similarity judgment technique.
[0170] The embedding technique used for similarity calculation is a method that transforms each data point into a high-dimensional vector containing numbers while preserving the meaning of the original data.
[0171] The news article recommendation device of the present disclosure can calculate the similarity between two news articles through comparison between vectors transformed based on an embedding technique.
[0172] The present disclosure proposes using a cosine similarity judgment technique to calculate the aforementioned similarity. The cosine similarity judgment technique measures the degree of similarity between two pieces of data by comparing the directions of their two vectors using a cosine function. The similarity can be calculated by the aforementioned mathematical formula 1.
[0173] Since cos θ takes values only between -1 and 1, the similarity also lies between -1 and 1, and the closer cos θ is to 1, the more similar the titles of the two news articles can be judged to be.
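Formula 1 itself is not reproduced in this section. Assuming it is the standard cosine similarity between the n-dimensional embedding vectors A and B of two titles, it reads:

```latex
\operatorname{similarity}(A, B) = \cos\theta
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_{i=1}^{n} A_i B_i}
         {\sqrt{\sum_{i=1}^{n} A_i^{2}} \, \sqrt{\sum_{i=1}^{n} B_i^{2}}}
```

For example, for A = (1, 1, 0) and B = (1, 0, 1), the dot product is 1 and both norms are √2, so cos θ = 1/2. When every vector component is non-negative, as with word-count vectors, the value is further restricted to the range 0 to 1.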
[0174] As another example, the news article recommendation device of the present disclosure may delete one of two news articles when there are two news articles whose similarity calculated through a cosine similarity judgment technique is greater than or equal to a preset threshold.
[0175] The aforementioned preset threshold is a real number and can be set in various ways as needed.
[0176] As another example, the news article recommendation device of the present disclosure can calculate a recommendation score for each news article based on news article information regarding a plurality of news articles. The aforementioned recommendation score is calculated through a preset artificial intelligence model including a Large Language Model (LLM) that includes at least one transformer, and the data input to the aforementioned LLM may be at least one of title information and body information translated into a preset language.
[0177] The LLM can be trained using at least one of a pretraining method, which learns patterns and structures of text from a prepared large-scale training dataset, and a fine-tuning method, which trains the LLM for its intended use by labeling a portion of the large-scale training dataset; as needed, training can also be performed using a few-shot method that reflects examples in the training data.
[0178] The news article recommendation device of the present disclosure can obtain the text of the title information of each news article through one of the aforementioned preprocessing methods, and can input that text into the LLM, which is the preset artificial intelligence model, to output a recommendation score for each news article.
[0179] As another example, the news article recommendation device of the present disclosure may input title information into a preset artificial intelligence model, and together input at least one piece of information among the media company name, publication date, and recommendation frequency corresponding to the title information into the preset artificial intelligence model to calculate a recommendation score. The recommendation score may be calculated by assigning a weight according to a preset standard to at least one piece of information among the media company name, publication date, and recommendation frequency input together with the title information.
[0180] For example, a higher weight may be assigned to news articles with a high recommendation frequency so that a higher recommendation score is calculated. Alternatively, a high weight may be assigned to the name of a specific media outlet, or a higher weight may be assigned the more recent the publication date is. Furthermore, the recommendation score may be calculated using two or more weighting criteria rather than just one.
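One hypothetical way to combine these criteria is to multiply the model's base score by an outlet weight, a recency weight that halves every `half_life_days`, and a frequency weight that grows with recommendation frequency but saturates below 2. All of the weight values and formulas are assumptions chosen for illustration; the disclosure leaves the weighting scheme open.

```python
from datetime import date

# Hypothetical outlet weights; outlets not listed receive a default of 0.8.
OUTLET_WEIGHT = {"Newspaper A": 1.2, "Newspaper B": 1.0}


def weighted_score(base_score, outlet, published, today, rec_frequency, half_life_days=7.0):
    """Combine a model score with outlet, recency, and frequency weights (result >= 0)."""
    outlet_w = OUTLET_WEIGHT.get(outlet, 0.8)
    age_days = (today - published).days
    recency_w = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
    frequency_w = 1.0 + rec_frequency / (rec_frequency + 10)  # between 1 and 2
    return base_score * outlet_w * recency_w * frequency_w


print(weighted_score(0.8, "Newspaper A", date(2025, 1, 6), date(2025, 1, 6), 10))
```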
[0181] The weight-based recommendation score calculation method is just one example, and it can be calculated in various ways as needed.
[0182] The aforementioned recommendation score can be calculated as a real number greater than or equal to 0, and a higher recommendation score indicates a higher priority for the recommended news included in the newsletter. For example, if the recommendation score is calculated as a real number between 0 and 1, the closer it is to 1, the higher the priority of the recommended news can be set.
[0183] As another example, the news article recommendation device of the present disclosure may perform a first filtering based on the media company name for a plurality of news articles from which duplicate articles have been removed, and perform a second filtering by assigning priority in order of high recommendation score.
[0184] The news article recommendation device of the present disclosure may pre-set a predetermined number of media outlets among media outlets preferred by the user or media outlets that publish a large number of news articles, and may remove news articles that are not published by the aforementioned pre-set media outlets from among a plurality of news articles from which duplicate articles have been removed.
[0185] In addition, the news article recommendation device of the present disclosure can perform secondary filtering to determine a predetermined number of news articles by assigning priority in order of highest recommendation score based on the recommendation score output through the artificial intelligence model described above.
[0186] The news article recommendation method of the present disclosure includes a newsletter output step of outputting a newsletter containing the news articles that remain after refinement based on the similarity and recommendation score (S520).
[0187] For example, the news article recommendation device of the present disclosure inputs the body information corresponding to the title information of the refined news articles into a preset artificial intelligence model to output summary information, and the newsletter may include the title information and summary information of the refined news articles.
[0188] The news article recommendation device of the present disclosure can determine news articles included in a newsletter through a pre-set media outlet and a recommendation score. The newsletter may include title information and body information of the determined news article, or it may include the title information and summary information by inputting body information into the aforementioned pre-set artificial intelligence model to receive output summary information.
[0189] The text input to the artificial intelligence model for outputting summary information may be text translated into a pre-set language. For example, among the collected text information as described above, Chinese and Japanese are set to be translated into English, while English and Korean are left as they are, and text translated into English or Korean can be input into the artificial intelligence model to generate summary information.
[0190] The generation of summary information through a pre-configured artificial intelligence model may be performed after determining the news articles included in the newsletter based on pre-configured media outlets and recommendation scores, or it may be performed in advance on all news articles collected through crawling prior to the determination of the news articles.
[0191] Through the operation of the aforementioned components, the duplication of recommended news is prevented, and news articles reflecting the user's interests are provided.
[0192]
[0193] FIG. 6 is a configuration diagram of a computing device including an artificial intelligence model according to one embodiment.
[0194] Referring to FIG. 6, the computing device (600) may include memory (610) and a processor (620), and the memory may include at least one artificial intelligence model (630).
[0195] The memory (610) can store a program for the operation of the processor (620) and can temporarily or permanently store input/output data. The memory (610) may include at least one type of storage medium among RAM, SRAM, ROM, EEPROM, PROM, magnetic memory, magnetic disk, optical disk, hard disk type, multimedia card micro type, flash memory type, card type memory (e.g., SD or XD memory), volatile memory (e.g., SRAM, DRAM), or non-volatile memory (e.g., NAND Flash).
[0196] In addition, the memory (610) can store various functions and algorithms, and can store various data, applications, software, commands, code, etc.
[0197] The processor (620) can control the overall operation of the news article recommendation device of the present disclosure. The processor (620) can execute one or more programs and may mean a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a dedicated processor on which methods according to some embodiments of the present disclosure are performed.
[0198] Meanwhile, the computing device (600) of the present disclosure may be a quantum computing device rather than a classical computing device. A quantum computing device performs operations in units of qubits rather than bits. A qubit can be in a superposition of 0 and 1, and M qubits can represent 2^M states simultaneously.
[0199] A quantum computing device can perform specified quantum operations using various types of quantum gates (e.g., Pauli, rotation, Hadamard, CNOT, SWAP, Toffoli) that act on one or more qubits, and can combine quantum gates to form a quantum circuit with a particular function.
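The statement that M qubits represent 2^M states can be checked with a small classical simulation: the state vector of M qubits has 2^M amplitudes, and applying a Hadamard gate to every qubit of |00...0⟩ puts equal weight on all of them. This pure-Python sketch is only an illustration of the state-vector picture; it is not how a quantum computing device operates, and its cost grows exponentially with M.

```python
import math


def apply_hadamard(state, qubit, num_qubits):
    """Apply a Hadamard gate to one qubit of a state vector with 2**num_qubits amplitudes."""
    new_state = state[:]
    mask = 1 << (num_qubits - 1 - qubit)  # qubit 0 is the most significant bit
    for i in range(len(state)):
        if (i & mask) == 0:
            j = i | mask  # partner basis state differing only in the target qubit
            a, b = state[i], state[j]
            new_state[i] = (a + b) / math.sqrt(2)
            new_state[j] = (a - b) / math.sqrt(2)
    return new_state


def uniform_superposition(m):
    """Start in |00...0> and apply a Hadamard gate to each of the m qubits."""
    state = [0.0] * (2 ** m)
    state[0] = 1.0
    for q in range(m):
        state = apply_hadamard(state, q, m)
    return state


state = uniform_superposition(3)
print(len(state), state[0])  # 8 amplitudes, each equal to 1/sqrt(8)
```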
[0200] Quantum computing devices can use quantum artificial neural networks (e.g., QCNN, QGRNN) that can perform functions of conventional artificial neural networks (e.g., CNN, RNN) at a faster speed while using fewer parameters.
[0201] Additionally, the memory (610) may store an artificial intelligence model (630) that generates recommendation scores for each news article and summary information of the news article according to the present disclosure. When a task to generate recommendation scores for each news article and summary information of the news article is requested, the processor (620) may execute the artificial intelligence model (630) stored in the memory (610) to generate recommendation scores for each news article and summary information of the news article and output the result.
[0202] For example, the processor (620) can obtain news article information regarding a plurality of news articles collected for a preset field through a preset database, calculate a similarity between each pair of news articles and a recommendation score for each news article based on the news article information, delete duplicate articles included in the plurality of news articles by comparing the similarity with a preset threshold, filter the plurality of news articles based on the recommendation score, and output a newsletter containing the news articles that remain after refinement based on the similarity and recommendation score.
[0203]
[0204] FIG. 7 is a configuration diagram of a computer system including a client-server that includes an artificial intelligence model according to one embodiment.
[0205] Referring to FIG. 7, a computing system according to one embodiment of the present disclosure may include a computer device (700) including memory (730) and a processor (740), and a server (710) including memory (750) and a processor (760). The computer device (700) and the server (710) may be connected via a wired or wireless connection through a network (720).
[0206] The network (720) connecting the aforementioned computer device (700) and server (710) can also be configured as a network of various sizes, such as a Local Area Network (LAN), a Wide Area Network (WAN), a Value Added Network (VAN), a mobile radio communication network, etc.
[0207] The memory (730) of the computer device (700) can store news article information about the collected news articles.
[0208] The memory (750) of the server (710) can store an artificial intelligence model (770) that generates recommendation scores for each news article and summary information of the news articles as described above.
[0209] The processor (740) of the computer device (700) can send the server (710) a request (query) for a recommendation score and summary information for each news article; the request includes the news article information, including the body information, of the news articles stored in the memory (730).
[0210] Using the artificial intelligence model (770), the processor (760) of the server (710) can generate the recommendation score and summary information for each news article from the received news article information, and can transmit the result to the computer device (700).
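The request/response exchange between the computer device and the server could be serialized as JSON, as in the sketch below. The message fields (`type`, `articles`, `results`) are hypothetical, the network transport is omitted, and `score_fn` and `summarize_fn` stand in for the artificial intelligence model (770).

```python
import json


def build_query(articles):
    """Client side: serialize stored article information into a request body."""
    return json.dumps({"type": "score_and_summarize", "articles": articles})


def handle_query(body, score_fn, summarize_fn):
    """Server side: run the model functions on each article and return a JSON response."""
    request = json.loads(body)
    results = [
        {"title": a["title"], "score": score_fn(a["title"]), "summary": summarize_fn(a["body"])}
        for a in request["articles"]
    ]
    return json.dumps({"results": results})


query = build_query([{"title": "T", "body": "First. Second."}])
response = json.loads(handle_query(query, lambda t: 0.5, lambda b: b.split(".")[0] + "."))
print(response)
```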
[0211]
[0212] The foregoing description is merely an illustrative explanation of the technical concept of the present disclosure, and those skilled in the art to which the present disclosure pertains may make various modifications and variations without departing from the essential characteristics of the technical concept. Furthermore, these embodiments are intended to explain, not to limit, the technical concept, and the scope of the technical concept is therefore not limited by these embodiments. The scope of protection of the present disclosure shall be construed according to the claims below, and all technical concepts within a scope equivalent thereto shall be construed as being included within the scope of rights of the present disclosure.
[0213]
[0214] CROSS-REFERENCE TO RELATED APPLICATION
[0215] This patent application claims priority under Section 119(a) of the U.S. Patent Act (35 U.S.C. § 119(a)) to Korean Patent Application No. 10-2024-0191874, filed on December 19, 2024, the entire contents of which are incorporated herein by reference. This patent application also claims priority in countries other than the United States on the same basis, and the entire contents of that application are likewise incorporated herein by reference.
Claims
1. A news article recommendation device comprising: an article data acquisition unit that acquires news article information regarding a plurality of news articles collected for a preset field through a preset database; an article data refinement unit that calculates, based on the news article information, a similarity between the news articles and a recommendation score for each news article, deletes duplicate articles included in the plurality of news articles by comparing the similarity with a preset threshold, and filters the plurality of news articles based on the recommendation score; and a newsletter output unit that outputs a newsletter containing the news articles remaining after refinement based on the similarity and the recommendation score.
2. The news article recommendation device of claim 1, wherein the news article information includes at least one of title information, body information, a media outlet name, a recommendation frequency, and a start date of each news article, and is collected from at least one website through a crawling technique based on the preset field and a preset period and stored in the preset database.
3. The news article recommendation device of claim 2, wherein the article data refinement unit converts the title information of each of the plurality of news articles into a vector based on an embedding technique, and calculates the similarity between two news articles based on the vectors and a cosine similarity technique.
4. The news article recommendation device of claim 2, wherein the recommendation score is produced by a preset artificial intelligence model that includes a Large Language Model (LLM) containing at least one transformer, and the data input into the LLM is at least one of the title information and the body information translated into a preset language.
5. The news article recommendation device of claim 2, wherein the article data refinement unit calculates the recommendation score by inputting at least one of the title information, the media outlet name, the start date, and the recommendation frequency into a preset artificial intelligence model, and the recommendation score is calculated by assigning a weight to at least one of the media outlet name, the start date, and the recommendation frequency.
6. The news article recommendation device of claim 5, wherein the article data refinement unit performs first filtering on the plurality of news articles based on the media outlet names, and performs secondary filtering on the plurality of news articles by assigning priority in descending order of the recommendation score.
7. The news article recommendation device of claim 6, wherein the newsletter output unit inputs, into the preset artificial intelligence model, the body information corresponding to the title information of each of the news articles remaining after refinement based on the similarity and the recommendation score, and receives summary information, and the newsletter includes the title information and the summary information of the news articles remaining after refinement based on the similarity and the recommendation score.
8. A news article recommendation method comprising: an article data acquisition step of acquiring news article information regarding a plurality of news articles collected for a preset field through a preset database; an article data refinement step of calculating, based on the news article information, a similarity between the news articles and a recommendation score for each news article, deleting duplicate articles included in the plurality of news articles by comparing the similarity with a preset threshold, and filtering the plurality of news articles based on the recommendation score; and a newsletter output step of outputting a newsletter containing the news articles remaining after refinement based on the similarity and the recommendation score.
9. The news article recommendation method of claim 8, wherein the news article information includes at least one of title information, body information, a media outlet name, a recommendation frequency, and a start date of each news article, and is collected from at least one website through a crawling technique based on the preset field and a preset period and stored in the preset database.
10. The news article recommendation method of claim 9, wherein the preset artificial intelligence model includes a Large Language Model (LLM) containing at least one transformer, and the data input into the LLM is at least one of the title information and the body information translated into a preset language.
11. The news article recommendation method of claim 9, wherein the article data refinement step converts the title information of each of the plurality of news articles into a vector based on an embedding technique, and calculates the similarity between two news articles based on the vectors and a cosine similarity technique.
12. The news article recommendation method of claim 9, wherein the article data refinement step calculates the recommendation score by inputting at least one of the title information, the media outlet name, the start date, and the recommendation frequency into a preset artificial intelligence model, and the recommendation score is calculated by assigning a weight to at least one of the media outlet name, the start date, and the recommendation frequency.
13. The news article recommendation method of claim 12, wherein the article data refinement step performs first filtering on the plurality of news articles based on the media outlet names, and performs secondary filtering on the plurality of news articles by assigning priority in order of the recommendation score.
14. The news article recommendation method of claim 13, wherein the newsletter output step inputs, into the preset artificial intelligence model, the body information corresponding to the title information of each of the plurality of news articles determined based on the recommendation score, and receives summary information, and the newsletter includes the title information and the summary information of the plurality of news articles determined based on the recommendation score.
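The weighted recommendation score and two-stage filtering recited in claims 5 and 6 can be illustrated with a rough Python sketch. Everything concrete here is an assumption: the weights, the outlet table, treating the start date as a recency signal, and adjusting a base model score after the fact rather than inside the model. The claims state only that weights are assigned to the media outlet name, start date, and recommendation frequency. All names (`Candidate`, `weighted_score`, `two_stage_filter`) are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    title: str
    outlet: str          # media outlet name
    start_date: date     # treated here as a recency signal (assumption)
    rec_frequency: int   # recommendation frequency
    model_score: float   # hypothetical base score from the AI model


# Hypothetical weights; the claims do not specify any values.
OUTLET_WEIGHT = {"Outlet A": 1.2, "Outlet B": 1.0}
W_RECENCY = 0.5
W_FREQ = 0.1


def weighted_score(c: Candidate, today: date) -> float:
    """Base model score adjusted by outlet, recency, and recommendation frequency."""
    recency = 1.0 / (1 + (today - c.start_date).days)  # newer -> closer to 1
    return (c.model_score * OUTLET_WEIGHT.get(c.outlet, 1.0)
            + W_RECENCY * recency
            + W_FREQ * c.rec_frequency)


def two_stage_filter(cands, allowed_outlets, today, top_k=3):
    # First filtering: keep only articles from the allowed media outlets.
    stage1 = [c for c in cands if c.outlet in allowed_outlets]
    # Secondary filtering: prioritize by weighted score, highest first.
    stage1.sort(key=lambda c: weighted_score(c, today), reverse=True)
    return stage1[:top_k]


today = date(2025, 1, 10)
cands = [
    Candidate("A", "Outlet A", date(2025, 1, 10), 2, 0.7),
    Candidate("B", "Outlet B", date(2025, 1, 9), 5, 0.6),
    Candidate("C", "Outlet X", date(2025, 1, 10), 9, 0.9),  # removed in stage 1
    Candidate("D", "Outlet B", date(2025, 1, 1), 0, 0.5),
]
for c in two_stage_filter(cands, {"Outlet A", "Outlet B"}, today, top_k=2):
    print(c.title, round(weighted_score(c, today), 2))
```

Filtering by outlet before scoring keeps the weighted ranking confined to sources already judged acceptable, which mirrors the order of the first and secondary filtering in claims 6 and 13.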