Electronic device, method, and non-transitory computer-readable storage medium for performing paste function

The electronic device efficiently captures and pastes structured text data across screens by using OCR and AI models, addressing the limitations of existing clipboard operations.

WO2026135162A1 · PCT designated stage · Publication Date: 2026-06-25 · SAMSUNG ELECTRONICS CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SAMSUNG ELECTRONICS CO LTD
Filing Date
2025-12-16
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Existing electronic devices lack an efficient method to capture, process, and paste structured text data across different screens, limiting the functionality of clipboard operations.

Method used

An electronic device with a processor, memory, and display that captures an image of a first screen, extracts text using OCR, converts it into structured data, and inputs it into an input field on a second screen based on context information using artificial intelligence models.

Benefits of technology

Enables seamless conversion and pasting of structured text data across screens, enhancing the clipboard functionality and improving user interaction with electronic devices.

✦ Generated by Eureka AI based on patent content.


Abstract

This electronic device comprises a memory for storing instructions, a display, and at least one processor including processing circuitry. The instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: when a first user command for capturing a first screen of the display is identified, acquire a captured image of the first screen and additional information related to the first screen; when a text included in the captured image is identified, convert the identified text into structured text data on the basis of the additional information; acquire paste content on the basis of the structured text data and context information about a second screen of the display; and input the paste content to an input field included in the second screen.

Description

Electronic device, method, and non-transitory computer-readable storage medium that perform a paste function

[0001] The present disclosure relates to an electronic device, a method, and a non-transitory computer-readable storage medium that perform a paste function.

[0002] Recently, the distribution of various types of portable electronic devices, such as smartphones, tablet PCs, wireless earphones, and smartwatches, is expanding.

[0003] Recently, electronic devices can provide a variety of functions and services. For example, clipboard-based functions are being conveniently used in various situations.

[0004] The information described above may be provided as related art for the purpose of aiding understanding of the present disclosure. No claim or determination is made as to whether any of the foregoing may be applied as prior art related to the present disclosure.

[0005] An electronic device (100) according to one embodiment comprises: a memory (120) for storing instructions; a display (130); and at least one processor (110) including processing circuitry; wherein, when the instructions are executed individually or collectively by the at least one processor, the electronic device obtains a captured image of the first screen and additional information related to the first screen when a first user command for capturing a first screen of the display is identified, and when text included in the captured image is identified, converts the identified text into structured text data based on the additional information, obtains paste content based on the structured text data and context information of a second screen of the display, and inputs the paste content into an input field included in the second screen.

[0006] A control method for an electronic device according to one embodiment comprises: an operation of obtaining a captured image of the first screen and additional information related to the first screen when a first user command for capturing a first screen is identified; an operation of converting the identified text into structured text data based on the additional information when text included in the captured image is identified; an operation of obtaining paste content based on the structured text data and context information of the second screen; and an operation of inputting the paste content into an input field included in the second screen.

[0007] A non-transitory computer-readable medium according to one embodiment stores instructions that, when executed by a processor of an electronic device, cause the electronic device to perform operations, wherein the operations comprise: an operation of obtaining a captured image of the first screen and additional information related to the first screen when a first user command for capturing a first screen is identified; an operation of converting the identified text into structured text data based on the additional information when text included in the captured image is identified; an operation of obtaining paste content based on the structured text data and context information of the second screen; and an operation of inputting the paste content into an input field included in the second screen.

[0008] The above and other aspects, features, and advantages of specific embodiments of the present disclosure will become more apparent from the following description taken together with the accompanying drawings.

[0009] FIG. 1 is a drawing for explaining an example of a clipboard function according to one embodiment.

[0010] FIG. 2 illustrates an example of a block diagram of an electronic device according to one embodiment.

[0011] FIG. 3 is a flowchart illustrating the operation of an electronic device according to one embodiment.

[0012] FIG. 4 is a diagram illustrating a generative artificial intelligence model according to one embodiment.

[0013] FIGS. 5a to 5d are drawings for explaining examples of a method for obtaining a captured image and additional information according to one embodiment.

[0014] FIG. 6 is a drawing for explaining an example of information obtained according to a first user command according to one embodiment.

[0015] FIG. 7 is a drawing for illustrating an example of structured text data according to one embodiment.

[0016] FIG. 8 is a flowchart illustrating a method for obtaining paste content according to one embodiment.

[0017] FIGS. 9a and 9b are drawings for explaining a paste operation according to one embodiment.

[0018] FIG. 10 is a diagram illustrating a method for analyzing an input field of a second screen according to one embodiment.

[0019] FIGS. 11a to 11c are drawings for explaining a method of context analysis of a second screen according to one embodiment.

[0020] FIGS. 12a to 12c are drawings for explaining a method of obtaining paste content according to one embodiment.

[0021] FIGS. 13a to 13e are drawings for explaining examples of paste content according to different paste types according to one embodiment.

[0022] FIGS. 14a to 14c are drawings for explaining examples of UI screens according to one embodiment.

[0023] FIGS. 15a to 15d are drawings for explaining examples of UI screens according to one embodiment.

[0024] FIGS. 16a to 16e are drawings for explaining examples of UI screens according to one embodiment.

[0025] FIGS. 17a and 17b are drawings for explaining an example of a UI screen according to one embodiment.

[0026] FIG. 18 is a diagram illustrating an example of performing a paste operation using a previously saved captured image according to one embodiment.

[0027] FIG. 19 is a block diagram of an electronic device in a network environment according to various embodiments.

[0028] The present disclosure will be described in detail below with reference to the attached drawings.

[0029] The terms used in the embodiments of this disclosure have been selected to be as widely used and general as possible, taking into account their functions within this disclosure; however, these terms may vary depending on the intent of those skilled in the art, case law, or the emergence of new technologies. Additionally, in specific cases, terms may be selected at the applicant's discretion, and in such cases, their meanings will be described in detail in the description section of the disclosure. Therefore, terms used in this disclosure should be defined not merely by their names, but based on their meanings and the overall content of this disclosure.

[0030] In this specification, expressions such as “have,” “may have,” “include,” or “may include” indicate the presence of the above features (e.g., numerical values, functions, actions, or components such as parts) and do not exclude the presence of additional features.

[0031] The expression "at least one of A and / or B" should be understood as representing either "A" or "B" or "A and B".

[0032] Expressions such as "first," "second," "1st," or "2nd" used in this specification may modify various components regardless of order and/or importance, and are used only to distinguish one component from another and do not limit said components.

[0033] Where it is stated that a component (e.g., a first component) is "(operatively or communicatively) coupled with / to" or "connected to" another component (e.g., a second component), it should be understood that the component may be directly connected to the other component or connected through yet another component (e.g., a third component).

[0034] The singular expression includes the plural expression unless the context clearly indicates otherwise. In this application, terms such as “comprise” or “consist of” are intended to specify the existence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood as not precluding the existence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

[0035] In the embodiments, a "module" or "part" performs at least one function or operation and may be implemented in hardware or software, or a combination of hardware and software. Additionally, a plurality of "modules" or a plurality of "parts" may be integrated into at least one module and implemented by at least one processor, except for a "module" or "part" that needs to be implemented in specific hardware.

[0036] In the present disclosure, the term "user" may refer to a person using an electronic device or a device using an electronic device (e.g., an artificial intelligence electronic device).

[0037] The various elements and areas in the drawings are depicted schematically. Accordingly, the technical concept of the present invention is not limited by the relative sizes or spacing depicted in the attached drawings.

[0038] Embodiments of the present disclosure will be described in more detail below with reference to the attached drawings.

[0039] FIG. 1 is a drawing for explaining an example of a clipboard function according to one embodiment.

[0040] According to one embodiment, the clipboard function provided by the electronic device (100) may be a function that temporarily stores data in a digital device to enable at least one of copy, cut, and paste operations.

[0041] According to one example, when a save button tab (21) included in a UI (20) for a clipboard function is selected while a web browser screen (10) is displayed on an electronic device (100) according to a user command, a capture operation for the currently provided web browser screen (10) can be performed. For example, when a captured image of the web browser screen (10) is obtained, the captured image can be saved to the clipboard. The clipboard may be a temporary storage that can store various data such as text, images, files and / or URLs.

[0042] Hereinafter, various embodiments of performing a paste function using a captured image stored in the clipboard will be described.

[0043] FIG. 2 illustrates an example of a block diagram of an electronic device according to one embodiment.

[0044] According to various embodiments, the electronic device (100) of FIG. 2 may be at least partially similar to the electronic device (2401) of FIG. 19, or may include other embodiments of the electronic device.

[0045] In one embodiment, in terms of being owned by a user, the electronic device (100) may be referred to as a terminal (or user terminal). The terminal may include, for example, a personal computer (PC) such as a laptop and a desktop. The terminal may include, for example, a smartphone, a smartpad, and / or a tablet PC. The terminal may include smart accessories such as a smartwatch and / or a head-mounted device (HMD). According to one embodiment, the electronic device (100) may include a deformable housing. Based on the deformability, the housing of the electronic device (100) may be divided into a plurality of parts. According to one example, the electronic device (100) may be implemented as a user terminal (40) illustrated in FIG. 1.

[0046] According to one embodiment, the electronic device (100) may include at least one of a processor (110), a memory (120), a display (130), a communication circuit (140), a camera (150), a sensor (160), or a microphone (170). The processor (110), memory (120), display (130), communication circuit (140), camera (150), sensor (160), and microphone (170) may be electrically and/or operably coupled with each other by an electronic component such as a communication bus.

[0047] In one embodiment, the hardware of the electronic device (100) being operatively coupled may mean that a direct or indirect connection between hardware components is established, by wire or wirelessly, so that a second hardware component is controlled by a first hardware component. Although illustrated based on different blocks, the embodiment is not limited thereto, and some of the hardware of FIG. 2 (e.g., at least some of the processor (110), memory (120), and communication circuit (140)) may be included in a single integrated circuit, such as a system on a chip (SoC). The type and/or number of hardware components included in the electronic device (100) is not limited to that shown in FIG. 2. For example, the electronic device (100) may include only some of the hardware components shown in FIG. 2.

[0048] According to one embodiment, the processor (110) of the electronic device (100) may include hardware for processing data based on one or more instructions. The hardware for processing data may include, for example, an arithmetic and logic unit (ALU), a floating point unit (FPU), a field programmable gate array (FPGA), a central processing unit (CPU), a graphic processing unit (GPU), a neural processing unit (NPU), and / or an application processor (AP). The number of processors (110) may be one or more. For example, the processor (110) may have the structure of a multi-core processor such as a dual core, a quad core, or a hexa core.

[0049] The processor (110) can control the operations of the electronic device (100) by executing instructions stored in memory (120). For example, the processor (110) may correspond to a plurality of processors that divide and collectively perform a plurality of operations among the processors.

[0050] A CPU (central processing unit) is a general-purpose processor capable of performing not only general operations but also artificial intelligence operations, and it can efficiently execute complex programs through a multi-layered cache structure. The CPU is advantageous for serial processing methods, which enable the organic linkage between previous and next calculation results through sequential computation. General-purpose processors are not limited to the examples mentioned above, except for cases specified as the aforementioned CPU.

[0051] A GPU (graphic processing unit) is a processor designed for massive computations, such as floating-point operations used in graphics processing, and can perform large-scale computations in parallel by integrating a large number of cores. In particular, GPUs may be advantageous over CPUs for parallel processing methods such as convolution operations. Additionally, GPUs can be used as co-processors to complement the functions of CPUs. Processors for massive computation are not limited to the examples mentioned above, except for cases specified as GPUs.

[0052] A Neural Processing Unit (NPU) is a processor specialized for artificial intelligence computations using artificial neural networks, and each layer constituting the neural network can be implemented in hardware (e.g., silicon). In this case, since the NPU is designed specifically according to the specifications required by the vendor, it has a lower degree of flexibility compared to CPUs or GPUs, but it can efficiently process the artificial intelligence computations required by the vendor. Meanwhile, as a processor specialized for artificial intelligence computations, the NPU can be implemented in various forms such as Tensor Processing Units (TPUs), Intelligence Processing Units (IPUs), and Vision Processing Units (VPUs). Artificial intelligence processors are not limited to the examples mentioned above, except for cases specified as the aforementioned NPU.

[0053] According to one embodiment, the memory (120) of the electronic device (100) may include a hardware component for storing data and/or instructions that are input and/or output to the processor (110). The memory (120) may include, for example, volatile memory such as random-access memory (RAM) and/or non-volatile memory such as read-only memory (ROM). Volatile memory may include, for example, at least one of dynamic RAM (DRAM), static RAM (SRAM), cache RAM, and pseudo SRAM (PSRAM). Non-volatile memory may include, for example, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), flash memory, hard disk, compact disk, solid state drive (SSD), or embedded multimedia card (eMMC).

[0054] According to one embodiment, one or more instructions (or commands) representing calculations and/or operations to be performed on data by the processor (110) may be stored in the memory (120) of the electronic device (100). A set of one or more instructions may be referred to as firmware, an operating system, a process, a routine, a sub-routine, and/or an application. For example, the electronic device (100) and/or the processor (110) may perform various operations when a set of a plurality of instructions distributed in the form of an operating system, firmware, a driver, and/or an application is executed. In the following, the statement that an application is installed on an electronic device (100) means that one or more instructions provided in the form of an application are stored in the memory (120) of the electronic device (100), and that the one or more instructions are stored in a format (e.g., a file having an extension specified by the operating system of the electronic device (100)) that is executable by the processor (110) of the electronic device (100).

[0055] At least one processor (110) controls the processing of input data according to a predefined operation rule or AI model (artificial-intelligence model) stored in memory (120). The predefined operation rule or AI model is characterized by being created through learning. Being created through learning means that a predefined operation rule or AI model with desired characteristics is created by applying a learning algorithm to a number of learning data. Such learning may be performed on the device itself where the artificial intelligence according to the present disclosure is performed, or it may be performed through a separate server / system.

[0056] An AI model may be composed of multiple neural network layers. At least one layer has at least one weight value and performs the layer's operation through the result of the operation of the previous layer and at least one defined operation. Examples of neural networks include convolutional neural networks (CNN), recurrent neural networks (RNN), deep neural networks (DNN), restricted Boltzmann machines (RBM), deep belief networks (DBN), bidirectional recurrent deep neural networks (BRDNN), deep Q-networks, and Transformers; however, the neural networks in this disclosure are not limited to the aforementioned examples except where specified.
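As a purely illustrative sketch of the layer-by-layer computation described above, the following Python code feeds the result of each layer into the next, where every layer holds weight values. The fully connected layout, the ReLU operation, and the hand-picked weights in `LAYERS` are assumptions chosen for brevity; the disclosure does not specify any particular layer type.

```python
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def dense_layer(inputs: Sequence[float],
                weights: List[List[float]],
                biases: List[float]) -> List[float]:
    """Apply one layer: weighted sum of the previous result, then ReLU."""
    outputs = []
    for row, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(max(0.0, total))  # ReLU is the assumed "defined operation"
    return outputs


def forward(inputs: Sequence[float], layers: List[Layer]) -> List[float]:
    """Feed the result of each layer into the following layer."""
    activation = list(inputs)
    for weights, biases in layers:
        activation = dense_layer(activation, weights, biases)
    return activation


# Hypothetical two-layer network with hand-picked (not learned) weights.
LAYERS: List[Layer] = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.5]),
]
```

In a trained model the weight values would come from applying a learning algorithm to learning data rather than being written by hand.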

[0057] A learning algorithm is a method of training a specific target device (e.g., a robot) using a number of learning data to enable the target device to make decisions or predictions on its own. Examples of learning algorithms include supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, and the learning algorithms in this disclosure are not limited to the aforementioned examples except where specified.

[0058] According to one embodiment, a display (130) of an electronic device (100) can output visualized information to a user. For example, the display (130) can be controlled by a controller, such as a GPU (graphic processing unit), to output visualized information to a user. The display (130) may include an OLED (organic light emitting diodes) display, an LED (light emitting diodes), a micro LED, a mini LED, a PDP (plasma display panel), a QD (quantum dot) display, a QLED (quantum dot light-emitting diodes) and / or an e-ink display and / or an e-paper display. According to one example, the display (130) may be implemented as a flat display, a curved display, a folding and / or rolling flexible display.

[0059] A communication circuit (140) of an electronic device (100) according to one embodiment may include hardware for supporting the transmission and / or reception of electrical signals between the electronic device (100) and an external device (e.g., a server). The communication circuit (140) may include, for example, at least one of a modem (modulator and demodulator), an antenna, and an optic / electronic converter. The communication circuit (140) may support the transmission and / or reception of electrical signals based on various types of protocols such as Ethernet, LAN (local area network), WAN (wide area network), WiFi (wireless fidelity), NFC (near field communication), Bluetooth, BLE (bluetooth low energy), ZigBee, LTE (long term evolution), 5G NR (new radio), and / or 6G.

[0060] According to one example, the electronic device (100) and a server may be connected to each other through a wired network and/or a wireless network. The wired network may include a network such as the Internet, a LAN (local area network), a WAN (wide area network), Ethernet, or a combination thereof. The wireless network may include a network such as LTE (long term evolution), 5G NR (new radio), WiFi (wireless fidelity), Zigbee, NFC (near field communication), Bluetooth, BLE (bluetooth low-energy), or a combination thereof. According to one example, the electronic device (100) and the server may be connected indirectly through an intermediate node within the network.

[0061] A camera (150) of an electronic device (100) according to one embodiment can convert a captured image into an electrical signal and generate image data based on the converted signal. For example, the camera (150) may include at least one of a general (or basic) camera, a depth camera, and an ultra-wide angle camera.

[0062] A sensor (160) of an electronic device (100) according to one embodiment can sense various user information. The sensor (160) can be implemented as various types of sensors capable of user sensing. For example, the sensor (160) may include at least one sensor among a time of flight (ToF) sensor, an ultrasonic sensor, a radio detection and ranging (RADAR) sensor, a photodiode sensor, a proximity sensor, a passive infrared (PIR) sensor, a pinhole sensor, a pinhole camera, an infrared human body detection sensor, a complementary metal oxide semiconductor (CMOS) image sensor, a thermal sensor, a light sensor, and a motion sensor.

[0063] The sensor (160) may include a touch sensor that detects touch actions, having a form such as a touch film, a touch sheet, or a touch pad.

[0064] The sensor (160) may include at least one of a CO2 sensor and an atmospheric pressure sensor. The CO2 sensor is a sensor for measuring carbon dioxide concentration. The atmospheric pressure sensor is a sensor for sensing ambient pressure.

[0065] The sensor (160) may further include at least one sensor capable of sensing ambient illuminance, ambient temperature, and the direction of incidence of light. In this case, the sensor (160) may be implemented as an illuminance sensor, a temperature sensing sensor, a light intensity sensing layer, and a camera.

[0066] The sensor (160) may further include at least one of an acceleration sensor (or gravity sensor), a geomagnetic sensor, and a gyro sensor. For example, the acceleration sensor may be a 3-axis acceleration sensor. The 3-axis acceleration sensor may measure gravitational acceleration by axis and provide raw data to the processor (110). The geomagnetic sensor or the gyro sensor may be used to obtain attitude information. Here, the attitude information may include at least one of roll information, pitch information, or yaw information.

[0067] A microphone (170) of an electronic device (100) according to one embodiment is configured to receive user voice or other sounds and convert them into audio data.

[0068] FIG. 3 is a flowchart illustrating the operation of an electronic device according to one embodiment.

[0069] In the following embodiments, each operation may be performed sequentially, but is not necessarily performed sequentially. For example, the order of each operation may be changed, and at least two operations may be performed in parallel.

[0070] According to one embodiment, operations 310 to 370 can be understood as being performed in the processor (110) of the electronic device (100).

[0071] According to FIG. 3, in operation 310, an electronic device (100) according to one embodiment can identify a first user command for capturing a first screen.

[0072] According to one example, a screen may be an interface that visually displays information in an electronic device (100). The screen may output various data including at least one of text, images, and videos through a display (130) and enable interaction between a user and the electronic device (100).

[0073] According to one example, screen capture may be a function that saves content displayed on a screen provided by an electronic device (100) in the form of an image and/or video. For example, screen capture may include at least one of a screenshot that saves an image of the entire screen, a partial capture that saves only a selected area of the screen, and a scroll capture that saves a long screen that requires scrolling.

[0074] The first user command may be a user command for capturing a screen. For example, the first user command may be a pre-configured user command that can be recognized as a screen capture command by the electronic device (100). For example, the first user command may include at least one of a physical button input, a soft button input, and a user's hand gesture input. For example, the physical button input may include simultaneous input of a power button and a volume down button. For example, the physical button input may include simultaneous input of a power button and a volume up button. For example, the physical button input may include simultaneous input of a power button and a home button. The soft button input may include a save button tap input displayed on the clipboard. For example, the hand gesture input may include at least one of a user's three-finger swipe, palm swipe, knock gesture, and specific touch gesture. For example, the first user command may vary depending on at least one of the manufacturer of the electronic device (100), the type of the electronic device (100), and user settings, and may include commands of a different type in addition to the types described above.
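The recognition of a pre-configured capture command described above can be sketched as a lookup against configured inputs. This is a minimal illustration only: the button, soft-button, and gesture names below are hypothetical labels, and a real device would typically enable only the inputs defined by its manufacturer, device type, and user settings.

```python
from typing import FrozenSet, Iterable, Optional

# Hypothetical input names; actual key codes and gesture IDs are platform specific.
CAPTURE_BUTTON_COMBOS: FrozenSet[FrozenSet[str]] = frozenset({
    frozenset({"power", "volume_down"}),
    frozenset({"power", "volume_up"}),
    frozenset({"power", "home"}),
})
CAPTURE_SOFT_BUTTONS: FrozenSet[str] = frozenset({"clipboard_save"})
CAPTURE_GESTURES: FrozenSet[str] = frozenset(
    {"three_finger_swipe", "palm_swipe", "knock"}
)


def is_first_user_command(buttons: Iterable[str] = (),
                          soft_button: Optional[str] = None,
                          gesture: Optional[str] = None) -> bool:
    """Return True if the input matches a configured screen-capture command."""
    # Physical buttons must be pressed simultaneously, so compare the whole set.
    if frozenset(buttons) in CAPTURE_BUTTON_COMBOS:
        return True
    if soft_button in CAPTURE_SOFT_BUTTONS:
        return True
    return gesture in CAPTURE_GESTURES
```

For instance, `is_first_user_command(buttons=["power", "volume_down"])` matches a configured combination, while a lone power-button press does not.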

[0075] In operation 320, an electronic device (100) according to one embodiment can obtain a captured image of a first screen and additional information (or metadata) related to the first screen.

[0076] According to one example, the captured image (or screenshot image) of the first screen may be a file that captures the content displayed on the screen in image form. For example, the captured image may be an image file format such as PNG and JPEG.

[0077] According to one example, additional information related to the first screen may include at least one of application information, uniform resource locator (URL) information, timestamp information, device information, location information, captured image attribute information, and EXIF data. For example, application information may include at least one of the name of the app and the type of the app in use at the time the screenshot was captured. For example, URL information may be obtained when capturing a web browser. For example, the timestamp may include the date and/or time when the screenshot was created. For example, device information may include at least one of the name, model, and operating system (OS) version of the device used for the capture (e.g., electronic device (100)). For example, location information may include GPS coordinates and information about the location where the screenshot was captured. For example, captured image attribute information may include at least one of file size, resolution, color information, and image format. For example, if the captured image includes a photographic image, EXIF data may include at least one of the information of the device that took the photographic image, software information, time of shooting, and location of shooting.
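One way to picture the additional information listed above is as a record in which every field is optional, since only "at least one of" the items needs to be present. The sketch below is an assumption for illustration: the class name `AdditionalInfo`, its field names, and the field types are not defined in the disclosure.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class AdditionalInfo:
    """Hypothetical container for metadata captured with the first screen."""
    app_name: Optional[str] = None
    app_type: Optional[str] = None            # e.g. "web_browser"
    url: Optional[str] = None                 # available when a browser is captured
    timestamp: Optional[str] = None           # date and/or time of the capture
    device_model: Optional[str] = None
    os_version: Optional[str] = None
    gps: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    image_format: Optional[str] = None        # e.g. "PNG" or "JPEG"
    resolution: Optional[Tuple[int, int]] = None
    file_size_bytes: Optional[int] = None
    exif: Optional[Dict[str, Any]] = None     # only for embedded photographs

    def present_fields(self) -> Dict[str, Any]:
        """Return only the items that were actually obtained."""
        return {k: v for k, v in asdict(self).items() if v is not None}


# Example: a browser capture for which only some items are known.
info = AdditionalInfo(
    app_name="Browser",
    app_type="web_browser",
    url="https://example.com/page",
    resolution=(1080, 2340),
)
```

Keeping absent items as `None` lets later steps check which hints are available before relying on them.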

[0078] In operation 330, the electronic device (100) according to one embodiment can identify (or recognize) text included in a captured image of the first screen.

[0079] According to one example, the electronic device (100) may extract text from a page included in the first screen or from a captured image of the first screen. For example, the electronic device (100) may extract text from a page included in the first screen using a programming language and / or library. For example, the electronic device (100) may extract text from a captured image using an optical character recognition (OCR) engine. The OCR engine may be software that automatically recognizes and extracts text from an image or a scanned document. For example, the OCR engine may be stored in at least one of the electronic device (100) and an external server.

[0080] According to one example, when a first user command for capturing a first screen is identified, the electronic device (100) can identify a foreground application of the first screen. The foreground application may be an active application that provides the first screen currently displayed on the display (130).

[0081] According to one example, the electronic device (100) may extract text included in a captured image of a first screen using at least some different methods depending on the type of foreground application. For example, the type of application may include at least one of a web browser, a web application, and a native application. In addition, the type of application may include at least one of a document application, a chat application, a messenger application, and a game application. According to one example, at least one of a document application, a chat application, a messenger application, and a game application may be an example of a native application.

[0082] For example, the electronic device (100) can extract text from a page included in the first screen if the foreground application corresponding to the first screen is a web browser. For example, a web browser may be software that helps a user access the internet and browse web pages. For example, a web browser may display web pages on the screen by interpreting web technologies such as at least one of HTML, CSS, and JavaScript. However, even in the case of a web browser, the electronic device (100) may use additional or alternative OCR technology if necessary.

[0083] For example, the electronic device (100) can extract text from a page included in the first screen if the foreground application corresponding to the first screen is a web application. For example, the web application may be an application executed through a web browser. However, even in the case of a web application, the electronic device (100) may use additional or alternative OCR technology if necessary.

[0084] For example, the electronic device (100) can directly extract text from a web page composed of HTML through web scraping when the foreground application is at least one of a web browser and a web application.
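
As a non-limiting illustration of extracting text directly from an HTML page, the following Python sketch uses the standard-library `html.parser` module. The class `VisibleTextExtractor` and the function `extract_text_from_html` are hypothetical names, and skipping the contents of `script` and `style` elements is an assumption of this sketch rather than a requirement of the disclosure.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, ignoring the contents of <script> and <style>."""
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text_from_html(html: str) -> list:
    """Return the visible text chunks of an HTML document in document order."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```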

[0085] For example, the electronic device (100) can extract text from a captured image of the first screen using OCR technology if the foreground application corresponding to the first screen is a native application. For example, the native application may be an application developed for a specific operating system (e.g., Android, iOS, Windows). For example, the native application may be an application that requires downloading and installation from an operating system app store.
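
The selection between page-based extraction and OCR described above may be sketched as follows. This is only an illustrative assumption: the type labels, the function `choose_extraction_method`, and the `page_available` flag (standing for the case in which OCR is used additionally or alternatively for web content) are hypothetical.

```python
# Hypothetical labels for application types whose page content can be read directly.
WEB_TYPES = {"web_browser", "web_application"}


def choose_extraction_method(app_type: str, page_available: bool = True) -> str:
    """Return "page" to extract text from the page itself, or "ocr" to
    extract text from the captured image."""
    if app_type in WEB_TYPES and page_available:
        return "page"
    # Native applications, and web content whose page cannot be read, use OCR.
    return "ocr"
```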

[0086] For example, if the foreground application is a native application, the electronic device (100) can preprocess a captured image through at least one of binarization, noise removal, and slope correction, and apply OCR technology to the preprocessed image to extract text.
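
As a non-limiting illustration of the preprocessing, the following Python sketch applies binarization with a fixed threshold and noise removal with a 3x3 median filter to a grayscale image given as a list of rows. The fixed threshold, the median filter, and the function names are assumptions of this sketch; slope correction is omitted for brevity, and an actual implementation would typically rely on an image processing library.

```python
from statistics import median


def binarize(image, threshold=128):
    """Map each grayscale pixel (0-255) to 0 (black) or 255 (white)."""
    return [[255 if px >= threshold else 0 for px in row] for row in image]


def remove_noise(image):
    """Apply a 3x3 median filter; border pixels are left unchanged."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out


def preprocess(image, threshold=128):
    """Binarize, then remove isolated noise pixels, before OCR is applied."""
    return remove_noise(binarize(image, threshold))
```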

[0087] According to one example, the electronic device (100) may store in memory (120) a captured image of the first screen obtained in operation 320, additional information related to the first screen, and text included in the captured image of the first screen obtained in operation 330.

[0088] According to one example, the electronic device (100) may temporarily store in a clipboard a captured image of a first screen obtained in operation 320, additional information related to the first screen, and text included in the captured image of the first screen obtained in operation 330. The clipboard may be a memory area for temporarily storing data.

[0089] In operation 340, the electronic device (100) according to one embodiment can convert text into structured text data based on additional information related to the first screen. For example, "convert" is merely a term for convenience of explanation and can be replaced with at least one of "acquire," "identify," and "generate."

[0090] According to one example, structured text data may include text data structured according to at least one of JSON format, CSV format, Pandas DataFrame format, and XML format. For example, JSON format may be a data structure of key-value pairs. For example, CSV format may be a data structure of rows and columns. For example, XML format may be a data structure in the form of markup with a hierarchical structure.
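
As a non-limiting illustration, the same records can be expressed in the JSON, CSV, and XML formats using only the Python standard library. The function names below are hypothetical, and the Pandas DataFrame format is omitted because it requires a third-party library.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def to_json(records: list) -> str:
    """Key-value pairs."""
    return json.dumps(records, ensure_ascii=False)


def to_csv(records: list) -> str:
    """Rows and columns; the keys of the first record become the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_xml(records: list, root_tag: str = "items", item_tag: str = "item") -> str:
    """Hierarchical markup; each key becomes a child element of an item."""
    root = ET.Element(root_tag)
    for record in records:
        item = ET.SubElement(root, item_tag)
        for key, value in record.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```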

[0091] According to one example, additional information related to the first screen may include at least one of application information, URL information, timestamp information, device information, location information, captured image attribute information, and EXIF data as described above.

[0092] According to one example, the electronic device (100) can obtain structured text data by applying at least one of a predefined rule, pattern, format, algorithm, and template to the text and the additional information related to the screen.

[0093] According to one example, the electronic device (100) can obtain structured text data by inputting, into a second artificial intelligence model, a prompt obtained based on the text included in the captured image (or the captured image) and the additional information related to the screen.

[0094] According to one example, the electronic device (100) can obtain structured text data by inputting a prompt obtained based on a captured image of a screen and additional information related to the screen into a second artificial intelligence model. For example, when a captured image is input into the second artificial intelligence model, the second artificial intelligence model can also perform OCR for text recognition.

[0095] According to one example, the electronic device (100) can analyze the characteristics of text based on additional information related to the first screen and convert the text into structured text data based on the characteristics of the text. For example, the electronic device (100) can analyze the characteristics of the text based on the type of application, identify structured format information based on those characteristics, and obtain structured text data based on the structured format information. For example, the structured format information may include at least one of JSON format, CSV format, Pandas DataFrame format, and XML format.

[0096] According to one example, the electronic device (100) can convert text into structured text data of different formats depending on the type of application.

[0097] According to one example, if the application information includes application information of a first type, the electronic device (100) can convert text into text data structured in a format corresponding to the first type. According to one example, if the application information includes application information of a second type different from the first type, the electronic device (100) can convert text into text data structured in a format corresponding to the second type. For example, text data structured in a format corresponding to the second type may have a format different from text data structured in a format corresponding to the first type.

[0098] For example, in the case of a web application, the web page structure may be configured based on HTML / CSS. Accordingly, the electronic device (100) can obtain structured text data by associating identified text with HTML tags.
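
As a non-limiting illustration of associating identified text with HTML tags, the following Python sketch records, for each text node, the innermost enclosing tag. The class `TaggedTextExtractor`, the function `associate_text_with_tags`, and the record layout are hypothetical.

```python
from html.parser import HTMLParser


class TaggedTextExtractor(HTMLParser):
    """Pairs each text node with the innermost tag that encloses it."""

    def __init__(self):
        super().__init__()
        self._stack = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; unclosed tags such as <br> are dropped.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            tag = self._stack[-1] if self._stack else None
            self.records.append({"tag": tag, "text": text})


def associate_text_with_tags(html: str) -> list:
    parser = TaggedTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.records
```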

[0099] For example, a native application has a UI structure optimized for the platform, and the UI may be composed of XML-based or JSON-based data. Accordingly, the electronic device (100) can distinguish text based on the XML-based or JSON-based data and match it with UI components to obtain structured text data.

[0100] For example, in the case of a document application (e.g., PDF viewer, document scanning app), the layout of the document may be structured. Accordingly, the electronic device (100) can obtain structured text data by obtaining location information of the text.

[0101] For example, in the case of chat applications and messenger applications, text may primarily be in the form of a continuous conversation. Accordingly, the electronic device (100) can analyze the conversation form to determine the speaker and message order, and organize the speaker, time, and message content as keys to obtain structured text data in the form of JSON or a list.
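
As a non-limiting illustration, conversation text may be organized with the speaker, time, and message content as keys. The sketch below assumes, purely for explanation, that each extracted line has the form "[HH:MM] Speaker: message" and that a line not matching this form continues the previous message; the pattern and the function `structure_chat` are hypothetical.

```python
import re

# Assumed line layout: "[10:01] Alice: Hello"
LINE_PATTERN = re.compile(
    r"^\[(?P<time>\d{1,2}:\d{2})\]\s*(?P<speaker>[^:]+):\s*(?P<message>.*)$"
)


def structure_chat(lines: list) -> list:
    """Return a list of messages with order, time, speaker, and message keys."""
    messages = []
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match:
            messages.append({"order": len(messages) + 1, **match.groupdict()})
        elif messages and line.strip():
            # A non-matching line is treated as a continuation of the last message.
            messages[-1]["message"] += " " + line.strip()
    return messages
```

The resulting list can be serialized with `json.dumps` to obtain structured text data in JSON form.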

[0102] For example, in the case of a game application, text is likely to be combined with graphic elements. Accordingly, the electronic device (100) can determine the location of text within the UI by recording the coordinates of the text area and obtain structured text data by classifying information into specific categories according to the game screen structure.

[0103] According to one example, the electronic device (100) can convert text into structured text data of different formats based on URL characteristics. According to one example, the electronic device (100) can analyze the characteristics of text based on URL characteristics and obtain structured text data based on the characteristics of text. For example, the electronic device (100) can identify whether a web page is a static web page, a dynamic web page, or an API-based web page based on the data format provided by the URL, and obtain structured text data based on the characteristics of text according to the characteristics of each web page.

[0104] According to one example, the electronic device (100) can map the structured text data obtained in operation 340 with the data obtained in operation 320 and operation 330 and store it in memory (120).

[0105] According to one example, the electronic device (100) can map structured text data obtained in operation 340 with data obtained in operation 320 and store it in memory (120).

[0106] According to one example, the electronic device (100) can map the structured text data obtained in operation 340 with the data obtained in operation 320 and operation 330 and temporarily store it in the clipboard.

[0107] According to one example, the electronic device (100) can map the structured text data obtained in operation 340 with the data obtained in operation 320 and temporarily store it in the clipboard.

[0108] In operation 350, the electronic device (100) according to one embodiment can obtain context information of the second screen.

[0109] In one example, the second screen may be a screen different from the first screen captured according to the first user command. For example, the second screen may be a screen where a second user command for pasting is entered.

[0110] According to one example, the context information of the second screen may include at least one of information displayed on the screen, surrounding environment information of the electronic device (100), usage history information of the electronic device (100), and user profile information. For example, the electronic device (100) may obtain information displayed on the screen in different ways depending on the interface type of the second screen. For example, the interface type of the second screen may include at least one of an interactive interface, a document creation interface, and a simple form interface.

[0111] For example, the surrounding environment information of the electronic device (100) may include at least one of device information, time information, environment information, and location information. For example, the device information may include information about the device used by the user, the type of device, the screen size, and the operating system. For example, the time information may include time-related information such as the current time, day of the week, season, etc. For example, the environment information may include information about the surrounding environment such as weather information, temperature information, illuminance information, etc.

[0112] For example, the usage history information of the electronic device (100) may include at least one of the user's real-time usage information, the user's past usage information, or the user's habit information. For example, the usage history information may include at least one of website visit records, search history, shopping mall purchase history, specific service usage time periods, and click counts.

[0113] For example, user profile information may include at least one of gender, age, race, interests, and preferences.

[0114] In operation 360, an electronic device (100) according to one embodiment can obtain paste content based on structured text data and context information of a second screen.

[0115] According to one example, the electronic device (100) may obtain paste content to be entered into an input field based on structured text data and context information of a second screen. For example, the input field may be a UI element that allows a user to enter various data such as text, images, numbers, emails, and passwords. For example, the input field may be provided in at least one of a search bar, a messaging application, or a website. For example, the paste content may be content that is reused by pasting.

[0116] According to one example, the electronic device (100) may obtain paste content to be entered into an input field based on structured text data and context information of a second screen according to a second user command for pasting. For example, the second user command may be a command to select the “paste” option after touching the input field for a preset time or longer. However, it is not limited thereto and may be various types of operations that are preset in the electronic device (100) or can be set and / or changed by the user.

[0117] According to one example, the electronic device (100) may input a prompt obtained based on structured text data and context information of a second screen into a first artificial intelligence model to obtain paste content to be entered into an input field. The prompt may be a text input that instructs a specific algorithm or artificial intelligence model to perform a desired task. According to one example, the electronic device (100) may obtain a prompt according to a pre-set rule. Rule-based prompt generation may be a method of generating a prompt using at least one of pre-defined rules, patterns, formats, and templates.
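
As a non-limiting illustration of rule-based prompt generation, a predefined template may be filled with the structured text data and the context information of the second screen. The template wording, the context keys, and the function `build_paste_prompt` are hypothetical and are not prescribed by the disclosure.

```python
import json

# Hypothetical predefined template used as the "rule" for prompt generation.
PROMPT_TEMPLATE = (
    "You are filling an input field on the user's current screen.\n"
    "Screen context: {context}\n"
    "Captured data (JSON): {data}\n"
    "Write the text to paste into the input field."
)


def build_paste_prompt(structured_text: dict, context: dict) -> str:
    """Fill the template with structured text data and second-screen context."""
    return PROMPT_TEMPLATE.format(
        context=json.dumps(context, sort_keys=True, ensure_ascii=False),
        data=json.dumps(structured_text, sort_keys=True, ensure_ascii=False),
    )
```

The resulting string would then be input into the first artificial intelligence model.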

[0118] According to one example, the first artificial intelligence model may be an on-device model included in the electronic device (100) or a model included in an external device (e.g., a server). According to one example, at least some functions of the first artificial intelligence model may be executed in the electronic device (100) and at least some functions may be implemented in an external device (e.g., a server). However, for the convenience of explanation, the following description assumes that the first artificial intelligence model is an on-device model included in the electronic device (100).

[0119] For example, a trained artificial intelligence model may include a trained Large Multimodal Model. The Large Multimodal Model may be an artificial intelligence model capable of simultaneously processing and understanding different data formats. For instance, the Large Multimodal Model may be an AI engine trained using image and text information, capable of receiving image data and text prompts simultaneously and outputting results based on the image data and the prompts. For instance, the Large Multimodal Model may utilize natural language processing (NLP) techniques for text and computer vision (CV) techniques for images.

[0120] According to one embodiment, the electronic device (100) can identify information about an input field included in a second screen according to a second user command. According to one example, the information about the input field may include information about whether text is supported, whether rich text is supported, and whether images are supported. For example, rich text is formatted text, and unlike plain text, it may include various styles and formatting such as at least one of font, color, size, boldness, italics, and underline.

[0121] According to one example, the electronic device (100) can obtain paste content based on structured text data, context information of the second screen, and information about the input field of the second screen. For example, the electronic device (100) can obtain paste content by inputting the prompt obtained based on the structured text data, context information of the second screen, and information about the input field into the first artificial intelligence model.
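
As a non-limiting illustration of how information about the input field may affect the paste content, the sketch below renders the same content either as plain text or as formatted text. The class `InputFieldInfo`, the function `render_for_field`, and the use of HTML markup to represent rich text are assumptions made only for this example.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class InputFieldInfo:
    """Hypothetical description of what an input field supports."""
    supports_text: bool = True
    supports_rich_text: bool = False
    supports_images: bool = False


def render_for_field(title: str, details: list, field_info: InputFieldInfo) -> str:
    """Render content in a form the input field can accept."""
    if not field_info.supports_text:
        return ""
    if field_info.supports_rich_text:
        items = "".join(f"<li>{escape(d)}</li>" for d in details)
        return f"<b>{escape(title)}</b><ul>{items}</ul>"
    # Plain-text fallback: one line per detail.
    return "\n".join([title] + [f"- {d}" for d in details])
```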

[0122] In operation 370, the electronic device (100) according to one embodiment can input paste content into an input field included in the second screen.

[0123] According to one embodiment, the electronic device (100) can identify a plurality of paste contents that can be entered in an input field by inputting a prompt obtained based on structured text data, context information of a second screen, and information about an input field of a second screen into a first artificial intelligence model. For example, the electronic device (100) can input at least one of the plurality of paste contents into the input field.

[0124] According to one embodiment, the electronic device (100) can identify a plurality of paste types based on information about an input field and identify a plurality of paste contents by inputting structured text data, context information of a second screen, and a prompt obtained based on the plurality of paste types into a first artificial intelligence model. For example, the electronic device (100) can input at least one of the plurality of paste contents into an input field.

[0125] According to one embodiment, if a plurality of paste types are identified based on information about an input field, the electronic device (100) can provide a user interface (UI) containing information about a plurality of paste types to a second screen.

[0126] According to one embodiment, when a plurality of paste types are identified based on information about an input field, the electronic device (100) may provide a UI on a second screen that includes information about the plurality of paste types and a preview area for the plurality of paste types.

[0127] According to one example, when one of a plurality of paste types is selected through a UI, the electronic device (100) can input paste content corresponding to the selected paste type into an input field.

[0128] According to one example, information regarding multiple paste types may include at least one of editing information, summary information, and original information for at least one of the text and image included in the captured image.

[0129] According to one example, information on multiple paste types may include at least one of a contextual edit menu, a simple edit menu, a text summary menu, an image and text summary menu, and an original text menu.

[0130] According to one example, when the contextual edit menu is selected, the electronic device (100) can input a first paste content containing an answer that maintains the context of the content provided on the second screen into an input field.

[0131] According to one example, when the simple edit menu is selected, the electronic device (100) can input a second paste content containing a simple answer to the content provided on the second screen into the input field.

[0132] According to one example, when the text summary menu is selected, the electronic device (100) can input a third paste content containing a summary of text included in a captured image into an input field.

[0133] According to one example, when the image and text summary menu is selected, the electronic device (100) can input a fourth paste content, including a summary of the image included in the captured image and the text included in the captured image, into an input field.

[0134] According to one example, when the original text menu is selected, the electronic device (100) can input a fifth paste content containing the original text included in the captured image into an input field.
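
As a non-limiting illustration, the paste types and the selection of the corresponding paste content may be sketched as follows. The enumeration `PasteType` mirrors the menus named above, while the functions `build_menu` and `select_paste_content` and the preview length are hypothetical.

```python
from enum import Enum


class PasteType(Enum):
    CONTEXTUAL_EDIT = "contextual edit"
    SIMPLE_EDIT = "simple edit"
    TEXT_SUMMARY = "text summary"
    IMAGE_AND_TEXT_SUMMARY = "image and text summary"
    ORIGINAL_TEXT = "original text"


def build_menu(candidates: dict, preview_len: int = 40) -> list:
    """Return (menu label, preview) pairs for the paste types that have content."""
    return [(paste_type.value, candidates[paste_type][:preview_len])
            for paste_type in PasteType if paste_type in candidates]


def select_paste_content(candidates: dict, selected: PasteType) -> str:
    """Return the paste content corresponding to the selected paste type."""
    return candidates[selected]
```

Here `candidates` maps each identified paste type to the paste content obtained for it, and the returned string is what would be input into the input field.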

[0135] According to one embodiment, the electronic device (100) can obtain structured text data corresponding to the selected capture image from the memory (120) when the capture image stored in the memory (120) is selected after the paste content is entered into an input field included in the first application screen. For example, after content pasting based on the capture image is performed on the first application screen, the structured text data obtained during the content pasting process may be already stored in the memory (120). According to one example, since the electronic device (100) can obtain the structured text data already stored from the memory (120), there is no need to re-obtain the structured text from the capture image. According to one example, the electronic device (100) can obtain the paste content based on the structured text data obtained from the memory (120) and the context information of the second application screen. According to one example, the electronic device (100) can input the obtained paste content into an input field included in the second application screen. In one example, the first application and the second application may be different or the same application. In one example, the electronic device (100) may provide a captured image (or a preview of the captured image) stored in memory (120) on a UI screen (e.g., the execution screen of a gallery app) and receive a user command to select the captured image through the UI screen.
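
As a non-limiting illustration of reusing structured text data that is already stored, the sketch below keeps the data keyed by a capture identifier and invokes the structuring step only when no stored result exists. The class `StructuredTextCache`, the identifier, and the injected `structure_fn` callable are hypothetical.

```python
class StructuredTextCache:
    """Stores structured text data per captured image so it is obtained only once."""

    def __init__(self, structure_fn):
        self._structure_fn = structure_fn  # e.g., a rule-based or model-based converter
        self._store = {}

    def get(self, capture_id: str, text: str) -> dict:
        """Return stored structured text data, structuring the text on first use."""
        if capture_id not in self._store:
            self._store[capture_id] = self._structure_fn(text)
        return self._store[capture_id]
```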

[0136] According to one embodiment, when a plurality of captured images are acquired, the electronic device (100) can identify duplicate and / or related captured images among the plurality of captured images and identify and display different information in the paste content acquired from each of the duplicate and / or related captured images. For example, when the electronic device (100) identifies that a first captured image and a second captured image are duplicate and / or related captured images, it can identify and display different information among the paste content corresponding to the first captured image and the paste content corresponding to the second captured image. For example, the electronic device (100) can identify and display different information using at least one of an indicator, a different color, a different font type, and a different font size.
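
As a non-limiting illustration of identifying different information between two paste contents, the sketch below compares them field by field, assuming for explanation that each paste content is available as key-value pairs. The function `differing_fields` is hypothetical.

```python
def differing_fields(first: dict, second: dict) -> dict:
    """Return {key: (value in first, value in second)} for fields that differ.

    A key missing from one side is reported with None for that side.
    """
    keys = sorted(set(first) | set(second))
    return {key: (first.get(key), second.get(key))
            for key in keys if first.get(key) != second.get(key)}
```

The returned fields are the ones that could be marked with an indicator, a different color, or a different font when displayed.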

[0137] According to one embodiment, when a plurality of captured images are acquired, the electronic device (100) can combine the paste content acquired from each of the duplicate and / or related captured images among the plurality of captured images into a single piece of information and display it. For example, when the electronic device (100) identifies that a first captured image and a second captured image are duplicate and / or related captured images, it can acquire and display an integrated paste content that combines the paste content corresponding to the first captured image and the paste content corresponding to the second captured image. For example, the electronic device (100) can acquire an integrated paste content in which the duplicate information is included only once.
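
As a non-limiting illustration of an integrated paste content in which duplicate information is included only once, the sketch below merges contents line by line. Treating lines as the unit of comparison and ignoring case and spacing differences are assumptions of this sketch; the function `merge_paste_contents` is hypothetical.

```python
def merge_paste_contents(*contents: str) -> str:
    """Combine paste contents, keeping the first occurrence of each line."""
    seen = set()
    merged = []
    for content in contents:
        for line in content.splitlines():
            # Normalize whitespace and case so near-identical lines count as duplicates.
            key = " ".join(line.split()).casefold()
            if key and key not in seen:
                seen.add(key)
                merged.append(line.strip())
    return "\n".join(merged)
```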

[0138] According to one embodiment, when a plurality of captured images have been acquired and stored, the electronic device (100) can reconstruct and provide the plurality of captured images and/or information related to the plurality of captured images based on the context in which the plurality of captured images are viewed.

[0139] According to one example, when a plurality of captured images have been acquired and stored, the electronic device (100) may, based on the context in which the plurality of captured images are viewed, preferentially display information related to a captured image that belongs to a category corresponding to the context among the plurality of captured images. For example, the electronic device (100) may preferentially display information related to a captured image belonging to the corresponding category, based on structured text corresponding to the plurality of captured images.

[0140] According to one example, when a plurality of captured images have been acquired and stored (or scrapped), the electronic device (100) can, based on the context in which the plurality of captured images are viewed, display in relatively greater detail the information about a captured image that belongs to a category corresponding to the context among the plurality of captured images. For example, the electronic device (100) can display in detail information related to a captured image belonging to the corresponding category, based on structured text corresponding to the plurality of captured images.
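
As a non-limiting illustration of displaying captured images of the matching category first, the sketch below reorders stored captures using a stable sort. It assumes, for explanation only, that a category has already been derived for each capture from its structured text; the `category` key and the function `order_captures_by_context` are hypothetical.

```python
def order_captures_by_context(captures: list, context_category: str) -> list:
    """Place captures whose category matches the viewing context first.

    The sort is stable, so the original order is kept within each group.
    """
    return sorted(captures, key=lambda capture: capture["category"] != context_category)
```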

[0141] FIG. 4 is a diagram illustrating a generative artificial intelligence model according to one embodiment.

[0142] According to one embodiment, an artificial intelligence model that distinguishes conditions and actions in natural language sentences can be implemented as a Generative AI Model (405) as shown in FIG. 4.

[0143] According to one embodiment, at least one of the first artificial intelligence model and the second artificial intelligence model may be implemented as a Generative AI Model (405) as shown in FIG. 4.

[0144] According to FIG. 4, the User Query / Response Interface (401) can receive user input. The user input may be in the form of natural language, images and / or videos.

[0145] For example, user input may include user voice received through a microphone. However, it is not limited thereto, and user input may include text corresponding to the voice generated by a speech-to-text (STT) model in addition to voice. Furthermore, context information may be transmitted along with the user input. Context information may include various additional information at the time of user input. For example, this may include information about the application currently being used by the user or the user's location information. Additionally, user input may take the form of a mixture of the aforementioned natural language, images, sounds, and context information. Furthermore, user input may also take the form of non-natural language input, such as menu selection.

[0146] The User Query / Response Interface (401) can output results from the Generative AI system to the user. The output can be in the form of natural language or specific content, and it can also be provided in the form of an action requested by the user. For example, the User Query / Response Interface (401) can output content generated by the Generative AI Model (405) based on voice received from the user.

[0147] The AI framework (402) receives user input and can coordinate and control each component necessary to perform the user's intent based on the user's query.

[0148] User input received from the User Query / Response Interface (401) can be sent to the Prompt design component (402-1). The Prompt design component (402-1) can be used to generate a prompt suitable for inputting user input into a large language model (LLM) or a large multimodal model (LMM). The Prompt design component (402-1) may be an AI component that uses machine learning algorithms or neural networks to develop better prompts over time. The Prompt design component (402-1) can generate a prompt by accessing knowledge repositories (403) containing user preference data, prompt libraries, and prompt examples based on user input, and can send the generated prompt to the LLM or LMM.

[0149] The API / Plug-in management component (402-2) can communicate with external sources when additional information is requested while user input is being passed as input to a generative model. The API / Plug-in management component (402-2) establishes a channel for communicating with the outside of the AI Interface via the API, and can enable access to various data sources through the established channel. Additionally, the API / Plug-in management component (402-2) can request an action via the API if the application or service needs to perform an action that executes the user input as a final step, rather than an intermediate result. Information obtained from the outside (e.g., the Applications / service component (404)) may be used to generate a prompt in the Prompt design component (402-1) along with the user input, or it may be passed as input to the generative model.

[0150] The Output modification Component (402-3) (or Refiner component) can fine-tune the output of the generative model. For example, the Output modification Component (402-3) can verify whether the content generated through the LLM and / or LMM is irrelevant, contains biased content, or contains harmful content. Additionally, the Output modification Component (402-3) can determine the extent to which the output matches the desired result and, if additional processing is required, proceed with that process. Furthermore, the Output modification Component (402-3) can configure and provide hints to the user to avoid unwanted output.

[0151] A Generative AI Model (405) generally refers to an artificial intelligence neural network that generates new forms of data based on user input information. A Generative AI Model (405) may include models that generate images and / or models that generate language. Examples of models that generate images include, but are not limited to, GANs (generative adversarial networks), VAEs (variational autoencoders), and diffusion-based generative models that use VAE and Transformer structures. Models that generate language are models trained to output the most statistically appropriate output value based on input values, examples of which include CHAT-GPT 3 and CHAT-GPT 4 (e.g., CHAT-GPT 4o). There are also LMMs (large multimodal models) that can recognize various forms of data input, such as text, images, and voice, and generate new data corresponding to them.

[0152] According to an embodiment, when a prompt is input from the Prompt design component (402-1), the Generative AI Model (405) generates content corresponding to the prompt based on the instructions and can output the content through the User Query / Response Interface (401).

[0153] FIGS. 5a to 5d are drawings for explaining examples of a method for obtaining a captured image and additional information according to one embodiment.

[0154] FIG. 5a is a diagram illustrating an example of a first user command for screen capture according to one example.

[0155] According to one example illustrated in FIG. 5a, a plurality of buttons (521, 522, 523) may be provided on the side of the electronic device (100). For example, the plurality of buttons (521, 522, 523) may include a volume up button (521), a volume down button (522), and a power button (523). For example, the electronic device (100) may identify that a first user command for capturing the current screen (510) has been entered when the volume up button (521) and the power button (523) among the plurality of buttons (521, 522, 523) are pressed simultaneously.

[0156] FIGS. 5b and 5c are drawings for illustrating an example of a first user command for screen capture according to one example.

[0157] According to one example illustrated in FIG. 5b, the electronic device (100) may display a UI (531) including a save button tab (531) for screen capture. For example, the UI (531) may be a UI for clipboard functions. For example, the electronic device (100) may identify that a first user command to capture the current screen (510) has been entered when the save button tab (531) is selected by a user's touch input.

[0158] According to one example illustrated in FIG. 5c, when the option to save a screenshot to the clipboard (541) is enabled (or turned on) in the settings screen (540), a save button tab (531) may be provided in the UI (531) for the clipboard function as illustrated in FIG. 5b.

[0159] FIG. 5d is a drawing for explaining an example of a capture image obtained according to a first user command according to one example. For example, if the current screen (510) is a web browser screen containing URL information, a capture image including URL information (551) and screen content (552) can be obtained according to the first user command as shown in FIG. 5d.

[0160] FIG. 6 is a drawing for explaining an example of information obtained according to a first user command according to one embodiment.

[0161] According to one embodiment, the electronic device (100) can obtain a screen capture image, additional information related to the screen, and text included in the capture image in accordance with a first user command to capture the current screen.

[0162] According to one example, the electronic device (100) can obtain an app (package), URL information, date and time information, a captured image (or screenshot image), and text contained in the captured image corresponding to the current screen (610) as illustrated in FIG. 6, in accordance with a first user command. For example, the electronic device (100) can extract text from a page included in the first screen or extract text from a captured image of the first screen. For example, the electronic device (100) can extract text from a page included in the first screen using a programming language and / or library. For example, the electronic device (100) can extract text from a captured image using an OCR engine.

[0163] FIG. 7 is a drawing for illustrating an example of structured text data according to one embodiment.

[0164] According to one embodiment, the electronic device (100) can convert text included in a captured image into structured text data based on additional information related to a first screen. According to one example, the additional information related to the first screen may include at least one of application information, URL information, timestamp information, device information, location information, captured image attribute information, and EXIF data. According to one example, the electronic device (100) can analyze text characteristics included in a captured image based on additional information related to the first screen, identify structured format information based on the text characteristics, and convert text included in the captured image into structured text data based on the structured format information.

[0165] According to one example, the electronic device (100) may analyze the characteristics of the text included in the captured image based on the application type corresponding to the first screen as shown in FIG. 7, determine that it is appropriate to structure the text in a JSON format, and generate a prompt for obtaining text data structured in a JSON format. For example, the electronic device (100) may obtain a prompt (710) including a command such as “The following text is extracted from a shopping app. Please convert it into JSON format to make it versatile and suitable for various uses” and the text included in the captured image (“SAMSUNG 14” Galaxy Book 4 Pro .... ; Intel Core Ultra 7 processor 155H”) as shown in FIG. 7.

[0166] According to one example, the electronic device (100) can query a second artificial intelligence model with the generated prompt (710) to obtain text data (720) structured in JSON format.
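As a non-limiting illustration, generating the prompt (710) and reading back the structured text data (720) may be sketched in Python as follows. The function names `build_structuring_prompt` and `parse_structured_reply` are assumptions introduced here. The call to the second artificial intelligence model is deliberately omitted because its interface is not specified, and the sample reply is an invented placeholder rather than actual model output.

```python
import json


def build_structuring_prompt(app_type: str, extracted_text: str) -> str:
    """Compose a prompt asking for the extracted text in JSON format."""
    return (
        f"The following text is extracted from a {app_type} app. "
        "Please convert it into JSON format to make it versatile "
        "and suitable for various uses.\n\n"
        f"{extracted_text}"
    )


def parse_structured_reply(reply: str) -> dict:
    """Pull the outermost JSON object out of a model reply and parse it.

    Replies may wrap the JSON in extra prose, so only the span between the
    first '{' and the last '}' is parsed (an assumption of this sketch).
    """
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("reply does not contain a JSON object")
    return json.loads(reply[start:end + 1])


prompt = build_structuring_prompt("shopping", "Galaxy Book 4 Pro; Intel Core Ultra 7")
sample_reply = 'Here is the result: {"product": "Galaxy Book 4 Pro"}'  # placeholder reply
structured = parse_structured_reply(sample_reply)
```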

[0167] FIG. 8 is a flowchart illustrating a method for obtaining paste content according to one embodiment.

[0168] In the following embodiments, each operation may be performed sequentially, but is not necessarily performed sequentially. For example, the order of each operation may be changed, and at least two operations may be performed in parallel.

[0169] According to one embodiment, operations 810 to 870 may be understood to be performed in the processor (110) of the electronic device (100). According to one example, operations 810 to 870 may be operations according to one example of operations 350 and operations 360 shown in FIG. 3. Accordingly, operations 810 to 870 may be operations performed on a second screen provided to a display (130) after the captured image of the first screen, additional information of the first screen, text included in the captured image, and structured text data are acquired and stored in memory (120).

[0170] According to FIG. 8, in operation 810, when a paste command is identified while the second screen is provided on the display (130), the electronic device (100) according to one embodiment can identify whether the paste command is an AI paste in operation 820. For example, the AI paste may be a function for inputting paste content generated according to one embodiment of the present disclosure into an input field included in the second screen. For example, since an artificial intelligence model is used in one embodiment of the present disclosure, it will be referred to as "AI paste" for convenience of explanation.

[0171] If the paste command is not an AI paste (820:N), in operation 830, the electronic device (100) according to one embodiment can paste the captured image of the first screen, which is an image stored in memory (120), directly into the input field.

[0172] When the paste command is an AI paste (820:Y), in operation 840, the electronic device (100) according to one embodiment can analyze the input field of the second screen, which is the current screen, to obtain information about the input field of the second screen. According to one example, the information about the input field may include information about at least one of whether text is supported, whether rich text is supported, and whether images are supported.

[0173] In operation 850, an electronic device (100) according to one embodiment can obtain context information of the second screen by analyzing the context of the second screen, which is the current screen. According to one example, the context information of the second screen may include at least one of the interface type of the second screen, surrounding environment information of the electronic device (100), usage history information of the electronic device (100), and user profile information.

[0174] In operation 860, an electronic device (100) according to one embodiment can generate a prompt based on the structured text data obtained from the first screen, the context information of the second screen, and the information about the input field of the second screen, and can query a first artificial intelligence model with the generated prompt.

[0175] In operation 870, the electronic device (100) according to one embodiment may create a content pool according to a target app. For example, the target app may be an application corresponding to a second screen. For example, the content pool may be a set of contents corresponding to each of a plurality of paste types. For example, a plurality of paste types may be identified based on information about an input field. According to one example, a plurality of paste types may include at least one of an edit paste type, a summary paste type, and an original paste type.

[0176] An electronic device (100) according to one embodiment can identify paste content corresponding to the selected paste type from a content pool and paste it onto a second screen when a paste type is selected according to a user command.
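As a non-limiting illustration, the branch at operation 820 and the creation of the content pool may be sketched in Python as follows. The names `FieldInfo`, `paste_types_for`, and `handle_paste` are assumptions introduced here, `query_model` stands in for the first artificial intelligence model, and the rule mapping input field information to paste types is a simplification chosen for this sketch.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List


@dataclass
class FieldInfo:
    """Input field information obtained in operation 840 (hypothetical names)."""
    supports_text: bool
    supports_rich_text: bool
    supports_image: bool


def paste_types_for(field: FieldInfo) -> List[str]:
    # Assumption: the edit, summary, and original paste types are offered
    # whenever the field accepts some form of text.
    if field.supports_text or field.supports_rich_text:
        return ["edit", "summary", "original"]
    return []


def handle_paste(is_ai_paste: bool, captured_image: str, structured: dict,
                 context: dict, field: FieldInfo,
                 query_model: Callable[[str], str]) -> Dict[str, str]:
    if not is_ai_paste:
        # Operation 830: paste the stored captured image as is.
        return {"image": captured_image}
    pool: Dict[str, str] = {}
    for paste_type in paste_types_for(field):
        # Operation 860: build one prompt per paste type and query the model.
        prompt = "\n".join([
            f"Paste type: {paste_type}",
            f"Screen context: {json.dumps(context)}",
            f"Input field: {json.dumps(asdict(field))}",
            f"Structured data: {json.dumps(structured)}",
        ])
        # Operation 870: collect the replies into the content pool.
        pool[paste_type] = query_model(prompt)
    return pool
```

When a paste type is later selected by a user command, the paste content can be looked up in the returned pool by that paste type.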

[0177] FIGS. 9a and 9b are drawings for explaining a paste operation according to one embodiment.

[0178] According to FIG. 9a, when a user command for pasting is entered on a second screen (910), the electronic device (100) may provide a UI (911) for selecting paste options. For example, the second screen (910) may be a note application execution screen. For example, the user command for pasting may be a long press input, but is not limited thereto. For example, the UI (911) for selecting paste options may include a "Paste with AI" option and a "Paste as an image" option.

[0179] When the "Paste as an image" option is selected in the UI (911) shown in FIG. 9a, the electronic device (100) can paste the captured image (910-1) of the first screen, which is an image stored in memory (120), directly onto the second screen (910), which is the note application execution screen, as shown in FIG. 9b.

[0180] FIG. 10 is a diagram illustrating a method for analyzing an input field of a second screen according to one embodiment.

[0181] According to one embodiment, the electronic device (100) can identify the type of input field included in the second screen based on the input field information of the second screen. For example, the input field information may be "EditorInfo" used in Android development, but is not limited thereto.

[0182] According to one example, the electronic device (100) can analyze an input field based on "EditorInfo," which is an example of input field information for a second screen. For example, as illustrated in FIG. 10, the electronic device (100) can identify the type of input data (e.g., inputType) that can be entered into the second screen based on "EditorInfo." For example, the type of input data that can be entered may include at least one of text, rich text, and an image.
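As a non-limiting illustration, the analysis of the input field may be modeled in Python as follows. The two constants mirror the values of `InputType.TYPE_MASK_CLASS` and `InputType.TYPE_CLASS_TEXT` in Android. The function name `analyze_input_field`, the use of a MIME type list to detect image support, and the treatment of `text/html` as an indicator of rich text support are assumptions made for this sketch and are not taken from FIG. 10.

```python
from typing import Dict, Sequence

TYPE_MASK_CLASS = 0x0000000F  # mirrors android.text.InputType.TYPE_MASK_CLASS
TYPE_CLASS_TEXT = 0x00000001  # mirrors android.text.InputType.TYPE_CLASS_TEXT


def analyze_input_field(input_type: int,
                        content_mime_types: Sequence[str] = ()) -> Dict[str, bool]:
    """Classify what kinds of data an input field can accept."""
    # The low bits of inputType hold the input class; flag bits are masked off.
    accepts_text = (input_type & TYPE_MASK_CLASS) == TYPE_CLASS_TEXT
    # Assumption: image support is signalled by an image/* MIME type.
    accepts_image = any(m.startswith("image/") for m in content_mime_types)
    # Assumption: rich text support is signalled by the text/html MIME type.
    accepts_rich_text = "text/html" in content_mime_types
    return {
        "text": accepts_text,
        "rich_text": accepts_rich_text,
        "image": accepts_image,
    }
```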

[0183] FIGS. 11a to 11c are drawings for explaining a method of context analysis of a second screen according to one embodiment.

[0184] According to one example, the context information of the second screen may include at least one of information displayed on the screen, surrounding environment information of the electronic device (100), usage history information of the electronic device (100), and user profile information. For example, the electronic device (100) may obtain information displayed on the screen in different ways depending on the interface type of the second screen. For example, the interface type of the second screen may include at least one of an interactive interface, a document creation interface, and a simple form interface.

[0185] According to FIG. 11a, if the interface of the second screen is an interactive interface (1110), the electronic device (100) can acquire information (1111) displayed on the screen in different ways depending on whether the conversation text is readable. For example, if the conversation text is readable, the electronic device (100) can read and store it as is, and if the conversation text is not readable, it can extract and store the conversation using OCR.

[0186] According to FIG. 11b, if the interface of the second screen is a document creation type interface (1120), the electronic device (100) can obtain information (1121) displayed on the screen in a different way depending on whether the text immediately preceding the cursor position is readable. For example, if the text immediately preceding the cursor position is readable, the electronic device (100) can read and store it as is, and if the text immediately preceding the cursor position is not readable, the electronic device (100) can extract and store the text using OCR.

[0187] According to FIG. 11c, if the interface of the second screen is a simple form interface (1130), the electronic device (100) can analyze the input type and obtain information (1131, 1132) displayed on the screen. For example, if the electronic device (100) cannot identify the input type, it can use OCR to analyze the input type.
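As a non-limiting illustration, the acquisition of on-screen information by interface type, with OCR as a fallback, may be sketched in Python as follows. The names `read_or_ocr` and `screen_context`, the interface type strings, and the dictionary keys are assumptions introduced here. `run_ocr` stands in for an OCR engine that is not specified further.

```python
from typing import Callable, Dict, Optional

# Hypothetical mapping from interface type to the item read from the screen.
CONTEXT_KEYS = {
    "interactive": "conversation",        # FIG. 11a: conversation text
    "document": "text_before_cursor",     # FIG. 11b: text preceding the cursor
    "form": "input_type",                 # FIG. 11c: type of the input
}


def read_or_ocr(readable: Optional[str], run_ocr: Callable[[], str]) -> str:
    """Use the directly readable value when there is one, otherwise run OCR."""
    if readable is not None:
        return readable
    return run_ocr()


def screen_context(interface_type: str, readable: Optional[str],
                   run_ocr: Callable[[], str]) -> Dict[str, str]:
    if interface_type not in CONTEXT_KEYS:
        raise ValueError(f"unknown interface type: {interface_type}")
    return {
        "interface_type": interface_type,
        CONTEXT_KEYS[interface_type]: read_or_ocr(readable, run_ocr),
    }
```

Passing `run_ocr` as a callable means OCR is only executed when the on-screen information cannot be read directly.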

[0188] FIGS. 12a to 12c are drawings for explaining a method of obtaining paste content according to one embodiment.

[0189] According to one embodiment, the electronic device (100) can obtain paste content based on structured text data corresponding to a first screen and context information of the second screen. For example, as shown in FIG. 12a, the electronic device (100) can obtain paste content by generating a prompt requesting a response based on structured text data corresponding to a first screen and context information of the second screen, and inputting the generated prompt into a first artificial intelligence model.

[0190] According to one embodiment, the electronic device (100) can obtain paste content based on structured text data corresponding to a first screen, context information of the second screen, and information about the input field of the second screen. For example, as shown in FIG. 12b, the electronic device (100) can generate a prompt requesting a response based on structured text data corresponding to a first screen, context information of the second screen, and information about the input field, and input the generated prompt into a first artificial intelligence model to obtain paste content corresponding to the attributes of the input field.

[0191] According to one embodiment, the electronic device (100) can obtain paste content based on structured text data corresponding to a first screen, context information of the second screen, and information about the paste type. For example, as shown in FIG. 12c, the electronic device (100) can generate a prompt requesting a response based on structured text data corresponding to a first screen, context information of the second screen, and information about the paste type (e.g., rich text), and input the generated prompt into a first artificial intelligence model to obtain paste content of a specific type (e.g., rich text).
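As a non-limiting illustration, the three prompt variants described with reference to FIGS. 12a to 12c may be produced by a single Python function with optional arguments. The function name `build_paste_prompt`, its parameter names, and the wording of the prompt are assumptions introduced here. The actual prompts shown in the drawings are not reproduced.

```python
import json
from typing import Any, Dict, Optional


def build_paste_prompt(structured: Dict[str, Any],
                       context: Dict[str, Any],
                       field_info: Optional[Dict[str, bool]] = None,
                       paste_type: Optional[str] = None) -> str:
    """Assemble a prompt for the model that generates paste content."""
    parts = [
        "Generate content to paste into the current screen.",
        "Structured data from the captured screen:",
        json.dumps(structured, ensure_ascii=False),
        "Context of the current screen:",
        json.dumps(context, ensure_ascii=False),
    ]
    if field_info is not None:      # variant corresponding to FIG. 12b
        parts += ["Input field attributes:", json.dumps(field_info)]
    if paste_type is not None:      # variant corresponding to FIG. 12c
        parts.append(f"Requested paste type: {paste_type}")
    return "\n".join(parts)


basic = build_paste_prompt({"price": "1,000"}, {"interface_type": "interactive"})
typed = build_paste_prompt({"price": "1,000"}, {"interface_type": "document"},
                           paste_type="rich text")
```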

[0192] FIGS. 13a to 13e are drawings for explaining examples of paste content according to different paste types according to one embodiment.

[0193] According to one embodiment, the electronic device (100) can obtain different types of paste content depending on a plurality of paste types. For example, the plurality of paste types may include at least one of a simple edit, a contextual edit, a text summary, an image and text summary, and original text.

[0194] According to one example, if the context information includes information for inquiring about a price and the paste type is a simple edit type, a paste content containing only price information can be obtained as illustrated in FIG. 13a.

[0195] According to one example, if the context information includes information inquiring about a price and the paste type is a contextual edit type, a paste content that maintains the context of the content while including the price information can be obtained as illustrated in FIG. 13b.

[0196] According to one example, if the context information includes information inquiring about a price and the paste type is a text summary, a paste content in the form of a text summary containing price information can be obtained as illustrated in FIG. 13c.

[0197] According to one example, if the context information includes information for inquiring about a price and the paste type is an image and text summary type, paste content including an image as well as a text summary containing price information can be obtained as shown in FIG. 13d.

[0198] According to one example, if the context information includes information inquiring about a price and the paste type is the original text type, paste content in the form of original text containing price information can be obtained as illustrated in FIG. 13e. For example, the original text may be the original text included in the captured image of the first screen.

[0199] FIGS. 14a to 14c are drawings for explaining examples of UI screens according to one embodiment.

[0200] According to FIG. 14a, when a user command for pasting is entered on a second screen (910), the electronic device (100) may provide a UI (911) for selecting paste options. For example, the second screen (910) may be a note application execution screen. For example, the user command for pasting may be a long press input, but is not limited thereto. For example, the UI (911) for selecting paste options may include a "Paste with AI" option and a "Paste as an image" option.

[0201] According to one example, when the "Paste with AI" option is selected in the UI (911) illustrated in FIG. 14a, the electronic device (100) may provide a UI (920) for selecting a paste type on a second screen (910) as illustrated in FIG. 14b. For example, the UI (920) for selecting a paste type may include an area (921) containing the paste type and a preview area (922). For example, the area (921) containing the paste type may include a Paste menu (921-1), a Cancel menu (921-2), a Text summary with image menu (921-3), a Text Summary menu (921-4), and an Original text menu (921-5). For example, the preview area (922) may provide a preview image for the currently selected paste type (e.g., Text summary with image).

[0202] According to one example, when the Text summary with image menu (921-3) is selected in the UI (920) illustrated in FIG. 14b and the Paste menu (921-1) is selected, the electronic device (100) can insert a text summary and an image (931) into the second screen (910) as illustrated in FIG. 14c. For example, the electronic device (100) can input the text summary and the image (931) into the second screen (910) according to a final paste command (e.g., touching the left area of the Cancel menu (921-2)). However, the disclosure is not limited thereto, and the text summary and the image (931) can be input into the second screen (910) even without a final paste command. For example, the electronic device (100) can cancel the pasting of the text summary and the image (931) pasted into the second screen (910) when the Cancel button (932) is selected.

[0203] FIGS. 15a to 15d are drawings for explaining examples of UI screens according to one embodiment.

[0204] According to FIG. 15a, when a user command for pasting is entered on a second screen (910), the electronic device (100) may provide a UI (911) for selecting paste options. For example, the second screen (910) may be a note application execution screen. For example, the user command for pasting may be a long press input, but is not limited thereto. For example, the UI (911) for selecting paste options may include a "Paste with AI" option and a "Paste as an image" option.

[0205] According to one example, when the "Paste with AI" option is selected in the UI (911) illustrated in FIG. 15a, the electronic device (100) may provide a UI (920) for selecting a paste type on a second screen (910) as illustrated in FIG. 15b. For example, the UI (920) for selecting a paste type may include an area (921) containing the paste type and a preview area (922). For example, the area (921) containing the paste type may include a Paste menu (921-1), a Cancel menu (921-2), a Text summary with image menu (921-3), a Text summary menu (921-4), and an Original text menu (921-5). For example, the preview area (922) may provide a preview image for the currently selected paste type (e.g., Text summary).

[0206] According to one example, when the Text summary menu (921-4) is selected in the UI (920) shown in FIG. 15b and the Paste menu (921-1) is selected, the electronic device (100) can paste the text summary (941) onto the second screen (910) as shown in FIG. 15c. For example, when the Cancel button (952) is selected, the electronic device (100) can cancel the pasting of the text summary (941) as shown in FIG. 15d.

[0207] FIGS. 16a to 16e are drawings for explaining examples of UI screens according to one embodiment.

[0208] According to FIG. 16a, the electronic device (100) may provide a UI (1611) for selecting paste options when a user command for pasting is entered on a second screen (1610). For example, the second screen (1610) may be a messenger application execution screen where conversation content (1612) can be entered. The UI (1611) for selecting paste options may include a "Paste with AI" option and a "Paste as an image" option.

[0209] According to one example, when the "Paste with AI" option is selected in the UI (1611) illustrated in FIG. 16a, the electronic device (100) may provide a UI (1620) for selecting a paste type on a second screen (1610) as illustrated in FIG. 16b. For example, the UI (1620) for selecting a paste type may include an area (1621) containing the paste type and a preview area (1622). For example, the area (1621) containing the paste type may include a Paste menu (1621-1), a Cancel menu (1621-2), a Contextual edit menu (1621-3), a Simple edit menu (1621-4), a Text Summary menu (1621-5), and an Original text menu (1621-6). For example, the preview area (1622) can provide a preview image for the currently selected paste type (e.g., Contextual edit).

[0210] According to one example, when the Paste menu (1621-1) is selected while the Contextual edit menu (1621-3) is selected in the UI (1620) shown in FIG. 16b, the electronic device (100) can input content corresponding to the Contextual edit menu (1621-3) into the input window (1613) included in the second screen (1610), as shown in FIG. 16c. For example, as shown in FIG. 16d, the content entered into the input window (1613) can be transmitted to a messenger application according to the user's transmission command and displayed as conversation content (1614) on the second screen (1610).

[0211] According to one example, as illustrated in FIG. 16e, when the Simple edit menu (1621-4) is selected in the UI (1620) for selecting a paste type, the preview area (1622) can provide a preview image of the currently selected paste type (e.g., Simple edit).

[0212] FIGS. 17a and 17b are drawings for explaining an example of a UI screen according to one embodiment.

[0213] According to FIG. 17a, the electronic device (100) may provide a UI (1711) for selecting paste options when a user command for pasting is entered on a second screen (1710). For example, the second screen (1710) may be a messenger application execution screen where conversation content (1712) can be entered. The UI (1711) for selecting paste options may include a "Paste with AI" option and a "Paste as an image" option.

[0214] According to one example, when the "Paste with AI" option is selected in the UI (1711) illustrated in FIG. 17a, the electronic device (100) may provide a UI (1720) for selecting a paste type on a second screen (1710) as illustrated in FIG. 17b. For example, the UI (1720) for selecting a paste type may include an area (1721) containing the paste type and a preview area (1722). For example, the area (1721) containing the paste type may include a Paste menu (1721-1), a Cancel menu (1721-2), a Contextual edit menu (1721-3), a Simple edit menu (1721-4), a Text Summary menu (1721-5), and an Original text menu (1721-6). For example, the preview area (1722) can provide a preview image for the currently selected paste type (e.g., Contextual edit).

[0215] According to one example, the area (1721) containing the paste type may only include a menu corresponding to the paste type corresponding to the characteristics of the application (e.g., messenger application) corresponding to the second screen (1710). For example, if simultaneous input of text and image is not supported in the messenger application, the Text summary with image menu may not be included in the UI (1720) for selecting the paste type. According to one example, when analyzing the input field of the second screen in the embodiment illustrated in FIG. 8 to generate a content pool containing paste type content for the target application, the content of the Text summary with image type may not be included in the generated content pool.
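As a non-limiting illustration, limiting the paste type menus to those supported by the application corresponding to the second screen may be sketched in Python as follows. The table `MENU_REQUIREMENTS`, the capability labels, and the function `available_menus` are assumptions introduced here. The requirement assigned to each menu is a simplification chosen for this sketch.

```python
from typing import Dict, List, Set

# Hypothetical table: capabilities the input field needs for each menu.
MENU_REQUIREMENTS: Dict[str, Set[str]] = {
    "Contextual edit": {"text"},
    "Simple edit": {"text"},
    "Text summary with image": {"text", "image"},
    "Text summary": {"text"},
    "Original text": {"text"},
}


def available_menus(capabilities: Set[str]) -> List[str]:
    """Keep only the menus whose requirements the input field satisfies."""
    return [
        name
        for name, required in MENU_REQUIREMENTS.items()
        if required <= capabilities   # subset test
    ]


# A messenger input field that accepts text but not text together with an image.
messenger_menus = available_menus({"text"})
```

The same filter can be applied when the content pool is generated, so that content of an unsupported paste type is not requested from the model in the first place.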

[0216] FIG. 18 is a diagram illustrating an example of performing a paste operation using a previously saved captured image according to one embodiment.

[0217] According to one embodiment, when the electronic device (100) stores a screen capture image used for a paste operation, it may store not only the screen capture image but also additional information corresponding to the screen and structured text data mapped to the screen capture image.

[0218] According to one example, when a specific capture image is selected through a UI (1810) that provides a previously stored capture image as illustrated in FIG. 18, the electronic device (100) can obtain at least one of additional information and structured text data corresponding to the selected capture image based on previously stored information. For example, when a paste command for the selected capture image is input on the currently running application screen, the electronic device (100) can perform a paste operation using the previously stored structured text data corresponding to the selected capture image. Accordingly, when the same capture image is used for pasting, the process operation can be simplified by using the previously stored structured text data.
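As a non-limiting illustration, storing structured text data mapped to a captured image so that it can be reused for a later paste operation may be sketched in Python as follows. The class name `CaptureStore` and the use of a SHA-256 digest of the image bytes as the lookup key are assumptions introduced here. `structure_fn` stands in for the conversion into structured text data.

```python
import hashlib
from typing import Any, Callable, Dict


class CaptureStore:
    """Keeps additional information and structured text data per captured image."""

    def __init__(self, structure_fn: Callable[[str, Dict[str, Any]], Dict[str, Any]]):
        self._structure_fn = structure_fn
        self._entries: Dict[str, Dict[str, Any]] = {}

    @staticmethod
    def key_for(image_bytes: bytes) -> str:
        # Assumption: identical image bytes identify the same capture.
        return hashlib.sha256(image_bytes).hexdigest()

    def save(self, image_bytes: bytes, info: Dict[str, Any], text: str) -> str:
        key = self.key_for(image_bytes)
        if key not in self._entries:
            # Structuring runs only the first time this image is stored.
            self._entries[key] = {
                "info": info,
                "structured": self._structure_fn(text, info),
            }
        return key

    def structured_for(self, key: str) -> Dict[str, Any]:
        return self._entries[key]["structured"]
```

When a previously stored captured image is selected for pasting, `structured_for` returns the stored result without repeating text extraction or structuring.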

[0219] FIG. 19 is a block diagram of an electronic device in a network environment according to various embodiments.

[0220] According to one example, the electronic device (2401) can be implemented as the electronic device (100) shown in FIG. 2.

[0221] Referring to FIG. 19, in a network environment (2400), an electronic device (2401) may communicate with an electronic device (2402) through a first network (2498) (e.g., a short-range wireless communication network) or with at least one of an electronic device (2404) or a server (2408) through a second network (2499) (e.g., a long-range wireless communication network). According to one embodiment, the electronic device (2401) may communicate with the electronic device (2404) through a server (2408). According to one embodiment, the electronic device (2401) may include a processor (2420), memory (2430), input module (2450), sound output module (2455), display module (2460), audio module (2470), sensor module (2476), interface (2477), connection terminal (2478), haptic module (2479), camera module (2480), power management module (2488), battery (2489), communication module (2490), subscriber identification module (2496), or antenna module (2497). In some embodiments, at least one of these components (e.g., connection terminal (2478)) may be omitted from the electronic device (2401), or one or more other components may be added. In some embodiments, some of these components (e.g., sensor module (2476), camera module (2480), or antenna module (2497)) may be integrated into a single component (e.g., display module (2460)).

[0222] The processor (2420) can, for example, execute software (e.g., program (2440)) to control at least one other component (e.g., hardware or software component) of the electronic device (2401) connected to the processor (2420) and can perform various data processing or operations. According to one embodiment, as at least part of the data processing or operations, the processor (2420) can store commands or data received from other components (e.g., sensor module (2476) or communication module (2490)) in volatile memory (2432), process the commands or data stored in volatile memory (2432), and store the resulting data in non-volatile memory (2434). According to one embodiment, the processor (2420) may include a main processor (2421) (e.g., a central processing unit or an application processor) or an auxiliary processor (2423) that can operate independently or together with it (e.g., a graphics processing unit, a neural processing unit (NPU), an image signal processor, a sensor hub processor, or a communication processor). For example, if the electronic device (2401) includes a main processor (2421) and an auxiliary processor (2423), the auxiliary processor (2423) may be configured to use less power than the main processor (2421) or to be specialized for a specified function. The auxiliary processor (2423) may be implemented separately from the main processor (2421) or as part thereof.

[0223] The auxiliary processor (2423) may control at least some of the functions or states associated with at least one component of the electronic device (2401) (e.g., display module (2460), sensor module (2476), or communication module (2490)) on behalf of the main processor (2421) while the main processor (2421) is in an inactive (e.g., sleep) state, or together with the main processor (2421) while the main processor (2421) is in an active (e.g., application execution) state. According to one embodiment, the auxiliary processor (2423) (e.g., image signal processor or communication processor) may be implemented as part of another functionally related component (e.g., camera module (2480) or communication module (2490)). According to one embodiment, the auxiliary processor (2423) (e.g., neural processing unit) may include a hardware structure specialized for processing an artificial intelligence model. The artificial intelligence model may be generated through machine learning. Such learning may be performed, for example, on the electronic device (2401) itself where the artificial intelligence model is executed, or through a separate server (e.g., server (2408)). The learning algorithm may include, for example, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, but is not limited to the examples described above. The artificial intelligence model may include a plurality of artificial neural network layers. An artificial neural network may be a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), a deep Q-network, or a combination of two or more of the above, but is not limited to the examples described above. The artificial intelligence model may additionally or alternatively include a software structure in addition to the hardware structure.

[0224] The memory (2430) can store various data used by at least one component of the electronic device (2401) (e.g., processor (2420) or sensor module (2476)). The data may include, for example, input data or output data for software (e.g., program (2440)) and related commands. The memory (2430) may include volatile memory (2432) or non-volatile memory (2434).

[0225] The program (2440) may be stored as software in memory (2430) and may include, for example, an operating system (2442), middleware (2444), or an application (2446).

[0226] The input module (2450) can receive commands or data to be used for a component of the electronic device (2401) (e.g., processor (2420)) from outside the electronic device (2401) (e.g., user). The input module (2450) may include, for example, a microphone, a mouse, a keyboard, a key (e.g., a button), or a digital pen (e.g., a stylus pen).

[0227] The sound output module (2455) can output a sound signal to the outside of the electronic device (2401). The sound output module (2455) may include, for example, a speaker or a receiver. The speaker may be used for general purposes, such as multimedia playback or recording playback. The receiver may be used to receive incoming calls. According to one embodiment, the receiver may be implemented separately from the speaker or as part thereof.

[0228] The display module (2460) can visually provide information to the outside (e.g., a user) of the electronic device (2401). The display module (2460) may include, for example, a display, a holographic device, or a projector and a control circuit for controlling said device. According to one embodiment, the display module (2460) may include a touch sensor configured to detect a touch, or a pressure sensor configured to measure the intensity of the force generated by said touch.

[0229] The audio module (2470) can convert sound into an electrical signal or, conversely, convert an electrical signal into sound. According to one embodiment, the audio module (2470) can acquire sound through the input module (2450) or output sound through the sound output module (2455) or an external electronic device (e.g., electronic device (2402)) (e.g., speaker or headphones) connected directly or wirelessly to the electronic device (2401).

[0230] The sensor module (2476) can detect the operating state of the electronic device (2401) (e.g., power or temperature) or the external environmental state (e.g., user state) and generate an electrical signal or data value corresponding to the detected state. According to one embodiment, the sensor module (2476) may include, for example, a gesture sensor, a gyroscope sensor, a barometric pressure sensor, a magnetic sensor, an accelerometer sensor, a grip sensor, a proximity sensor, a color sensor, an IR (infrared) sensor, a biosensor, a temperature sensor, a humidity sensor, or an illuminance sensor.

[0231] The interface (2477) may support one or more specified protocols that can be used for the electronic device (2401) to be connected directly or wirelessly to an external electronic device (e.g., electronic device (2402)). According to one embodiment, the interface (2477) may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, an SD card interface, or an audio interface.

[0232] The connection terminal (2478) may include a connector through which the electronic device (2401) can be physically connected to an external electronic device (e.g., electronic device (2402)). According to one embodiment, the connection terminal (2478) may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (e.g., a headphone connector).

[0233] The haptic module (2479) can convert an electrical signal into a mechanical stimulus (e.g., vibration or movement) or an electrical stimulus that can be perceived by the user through tactile or kinesthetic senses. According to one embodiment, the haptic module (2479) may include, for example, a motor, a piezoelectric element, or an electric stimulation device.

[0234] The camera module (2480) can capture still images and video. According to one embodiment, the camera module (2480) may include one or more lenses, image sensors, image signal processors, or flashes.

[0235] The power management module (2488) can manage power supplied to the electronic device (2401). According to one embodiment, the power management module (2488) can be implemented, for example, as at least part of a power management integrated circuit (PMIC).

[0236] The battery (2489) can supply power to at least one component of the electronic device (2401). According to one embodiment, the battery (2489) may include, for example, a non-rechargeable primary battery, a rechargeable secondary battery, or a fuel cell.

[0237] The communication module (2490) can support the establishment of a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device (2401) and an external electronic device (e.g., electronic device (2402), electronic device (2404), or server (2408)), and the performance of communication through the established communication channel. The communication module (2490) may include one or more communication processors that operate independently of the processor (2420) (e.g., application processor) and support direct (e.g., wired) communication or wireless communication. According to one embodiment, the communication module (2490) may include a wireless communication module (2492) (e.g., cellular communication module, short-range wireless communication module, or GNSS (global navigation satellite system) communication module) or a wired communication module (2494) (e.g., LAN (local area network) communication module, or power line communication module). Among these communication modules, the corresponding communication module can communicate with an external electronic device (2404) through a first network (2498) (e.g., a short-range communication network such as Bluetooth, WiFi (wireless fidelity) direct, or IrDA (infrared data association)) or a second network (2499) (e.g., a legacy cellular network, a 5G network, a next-generation communication network, the Internet, or a computer network (e.g., a LAN or WAN)). These various types of communication modules may be integrated into a single component (e.g., a single chip) or implemented as multiple separate components (e.g., multiple chips). The wireless communication module (2492) can identify or authenticate the electronic device (2401) within a communication network such as the first network (2498) or the second network (2499) using subscriber information (e.g., International Mobile Subscriber Identity (IMSI)) stored in the subscriber identification module (2496).

[0238] The wireless communication module (2492) can support 5G networks and next-generation communication technologies following 4G networks, for example, new radio (NR) access technology. NR access technology can support high-speed transmission of high-capacity data (enhanced mobile broadband (eMBB)), minimization of terminal power and connection of multiple terminals (massive machine type communications (mMTC)), or high reliability and low latency (ultra-reliable and low-latency communications (URLLC)). The wireless communication module (2492) can support a high-frequency band (e.g., mmWave band) to achieve a high data transmission rate, for example. The wireless communication module (2492) can support various technologies for securing performance in the high-frequency band, such as beamforming, massive MIMO (multiple-input and multiple-output), full-dimensional MIMO (FD-MIMO), array antenna, analog beam-forming, or large-scale antenna. The wireless communication module (2492) can support various requirements specified in the electronic device (2401), external electronic device (e.g., electronic device (2404)), or network system (e.g., second network (2499)). According to one embodiment, the wireless communication module (2492) can support a peak data rate (e.g., 20 Gbps or more) for realizing eMBB, loss coverage (e.g., 164 dB or less) for realizing mMTC, or U-plane latency (e.g., downlink (DL) and uplink (UL) each 0.5 ms or less, or round trip 1 ms or less) for realizing URLLC.

[0239] An antenna module (2497) can transmit a signal or power to or from an external source (e.g., an external electronic device). According to one embodiment, the antenna module (2497) may include an antenna comprising a radiator made of a conductor or a conductive pattern formed on a substrate (e.g., a PCB). According to one embodiment, the antenna module (2497) may include a plurality of antennas (e.g., an array antenna). In this case, at least one antenna suitable for a communication method used in a communication network, such as a first network (2498) or a second network (2499), may be selected from the plurality of antennas, for example, by a communication module (2490). A signal or power may be transmitted or received between the communication module (2490) and an external electronic device through the selected at least one antenna. According to some embodiments, in addition to the radiator, other components (e.g., a radio frequency integrated circuit (RFIC)) may be additionally formed as part of the antenna module (2497).

[0240] According to various embodiments, the antenna module (2497) may form a mmWave antenna module. According to one embodiment, the mmWave antenna module may include a printed circuit board, an RFIC disposed on or adjacent to a first surface (e.g., bottom surface) of the printed circuit board and capable of supporting a specified high frequency band (e.g., mmWave band), and a plurality of antennas (e.g., array antennas) disposed on or adjacent to a second surface (e.g., top surface or side surface) of the printed circuit board and capable of transmitting or receiving a signal of the specified high frequency band.

[0241] At least some of the above components can be connected to each other via a communication method between peripheral devices (e.g., bus, GPIO (general purpose input and output), SPI (serial peripheral interface), or MIPI (mobile industry processor interface)) and exchange signals (e.g., commands or data) with each other.

[0242] According to one embodiment, commands or data may be transmitted or received between the electronic device (2401) and an external electronic device (2404) through a server (2408) connected to a second network (2499). Each of the external electronic devices (2402 or 2404) may be the same type of device as the electronic device (2401) or a different type. According to one embodiment, all or part of the operations performed on the electronic device (2401) may be performed on one or more of the external electronic devices (2402, 2404, or 2408). For example, if the electronic device (2401) needs to perform a function or service automatically or in response to a request from a user or another device, the electronic device (2401) may, instead of or in addition to performing the function or service itself, request one or more external electronic devices to perform at least part of the function or service. The one or more external electronic devices that receive the request may execute at least part of the requested function or service, or an additional function or service related to the request, and transmit the result of the execution to the electronic device (2401). The electronic device (2401) may provide the result, as is or after additional processing, as at least part of the response to the request. For this purpose, for example, cloud computing, distributed computing, mobile edge computing (MEC), or client-server computing technology may be used. The electronic device (2401) may provide ultra-low latency services using, for example, distributed computing or mobile edge computing. In another embodiment, the external electronic device (2404) may include an Internet of Things (IoT) device. The server (2408) may be an intelligent server using machine learning and / or neural networks. According to one embodiment, an external electronic device (2404) or server (2408) may be included within the second network (2499). The electronic device (2401) may be applied to intelligent services (e.g., smart home, smart city, smart car, or healthcare) based on 5G communication technology and IoT-related technology.

[0243] According to one embodiment, an electronic device (100) comprises a memory (120) for storing instructions; a display (130); and at least one processor (110) including processing circuitry; wherein, when the instructions are executed individually or collectively by the at least one processor, the electronic device obtains a captured image of the first screen and additional information related to the first screen when a first user command for capturing a first screen of the display is identified, and when text included in the captured image is identified, converts the identified text into structured text data based on the additional information, obtains paste content based on the structured text data and context information of a second screen of the display, and inputs the paste content into an input field included in the second screen.
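The flow in the preceding paragraph can be sketched roughly as follows. This is a minimal illustration, not the disclosed implementation: the `Capture` type, the `run_paste_pipeline` function, and the callables passed in for OCR, structuring, and paste generation are all hypothetical names, and the input field is modeled as a plain list.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Capture:
    image: bytes                # captured image of the first screen
    app_info: str               # additional information: source application (assumed form)
    url: Optional[str] = None   # additional information: source URL, if any


def run_paste_pipeline(
    capture: Capture,
    ocr: Callable[[bytes], str],
    structure: Callable[[str, Capture], dict],
    make_paste: Callable[[dict, dict], str],
    second_screen_context: dict,
    input_field: List[str],
) -> Optional[str]:
    """Capture -> text -> structured text data -> paste content -> input field."""
    text = ocr(capture.image)
    if not text:
        # No text was identified in the captured image, so nothing is pasted.
        return None
    structured = structure(text, capture)
    paste_content = make_paste(structured, second_screen_context)
    input_field.append(paste_content)  # stands in for entering content into the field
    return paste_content


# Usage with stub callables (all values are made up for illustration).
field: List[str] = []
result = run_paste_pipeline(
    Capture(image=b"img", app_info="shopping"),
    ocr=lambda image: "Blue mug",
    structure=lambda text, cap: {"app": cap.app_info, "text": text},
    make_paste=lambda data, ctx: data["text"] + " for " + ctx["recipient"],
    second_screen_context={"recipient": "Alex"},
    input_field=field,
)
```

Passing the OCR, structuring, and generation steps in as callables keeps the sketch neutral about whether each step runs on the device or on a server.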

[0244] According to one embodiment, additional information related to the first screen may include at least one of application information corresponding to the first screen and URL (uniform resource locator) information corresponding to the first screen.

[0245] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device may convert the identified text into the structured text data based on at least one of the application information and the URL information corresponding to the first screen.

[0246] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device may identify information about the input field included in the second screen according to a second user command and obtain the paste content based on the structured text data, the context information, and the information about the input field.

[0247] According to one embodiment, the information regarding the input field may include information on at least one of whether text is supported, whether rich text is supported, and whether images are supported.

[0248] According to one embodiment, the context information may include at least one of information displayed on the second screen, surrounding environment information of the electronic device, usage history information of the electronic device, and user profile information.

[0249] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device inputs a prompt obtained based on the structured text data, the context information, and information about the input field into a first artificial intelligence model to obtain a plurality of paste contents that can be entered in the input field, and inputs at least one of the plurality of paste contents into the input field.
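One way to picture the prompt described above is a text block that concatenates the three inputs. The sketch below assumes the first artificial intelligence model can be treated as a callable that takes a prompt string and returns a list of candidate strings. The prompt wording, `build_prompt`, and `obtain_paste_contents` are hypothetical and do not come from the disclosure.

```python
import json
from typing import Callable, List


def build_prompt(structured: dict, context: dict, field_info: dict) -> str:
    """Combine the three inputs named in the paragraph into one prompt string."""
    def dump(value: dict) -> str:
        return json.dumps(value, ensure_ascii=False, sort_keys=True)

    return "\n".join([
        "Propose paste contents suitable for the input field described below.",
        f"Structured text data: {dump(structured)}",
        f"Context information: {dump(context)}",
        f"Input field information: {dump(field_info)}",
    ])


def obtain_paste_contents(prompt: str, model: Callable[[str], List[str]]) -> List[str]:
    """Query the (stubbed) first AI model, which returns candidate paste contents."""
    return model(prompt)


# Usage with a stub model that ignores the prompt and returns fixed candidates.
prompt = build_prompt(
    {"name": "Blue mug"},
    {"screen": "chat"},
    {"text": True, "rich_text": False, "images": False},
)
candidates = obtain_paste_contents(prompt, lambda p: ["Blue mug", "A blue mug"])
```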

[0250] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device identifies a plurality of paste types based on information regarding the input field, inputs a prompt obtained based on the structured text data, the context information, and the plurality of paste types into the first artificial intelligence model to obtain the plurality of paste contents, provides a user interface (UI) including information regarding the plurality of paste types on the second screen, and, when at least one of the plurality of paste types is selected through the UI, inputs the at least one paste content corresponding to the selected paste type into the input field.

[0251] According to one embodiment, the plurality of paste types may include at least one of an edit type, a summary type, and an original type for at least one of the text and image included in the captured image.
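The disclosure does not state how the input field information maps to paste types, so the following is only one plausible rule, labeled as an assumption: text-targeted types are offered when the field accepts text or rich text, and an image-targeted original type is added when the field accepts images. `PasteType`, `InputFieldInfo`, and `identify_paste_types` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class PasteType(Enum):
    EDIT = "edit"
    SUMMARY = "summary"
    ORIGINAL = "original"


@dataclass(frozen=True)
class InputFieldInfo:
    supports_text: bool
    supports_rich_text: bool
    supports_images: bool


def identify_paste_types(field: InputFieldInfo) -> List[Tuple[PasteType, str]]:
    """Return (paste type, target) pairs; the selection rule is an assumption."""
    types: List[Tuple[PasteType, str]] = []
    if field.supports_text or field.supports_rich_text:
        types += [
            (PasteType.EDIT, "text"),
            (PasteType.SUMMARY, "text"),
            (PasteType.ORIGINAL, "text"),
        ]
    if field.supports_images:
        # Only fields that accept images are offered the captured image itself.
        types.append((PasteType.ORIGINAL, "image"))
    return types


plain_field = InputFieldInfo(supports_text=True, supports_rich_text=False, supports_images=False)
rich_field = InputFieldInfo(supports_text=True, supports_rich_text=True, supports_images=True)
```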

[0252] According to one embodiment, information regarding the plurality of paste types included in the UI screen includes at least one of a contextual edit menu, a simple edit menu, a text summary menu, an image and text summary menu, and an original text menu, and when the instructions are executed individually or collectively by the at least one processor, the electronic device inputs a first paste content including an answer that maintains the context of the content provided on the second screen into the input field when the contextual edit menu is selected, inputs a second paste content including a simple answer to the content provided on the second screen into the input field when the simple edit menu is selected, inputs a third paste content including a summary of the text included in the captured image into the input field when the text summary menu is selected, inputs a fourth paste content including an image included in the captured image and a summary of the text included in the captured image into the input field when the image and text summary menu is selected, and inputs a fifth paste content including the original text included in the captured image into the input field when the original text menu is selected.
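The menu-to-content correspondence in this paragraph can be illustrated with a small lookup. The sketch assumes the paste contents have already been obtained (for example, from the first artificial intelligence model) and are keyed by menu. `PasteMenu` and `paste_selected` are hypothetical names, and the input field is again modeled as a list.

```python
from enum import Enum
from typing import Dict, List


class PasteMenu(Enum):
    CONTEXTUAL_EDIT = "contextual edit"
    SIMPLE_EDIT = "simple edit"
    TEXT_SUMMARY = "text summary"
    IMAGE_AND_TEXT_SUMMARY = "image and text summary"
    ORIGINAL_TEXT = "original text"


def paste_selected(
    menu: PasteMenu,
    contents: Dict[PasteMenu, str],
    input_field: List[str],
) -> str:
    """Enter the paste content that corresponds to the selected menu."""
    content = contents[menu]
    input_field.append(content)
    return content


# Usage: contents are placeholder strings standing in for generated results.
contents = {menu: f"<{menu.value} content>" for menu in PasteMenu}
chat_field: List[str] = []
chosen = paste_selected(PasteMenu.TEXT_SUMMARY, contents, chat_field)
```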

[0253] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device identifies a plurality of paste types based on information regarding the input field and provides a UI on the second screen that includes information regarding the plurality of paste types and a preview area for the plurality of paste types, and when at least one of the plurality of paste types is selected through the UI, the at least one paste content corresponding to the selected paste type can be input into the input field.

[0254] According to one embodiment, additional information related to the first screen includes application information corresponding to the first screen, and when the instructions are executed individually or collectively by the at least one processor, the electronic device converts the identified text into first structured text data based on a format corresponding to the first type application information if the application information includes first type application information, and converts the identified text into second structured text data based on a format corresponding to the second type application information if the application information includes second type application information different from the first type.

[0255] According to one embodiment, the format of the first structured text data may be different from the format of the second structured text data.
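As an illustration of format selection by application type, the sketch below converts the same kind of recognized text into different structures for two assumed application types. The type names "messenger" and "shopping", the output schemas, and the price pattern are all assumptions chosen for the example. The disclosure does not specify concrete formats.

```python
import re


def to_structured(text: str, app_type: str) -> dict:
    """Convert recognized text into a format chosen by application type (assumed rules)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if app_type == "messenger":
        # Assumed layout: each line reads "sender: message".
        messages = []
        for line in lines:
            sender, sep, body = line.partition(":")
            if sep:
                messages.append({"sender": sender.strip(), "message": body.strip()})
        return {"format": "conversation", "messages": messages}
    if app_type == "shopping":
        # Assumed layout: product name on the first line, a price somewhere in the text.
        match = re.search(r"(\d[\d,]*)\s*(KRW|USD)", text)
        return {
            "format": "product",
            "name": lines[0] if lines else "",
            "price": match.group(1) if match else None,
            "currency": match.group(2) if match else None,
        }
    return {"format": "plain", "text": text}


chat = to_structured("Kim: Lunch at 12?\nLee: Sure", "messenger")
product = to_structured("Blue mug\n12,000 KRW", "shopping")
```

Here `chat` holds a list of sender and message pairs while `product` holds name, price, and currency fields, so the two results differ in format as the paragraph describes.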

[0256] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device stores the captured image, the additional information, and the structured text data in the memory, and when the captured image stored in the memory is selected on the second application screen after the paste content is entered into an input field included in the first application screen, the paste content is obtained based on the structured text data stored in the memory and the context information of the second application screen, and the obtained paste content is entered into an input field included in the second application screen.
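The reuse described above, where one stored capture serves several application screens, might look like the following. `StoredCapture`, `CaptureStore`, and the `make_paste` callable are hypothetical. The point of the sketch is only that the structured text data is computed once, stored, and combined with a different screen context on each paste.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StoredCapture:
    image: bytes
    additional_info: dict
    structured: dict


class CaptureStore:
    """In-memory stand-in for the device memory that holds captures."""

    def __init__(self) -> None:
        self._items: Dict[int, StoredCapture] = {}

    def save(self, capture: StoredCapture) -> int:
        capture_id = len(self._items)
        self._items[capture_id] = capture
        return capture_id

    def paste(
        self,
        capture_id: int,
        screen_context: dict,
        make_paste: Callable[[dict, dict], str],
        input_field: List[str],
    ) -> str:
        # Reuse the stored structured text data; only the screen context changes.
        stored = self._items[capture_id]
        content = make_paste(stored.structured, screen_context)
        input_field.append(content)
        return content


def make_paste(data: dict, ctx: dict) -> str:
    # Stub generator: real paste content would come from an AI model.
    return f"[{ctx['app']}] {data['name']}"


store = CaptureStore()
cid = store.save(StoredCapture(b"img", {"app": "shopping"}, {"name": "Blue mug"}))
first_field: List[str] = []
second_field: List[str] = []
store.paste(cid, {"app": "chat"}, make_paste, first_field)
store.paste(cid, {"app": "notes"}, make_paste, second_field)
```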

[0257] According to one embodiment, when the instructions are executed individually or collectively by the at least one processor, the electronic device may input a prompt obtained based on text included in the captured image and the additional information related to the first screen into a second artificial intelligence model to obtain the structured text data.

[0258] According to one embodiment, a control method for an electronic device comprises: an operation of obtaining a captured image of the first screen and additional information related to the first screen when a first user command for capturing a first screen is identified; an operation of converting the identified text into structured text data based on the additional information when text included in the captured image is identified; an operation of obtaining paste content based on the structured text data and context information of the second screen; and an operation of inputting the paste content into an input field included in the second screen.

[0259] According to one embodiment, additional information related to the first screen may include at least one of application information corresponding to the first screen and URL information corresponding to the first screen.

[0260] According to one embodiment, the operation of converting the identified text into the structured text data may include the operation of converting the identified text into the structured text data based on at least one of the application information and the URL information corresponding to the first screen.

[0261] According to one embodiment, the control method may further include an operation of identifying information about the input field included in the second screen according to a second user command.

[0262] According to one embodiment, the operation of obtaining the paste content may include the operation of obtaining the paste content based on the structured text data, the context information, and the information about the input field.

[0263] According to one embodiment, the information regarding the input field may include information on whether text is supported, whether rich text is supported, and whether images are supported.

[0264] According to one embodiment, the context information may include at least one of information displayed on the second screen, surrounding environment information of the electronic device, usage history information of the electronic device, and user profile information.

[0265] According to one embodiment, the operation of obtaining the paste content may include the operation of obtaining a plurality of paste contents that can be entered in the input field by inputting a prompt obtained based on the structured text data, the context information, and information about the input field into a first artificial intelligence model.

[0266] According to one embodiment, the operation of inputting the paste content into an input field included in the second screen may include the operation of inputting at least one of the plurality of paste contents into the input field.

[0267] According to one embodiment, the operation of obtaining the paste content may include: an operation of identifying a plurality of paste types based on information regarding the input field; and an operation of obtaining the plurality of paste contents by inputting a prompt obtained based on the structured text data, the context information, and the plurality of paste types into the first artificial intelligence model.

[0268] According to one embodiment, the operation of inputting the paste content into an input field included in the second screen may include: providing a user interface (UI) to the second screen that includes information on the plurality of paste types; and, when at least one of the plurality of paste types is selected through the UI, inputting the at least one paste content corresponding to the selected paste type into the input field.

[0269] According to one embodiment, the plurality of paste types may include at least one of an edit type, a summary type, and an original type for at least one of the text and image included in the captured image.

[0270] According to one embodiment, the operation of inputting the paste content into an input field included in the second screen may include: an operation of identifying a plurality of paste types based on information regarding the input field; an operation of providing a UI to the second screen that includes information regarding the plurality of paste types and a preview area for the plurality of paste types; and, when at least one of the plurality of paste types is selected through the UI, an operation of inputting the at least one paste content corresponding to the selected paste type into the input field.

[0271] According to one embodiment, additional information related to the first screen may include application information corresponding to the first screen.

[0272] According to one embodiment, the operation of converting the identified text into the structured text data may include: an operation of converting the identified text into the first structured text data based on a format corresponding to the first type application information if the application information includes the first type application information; and an operation of converting the identified text into the second structured text data based on a format corresponding to the second type application information if the application information includes the second type application information which is different from the first type.

[0273] According to one embodiment, the format of the first structured text data may be different from the format of the second structured text data.

[0274] According to one embodiment, in a non-transitory computer-readable medium storing instructions that cause the electronic device to perform an operation when executed by a processor of the electronic device, the operation may include: an operation of obtaining a captured image of the first screen and additional information related to the first screen when a first user command for capturing a first screen is identified; an operation of converting the identified text into structured text data based on the additional information when text included in the captured image is identified; an operation of obtaining paste content based on the structured text data and context information of the second screen; and an operation of inputting the paste content into an input field included in the second screen.

[0275] According to the various embodiments described above, only the necessary information is extracted from a screen containing complex information and / or long text information, thereby minimizing user inconvenience that may occur when the information needs to be transferred and entered from a screen of another application.

[0276] Although the various embodiments described above use multiple individual neural network models, the operation of at least two of the multiple neural network models may be implemented in a single neural network model.

[0277] Each operation according to the various embodiments described above may be performed by the processor (110), but if necessary, a module for each operation may be used. For example, each module may be implemented with at least one piece of software, at least one piece of hardware, and / or a combination thereof. Each module may be implemented to use a predefined algorithm, a predefined formula, and / or a learned artificial intelligence model to perform the operation. However, at least some modules may be distributed to external devices.

[0278] The methods according to the various embodiments of the present disclosure described above may be implemented in the form of an application that can be installed on an existing electronic device. Alternatively, the methods according to the various embodiments of the present disclosure described above may be performed using a deep learning-based artificial neural network (or deep artificial neural network), that is, a learning network model.

[0279] The methods according to the various embodiments of the present disclosure described above can be implemented merely by a software upgrade or a hardware upgrade of an existing electronic device.

[0280] The various embodiments of the present disclosure described above may also be performed through an embedded server equipped in an electronic device or an external server of the electronic device.

[0281] According to a specific example of the present disclosure, the various embodiments described above may be implemented as software comprising instructions stored on a storage medium readable by a machine (e.g., a computer). The machine is a device capable of calling stored instructions from the storage medium and operating according to the called instructions, and may include an electronic device (e.g., electronic device (A)) according to the disclosed embodiments. When instructions are executed by a processor, the processor may perform a function corresponding to the instructions directly or by using other components under the control of the processor. Instructions may include code generated or executed by a compiler or an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, "non-transitory" means only that the storage medium does not contain a signal and is tangible, and does not distinguish whether data is stored semi-permanently or temporarily in the storage medium.

[0282] Additionally, according to one embodiment of the present disclosure, the method according to the various embodiments described above may be provided as included in a computer program product. The computer program product may be traded between a seller and a buyer as a product. The computer program product may be distributed in the form of a device-readable storage medium (e.g., compact disc read-only memory (CD-ROM)) or distributed online through an application store (e.g., Play Store™). In the case of online distribution, at least a portion of the computer program product may be temporarily stored or temporarily created in a storage medium such as the memory of a manufacturer's server, an application store's server, or a relay server.

[0283] Additionally, each component (e.g., module or program) according to the various embodiments described above may be composed of a single entity or multiple entities, and some of the aforementioned sub-components may be omitted, or other sub-components may be further included in the various embodiments. Alternatively or additionally, some components (e.g., modules or programs) may be integrated into a single entity that performs, in the same or a similar manner, the functions performed by each of the respective components prior to integration. The operations performed by the module, program, or other components according to the various embodiments may be executed sequentially, in parallel, iteratively, or heuristically, or at least some operations may be executed in a different order or omitted, or other operations may be added.

[0284] Although embodiments of the present disclosure have been illustrated and described above, the present disclosure is not limited to the specific embodiments described above. It is understood that various modifications can be made by those skilled in the art without departing from the gist of the present disclosure as claimed in the claims, and such modifications should not be understood separately from the technical spirit or perspective of the present disclosure.

Claims

1. An electronic device (100) comprising: a memory (120) for storing instructions; a display (130); and at least one processor (110) including processing circuitry, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: when a first user command for capturing a first screen of the display is identified, obtain a captured image of the first screen and additional information related to the first screen; when text included in the captured image is identified, convert the identified text into structured text data based on the additional information; obtain paste content based on the structured text data and context information of a second screen of the display; and input the paste content into an input field included in the second screen.

2. The electronic device of claim 1, wherein the additional information related to the first screen includes at least one of application information corresponding to the first screen and URL (uniform resource locator) information corresponding to the first screen, and wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to convert the identified text into the structured text data based on at least one of the application information and the URL information corresponding to the first screen.

3. The electronic device of claim 1 or 2, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: identify information regarding the input field included in the second screen according to a second user command; and obtain the paste content based on the structured text data, the context information, and the information regarding the input field, wherein the information regarding the input field includes information on whether text is supported, whether rich text is supported, and whether images are supported, and wherein the context information includes at least one of information displayed on the second screen, surrounding environment information of the electronic device, usage history information of the electronic device, and user profile information.

4. The electronic device of claim 3, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: input a prompt obtained based on the structured text data, the context information, and the information regarding the input field into a first artificial intelligence model to obtain a plurality of paste contents that can be entered in the input field; and input at least one of the plurality of paste contents into the input field.

5. The electronic device of claim 3, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: identify a plurality of paste types based on the information regarding the input field; input a prompt obtained based on the structured text data, the context information, and the plurality of paste types into a first artificial intelligence model to obtain a plurality of paste contents; provide a UI (user interface) including information on the plurality of paste types on the second screen; and when at least one of the plurality of paste types is selected through the UI, input at least one paste content corresponding to the selected paste type into the input field.

6. The electronic device of claim 5, wherein the plurality of paste types includes at least one of an edit type, a summary type, and an original type for at least one of the text and an image included in the captured image.

7. The electronic device of claim 5, wherein the information regarding the plurality of paste types included in the UI screen includes at least one of a contextual edit menu, a simple edit menu, a text summary menu, an image and text summary menu, and an original text menu, and wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: when the contextual edit menu is selected, input a first paste content including an answer that maintains the context of the content provided on the second screen into the input field; when the simple edit menu is selected, input a second paste content including a simple answer to the content provided on the second screen into the input field; when the text summary menu is selected, input a third paste content including a summary of the text included in the captured image into the input field; when the image and text summary menu is selected, input a fourth paste content including a summary of the image included in the captured image and the text included in the captured image into the input field; and when the original text menu is selected, input a fifth paste content including the original text included in the captured image into the input field.

8. The electronic device of claim 1 or 2, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: identify a plurality of paste types based on information regarding the input field; provide, on the second screen, a UI including information on the plurality of paste types and a preview area for the plurality of paste types; and, when at least one of the plurality of paste types is selected through the UI, input at least one paste content corresponding to the selected paste type into the input field.

9. The electronic device of claim 1 or 2, wherein the additional information related to the first screen includes application information corresponding to the first screen, and wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: when the application information includes application information of a first type, convert the identified text into first structured text data based on a format corresponding to the first type of application information; and, when the application information includes application information of a second type different from the first type, convert the identified text into second structured text data based on a format corresponding to the second type of application information, wherein the format of the first structured text data is different from the format of the second structured text data.
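As a rough illustration of application-dependent structuring, the Python sketch below converts the same kind of OCR text into different formats depending on an application type. The type labels `"messenger"` and `"browser"`, the output schemas, and the function name `to_structured` are assumptions made for this example; the claim does not specify them.

```python
def to_structured(text: str, app_type: str) -> dict:
    """Convert OCR text into a format chosen by the source application type."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    if app_type == "messenger":
        # Assumed first type: treat each "sender: message" line as a chat turn.
        messages = []
        for line in lines:
            sender, sep, body = line.partition(":")
            if sep:
                messages.append({"sender": sender.strip(), "text": body.strip()})
            else:
                messages.append({"sender": None, "text": line})
        return {"format": "conversation", "messages": messages}

    if app_type == "browser":
        # Assumed second type: first line is the title, the rest is the body.
        return {
            "format": "article",
            "title": lines[0] if lines else "",
            "body": lines[1:],
        }

    # Fallback for any other application type.
    return {"format": "plain", "lines": lines}


chat = to_structured("Ann: hi\nBob: hello", "messenger")
page = to_structured("Release notes\nFixed a crash on startup", "browser")
```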

10. The electronic device of claim 1 or 2, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: store the captured image, the additional information, and the structured text data in the memory; when the paste content has been input into an input field included in a first application screen and the captured image stored in the memory is selected on a second application screen, obtain paste content based on the structured text data stored in the memory and context information of the second application screen; and input the obtained paste content into an input field included in the second application screen.
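The reuse of a stored capture across application screens can be sketched as follows. This is a minimal Python illustration under stated assumptions: `StoredCapture`, `CaptureStore`, `paste_from_stored`, and `stub_generate` are hypothetical names, the store is an in-memory dictionary, and content generation is a stub rather than an AI model.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class StoredCapture:
    # The three items the claim stores together.
    image: bytes
    additional_info: dict
    structured_text: dict


class CaptureStore:
    """In-memory stand-in for the device memory (assumption)."""

    def __init__(self) -> None:
        self._captures: Dict[str, StoredCapture] = {}

    def save(self, capture_id: str, capture: StoredCapture) -> None:
        self._captures[capture_id] = capture

    def load(self, capture_id: str) -> StoredCapture:
        return self._captures[capture_id]


def paste_from_stored(
    store: CaptureStore,
    capture_id: str,
    context: str,
    generate: Callable[[dict, str], str],
) -> str:
    """Regenerate paste content from stored structured data and a new context."""
    capture = store.load(capture_id)
    return generate(capture.structured_text, context)


def stub_generate(data: dict, context: str) -> str:
    # Stand-in for model-based generation (assumption).
    return f"{context}: {data['title']} at {data['time']}"


store = CaptureStore()
store.save(
    "cap-1",
    StoredCapture(b"fake", {"app": "browser"}, {"title": "Flight AB123", "time": "09:40"}),
)

first_paste = paste_from_stored(store, "cap-1", "email draft", stub_generate)
second_paste = paste_from_stored(store, "cap-1", "chat reply", stub_generate)
```

Because the structured text data is stored alongside the image, the second paste does not need to repeat OCR or structuring; only the context-dependent generation step runs again.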

11. The electronic device of claim 1 or 2, wherein the instructions, when executed individually or collectively by the at least one processor, cause the electronic device to input a prompt, obtained based on the text included in the captured image and the additional information related to the first screen, into a second artificial intelligence model to obtain the structured text data.

12. A method for controlling an electronic device, the method comprising: when a first user command for capturing a first screen is identified, obtaining a captured image of the first screen and additional information related to the first screen; when text included in the captured image is identified, converting the identified text into structured text data based on the additional information; obtaining paste content based on the structured text data and context information of a second screen; and inputting the paste content into an input field included in the second screen.
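The four operations of this method can be sketched end to end in Python. The sketch is illustrative and assumes that capture, OCR, structuring, and generation are supplied as callables; `run_paste_pipeline` and all stub behaviours below are hypothetical, and no real OCR engine or AI model is invoked.

```python
from typing import Callable, List, Optional, Tuple


def run_paste_pipeline(
    capture: Callable[[], Tuple[bytes, dict]],
    ocr: Callable[[bytes], str],
    structure: Callable[[str, dict], dict],
    generate: Callable[[dict, str], str],
    second_screen_context: str,
    input_field: List[str],
) -> Optional[str]:
    # 1. Obtain the captured image and additional information of the first screen.
    image, additional_info = capture()
    # 2. Identify text in the captured image; stop if there is none.
    text = ocr(image)
    if not text:
        return None
    # 3. Convert the text into structured text data using the additional information.
    structured = structure(text, additional_info)
    # 4. Obtain paste content from the structured data and the second screen's context.
    paste_content = generate(structured, second_screen_context)
    # 5. Input the paste content into the input field of the second screen.
    input_field.append(paste_content)
    return paste_content


# Stubs stand in for the device components (assumptions).
field: List[str] = []
result = run_paste_pipeline(
    capture=lambda: (b"fake-image-bytes", {"app": "calendar"}),
    ocr=lambda image: "Team lunch Fri 12:00",
    structure=lambda text, info: {"app": info["app"], "text": text},
    generate=lambda data, ctx: f"[{ctx}] {data['text']}",
    second_screen_context="chat reply",
    input_field=field,
)
```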

13. The method of claim 12, wherein the additional information related to the first screen includes at least one of application information corresponding to the first screen and URL information corresponding to the first screen, and wherein converting the identified text into the structured text data comprises converting the identified text into the structured text data based on at least one of the application information and the URL information corresponding to the first screen.

14. The method of claim 12 or 13, further comprising identifying information regarding the input field included in the second screen according to a second user command, wherein obtaining the paste content comprises obtaining the paste content based on the structured text data, the context information, and the information regarding the input field, wherein the information regarding the input field includes information on whether text is supported, whether rich text is supported, and whether images are supported, and wherein the context information includes at least one of information displayed on the second screen, surrounding environment information of the electronic device, usage history information of the electronic device, and user profile information.
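One way the input-field information could shape the paste content is sketched below in Python. The three boolean flags mirror the claim; everything else (`InputFieldInfo`, `fit_to_field`, the `plain`/`rich`/`image` keys, and the preference for rich text over plain text) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputFieldInfo:
    # The three capabilities named in the claim.
    supports_text: bool
    supports_rich_text: bool
    supports_images: bool


def fit_to_field(content: dict, field: InputFieldInfo) -> dict:
    """Keep only the parts of the candidate content that the field can accept."""
    result = {}
    if field.supports_images and content.get("image") is not None:
        result["image"] = content["image"]
    if field.supports_rich_text and content.get("rich"):
        # Assumed policy: prefer rich text when the field supports it.
        result["text"] = content["rich"]
    elif field.supports_text and content.get("plain"):
        result["text"] = content["plain"]
    return result


candidate = {"plain": "Total: 42 USD", "rich": "<b>Total:</b> 42 USD", "image": b"png"}

plain_only = fit_to_field(candidate, InputFieldInfo(True, False, False))
rich_with_image = fit_to_field(candidate, InputFieldInfo(True, True, True))
```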

15. A non-transitory computer-readable medium storing instructions that, when executed by a processor of an electronic device, cause the electronic device to perform operations comprising: when a first user command for capturing a first screen is identified, obtaining a captured image of the first screen and additional information related to the first screen; when text included in the captured image is identified, converting the identified text into structured text data based on the additional information; obtaining paste content based on the structured text data and context information of a second screen; and inputting the paste content into an input field included in the second screen.