Methods for implementing social media networks, computing systems, computer-readable media, and computer programs.
The computing system with a conversational agent and language model facilitates efficient and secure editing of social media content by filtering user inputs and executing permitted tool commands, addressing user interface complexity and unpredictable outputs.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- LEMON CO LTD
- Filing Date
- 2024-06-28
- Publication Date
- 2026-07-02
Abstract
Description
Technical Field
[0001] Cross-reference to Related Applications: This application claims priority to U.S. Application No. 18/346,707, filed on July 3, 2023, and titled "Social Media Network Dialogue Agent", the disclosure of which is incorporated herein by reference in its entirety.
[0002] The present invention relates to a social media network dialogue agent.
Background Art
[0003] Typical social media networks enable users to share various types of multimedia content, such as videos. Social media networks can enable users to edit videos in various ways, such as trimming the video length, adjusting the playback speed, adding text overlays, transitions, or other effects. For this reason, various controls for editing videos can be provided in the user interface (UI).
[0004] However, this approach has a major drawback in that many of these functions are not recognized or fully utilized by the average user. Due to the complex nature of the UI, the lack of understanding of the functions of specific tools, or the perceived difficulty of the editing process, in many cases, users are not able to make full use of the available video editing capabilities. As a result, many users may not be able to maximize these editing functions, and their content may not achieve the expected effect or impact.
[0005] An alternative approach allows users to interact with application functionality through natural language queries. Such an approach involves inputting natural language queries into a language model, which then generates output describing how to achieve the outcome requested in the query. However, various types of malicious inputs are known that, when fed into a language model, can cause undesirable or unpredictable output. When used in a social media context, a language model prompted with malicious input may generate inappropriate content or trigger actions that would otherwise be prohibited.
Summary of the Invention
[0006] Examples are provided relating to implementing actions on social media network content based on natural language input. One embodiment includes a computing system configured to implement a social media network, the computing system comprising one or more processors and a storage device containing instructions. The instructions are executable to receive user input from a conversational agent configured to engage in interaction using at least a language model, the user input including a natural language description of a request for an action on a content item, and to generate a prompt for the language model based at least on the user input. The instructions are further executable to input the prompt into the language model to generate an output describing an operation for implementing the action, to implement the operation by calling a backend service of the social media network to execute a command, and to output the result of executing the command. For example, the request may be a request to edit the content item.
[0007] This summary is provided to introduce a selection of concepts in simplified form, which are further described in the modes for carrying out the invention below. This summary is not intended to identify key or essential features of the claimed subject matter, nor to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all of the disadvantages noted in any part of this disclosure.
Brief Description of the Drawings
[0008]
[Figure 1] A schematic diagram showing a computing system as an example of this disclosure.
[Figure 2] A schematic diagram showing the pipeline for implementing the various services in Figure 1.
[Figure 3] An example of interaction between a user and the conversational agent shown in Figure 1.
[Figure 4] An example of interaction between a user and the conversational agent shown in Figure 1.
[Figure 5] An example of interaction between a user and the conversational agent shown in Figure 1.
[Figure 6] A flowchart of a method according to an example of this disclosure.
[Figure 7] An exemplary computing environment of this disclosure.
Modes for Carrying Out the Invention
[0009] In view of the above issues, this disclosure describes a computing system 100 configured to implement a social media network with an interactive interface for performing actions on social media content, such as video content. The examples are presented in the context of performing editing actions to edit a content item, but are also applicable to other types of actions that may be performed on a content item, including but not limited to consuming, publishing, sharing, and reporting on the content item. The computing system 100 includes various processing and storage components for implementing the social media network and associated functions, examples of which are described below with reference to Figure 7. In this exemplary implementation, the computing system 100 runs a front-end conversational agent 102 configured to engage in interaction with a user 104 of the social media network using at least a language model 106. The computing system 100 receives user input, which includes a natural language description of an editing request 108 for a content item 109, and generates a prompt 110 for the language model 106 based at least on the user input. The prompt 110 is input to the language model 106 to generate a language model output 111 describing one or more editing operations for editing the content item 109. The computing system 100 performs the editing operations by calling the backend service 112 of the social media network via the frontend interface 114 and executing one or more commands, thereby obtaining an edited version 116 of the content item 109, which may be output to user 104 and/or shared on the social media network.
[0010] The conversational agent 102 may be embodied as an online application service or “chatbot” on an online social media platform. “Chatbot” means an automated software tool designed and programmed to interact with users of a social media application via text or voice-based natural language queries. In the exemplary implementation shown in Figure 1, the conversational agent 102 is implemented in a social network client 118. The client 118 may run on a computing device operated, for example, by user 104. In some implementations, the client 118 may present a graphical user interface (GUI), through which interaction between the conversational agent 102 and user 104 may occur. The GUI may display text corresponding to user agent dialogue, such as text representing a natural language query from user 104, or text representing a natural language response from the conversational agent 102. The conversational agent 102 may implement privacy features to obtain user consent for sending user input to the language model 106. The conversational agent may also anonymize any personally identifiable information contained in the user input sent to the language model.
[0011] The conversational agent 102 uses the language model 106 to formulate natural language responses and engage in dialogue with the user 104. Among the various potential topics and types of interactions that may be facilitated with the user 104, the conversational agent 102 may use the language model 106 to process natural language queries concerning content items (e.g., user-generated content) uploaded by the user to the social media network, such as content item 109. The content item 109 may be uploaded to the social media network, for example, via the social network client 118.
[0012] If content item 109 contains video content, the content item may be processed by video asset analyzer 122 to generate video metadata 124. The video asset analyzer 122 may preprocess content item 109 to extract individual frames, analyze the visual and audio content, and generate video metadata 124 that includes text descriptions of the analyzed visual and audio content, recognized entities, timestamps for key events, and/or video captions for the content item. In some implementations, language model 106 may receive video metadata 124, among other potential inputs, to generate contextually relevant natural language output or recommended actions regarding content item 109. For example, language model 106 may be a multimodal language model that can accept both natural language and images as inputs. To that end, language model 106 may be trained with a variety of data types, including but not limited to text, video, audio, and/or image data. In some implementations, language model 106 may be a large-scale language model.
[0013] The language model 106 can be trained to engage in various types of interactions with the user 104, such as navigational conversations to guide the user to use tools available in the social network client 118, heuristic conversations to suggest ideas about future content, or edit-focused conversations to assist the user in applying edits to content items requested by the user and/or suggested via the language model. A series of edits that would normally require considerable user effort through a conventional user interface may thus be chained together efficiently. Accordingly, a user who knows how they want to edit and improve a content item, but is unaware of the specific tools available for doing so, may be guided by an edit-focused conversation with the conversational agent 102.
[0014] The computing system 100 includes a prompt manager 126 configured to generate prompts for input to the language model 106 and thereby obtain language model output. Prompts may be generated based on user input, including natural language queries. In the example shown in Figure 1, user input, including a natural language description of an edit request 108 to edit a content item 109, is received in a dialogue between user 104 and conversational agent 102 via the social network client 118. The computing system 100 uses the prompt manager 126 to generate a prompt 110 based on at least this user input. As indicated at A and described in more detail below, user input to the prompt manager 126 may be filtered in various ways; for example, queries that are not for editing content, or queries irrelevant to the editing context, may be filtered out. Furthermore, in some implementations, various preprocessing steps may be performed on user input and natural language queries, such as cleaning (e.g., removal of unnecessary punctuation or irrelevant characters), tokenization of queries, and application of language detection or translation.
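The cleaning and tokenization steps mentioned in paragraph [0014] might look like the following minimal sketch. The function names `clean_query` and `tokenize` and the specific character rules are illustrative assumptions rather than anything stated in this disclosure; a deployed system would probably rely on the language model's own tokenizer.

```python
import re


def clean_query(text: str) -> str:
    """Hypothetical cleaning step: drop stray symbols and tidy punctuation."""
    text = re.sub(r"[^\w\s.,!?'-]", " ", text)   # replace irrelevant characters
    text = re.sub(r"([.,!?])\1+", r"\1", text)   # collapse repeated punctuation
    return re.sub(r"\s+", " ", text).strip()     # normalize whitespace


def tokenize(text: str) -> list:
    """Naive word-level tokenization (an assumption, not the model's tokenizer)."""
    return re.findall(r"[\w'-]+", text.lower())


cleaned = clean_query("  Trim   the video!!!  ###")
tokens = tokenize(cleaned)
```

Language detection and translation are omitted here because they would depend on an external library or service that the disclosure does not name.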
[0015] When generating prompts for the language model 106, the prompt manager 126 may access a prompt pool 128 that stores a number of predetermined prompts. For each content editing function enabled by the social network client 118, a sample prompt to the language model 106 may be generated and added to the prompt pool 128. Each sample prompt is configured such that, when provided as input to the language model, it produces output describing an operation that, when executed, achieves the editing function. In the exemplary implementation shown in Figure 1, the editing functions for editing content items are implemented by various tools 130, each of which can be called in the backend service 112 via a corresponding tool interface 132, such as a function call interface or an application programming interface (API). Thus, the tool interfaces 132 may each provide a frontend interface to the tools 130 and functions in the backend of the social media network. Furthermore, the computing system 100 may include a tool pool 133. Tools 130 are registered in the tool pool 133, and tools, tool commands, tool functions, and/or tool interfaces 132 can be retrieved from the tool pool 133 from the front end of the social media network.
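One way to picture the tool pool 133 is as a registry that maps tool names to callable interfaces. The sketch below is an assumption made for illustration: the class `ToolPool`, its methods, and the stand-in `trim` tool are hypothetical names, not identifiers from this disclosure.

```python
from typing import Callable, Dict, List


class ToolPool:
    """Hypothetical registry mapping tool names to callable tool interfaces."""

    def __init__(self) -> None:
        self._interfaces: Dict[str, Callable[..., object]] = {}

    def register(self, name: str, interface: Callable[..., object]) -> None:
        # Refuse silent overwrites so each tool name maps to one interface.
        if name in self._interfaces:
            raise ValueError(f"tool already registered: {name}")
        self._interfaces[name] = interface

    def get(self, name: str) -> Callable[..., object]:
        # Raises KeyError for unknown names, so unregistered tools cannot be called.
        return self._interfaces[name]

    def names(self) -> List[str]:
        return sorted(self._interfaces)


def trim(frames: list, keep: int) -> list:
    """Stand-in for a backend trimming tool: keep the first `keep` frames."""
    return frames[:keep]


pool = ToolPool()
pool.register("trim", trim)
```

Looking a tool up by name keeps the front end decoupled from the backend implementation, which matches the separation the paragraph describes.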
[0016] In some examples, a tool interface 132 may be added to the prompt pool 128 for each editing tool, along with sample prompts. The sample prompts may include a description of the corresponding editing tool, typical queries for or related to the tool, a predefined input format for the tool, and potential intermediate actions to be performed when using the tool. Furthermore, one or more function calls or application programming interfaces that can be called to carry out the corresponding editing operation may be added to the prompt pool 128 along with each sample prompt.
[0017] Upon receiving user input from user 104, including a natural language description of an edit request 108 for editing content item 109, the prompt manager 126 queries the prompt pool 128 for prompts with descriptions related to that edit request. A new prompt may be formed by combining various predetermined prompts and/or sample prompts, and the prompt 110 can be formed by filling in the new prompt with data specific to the edit request 108. As shown in the illustrated example, the prompt 110 may include the edit request 108 or data derived from the edit request. In such an example, the edit request 108 or the derived data may be provided to the language model 106 as input along with the prompt 110.
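The lookup-and-fill behavior described in paragraph [0017] can be sketched as follows. This is a minimal illustration under stated assumptions: matching by keyword overlap, the `SamplePrompt` structure, the stopword list, and the template wording are all hypothetical, and a real prompt manager might use embeddings or another retrieval method instead.

```python
from dataclasses import dataclass
from string import Template
from typing import List

STOPWORDS = {"a", "an", "the", "or", "to", "of", "and"}


@dataclass
class SamplePrompt:
    tool_name: str
    description: str      # description used to match edit requests
    template: Template    # predetermined prompt with a $request slot


PROMPT_POOL = [
    SamplePrompt("trim", "shorten or trim the video length",
                 Template("Tool: trim. Describe how to fulfil: $request")),
    SamplePrompt("sparkle_filter", "apply a sparkle filter effect",
                 Template("Tool: sparkle_filter. Describe how to fulfil: $request")),
]


def _keywords(text: str) -> set:
    return {w for w in text.lower().split() if w not in STOPWORDS}


def select_prompts(request: str, pool: List[SamplePrompt]) -> List[SamplePrompt]:
    """Return sample prompts whose description shares a keyword with the request."""
    wanted = _keywords(request)
    return [p for p in pool if wanted & _keywords(p.description)]


def build_prompt(request: str, pool: List[SamplePrompt]) -> str:
    """Combine the matching sample prompts and fill in the request-specific data."""
    return "\n".join(p.template.substitute(request=request)
                     for p in select_prompts(request, pool))


prompt = build_prompt("apply a sparkle filter", PROMPT_POOL)
```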
[0018] The computing system 100 implements a language model agent 134, which is configured to prompt the language model 106 to obtain language model output and, based on that output, generate tool commands callable in the backend service 112 to achieve operations described in or related to the language model output. For example, referring to prompt 110, the language model agent 134 obtains, via the prediction module 136, a language model output 111 describing one or more editing operations for editing content item 109 by providing the prompt, and potentially other data described above, as input to the language model 106. When implemented, the editing operations may achieve at least a portion of the editing request 108 described in a natural language query from user 104. More specifically, the prediction module 136 may perform inference on prompt 110 to predict a response to the prompt, and parse the response to obtain structured information including the editing operations. Based on the structured information, the language model agent 134 identifies, via the planning/execution module 138, tools 130 that can be invoked in the backend service 112 to perform the editing operations. For relatively open-ended or complex natural language queries, the language model agent 134 may perform self-search and generate various intermediate steps to achieve the request expressed in such a query. For each step, the language model agent 134 may perform a search or ask follow-up questions of the requesting user to iteratively approach the final dialogue response.
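Parsing a model response into structured information, as attributed to the prediction module 136, could be sketched as below. The disclosure does not specify a response format, so the JSON schema (an `operations` list with `tool` and `params` fields) and the names `EditOperation` and `parse_model_output` are assumptions for illustration only.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class EditOperation:
    tool: str
    params: dict = field(default_factory=dict)


def parse_model_output(raw: str) -> List[EditOperation]:
    """Parse an assumed JSON response; malformed output yields no operations."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    operations = payload.get("operations", [])
    if not isinstance(operations, list):
        return []
    return [
        EditOperation(tool=op["tool"], params=op.get("params", {}))
        for op in operations
        if isinstance(op, dict) and "tool" in op
    ]


ops = parse_model_output(
    '{"operations": [{"tool": "sparkle_filter", "params": {"intensity": 0.5}}]}'
)
```

Returning an empty list on malformed output means nothing is passed on for execution when the response cannot be understood, which seems the safer default in this setting.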
[0019] After identifying one or more tools 130 that achieve the editing operations described in the language model output 111, the language model agent 134 implements the corresponding editing functionality of each tool by using the planning/execution module 138 to generate one or more tool commands that can be invoked via the backend service 112 for each tool. In some examples, the language model 106 may be used to generate tool commands based, for example, on tool commands obtained from the tool pool 133 and/or data obtained from the prompt pool 128. Furthermore, in some scenarios, the planning/execution module 138 may construct a whitelist 140 of tool commands that are permitted to be executed in the backend service 112. As described below, permitted and prohibited tool commands may be established for different types of user queries, user account types or privilege levels, and/or any other appropriate criteria. Thus, a whitelist of tool commands for a particular tool 130 may include a subset of the entire set of tool commands associated with that tool. When attempting to serve an edit request 108, a whitelist 140 may be established that omits, for example, tool commands not related to editing video content.
[0020] After generating tool commands to implement the editing operations described in the language model output 111 from the language model 106, the planning/execution module 138 calls the backend service 112 to execute the tool commands and implement the editing operations, thereby generating an edited version 116 of the content item 109. In the illustrated example, the edited version 116 of the content item 109 is provided to the user 104 via the social network client 118, and the social network client 118 may provide the user with various options regarding the edited version, such as the ability to publish the edited version to the social media network for sharing with other users 142, who can engage with the edited version via the social network client. As an example, Figure 1 shows the edited version 116 of the content item 109 being sent from the social network client 118 to a content server 144, where other users 142 can access the edited version.
[0021] The computing system 100 may include an audience engagement aggregation module 146, and the audience engagement aggregation module 146 is configured to analyze the performance of the edited content item 116 and generate performance analysis data about the edited content item after publication on the social media network. The performance of the edited content item 116 may be observed based on factors including but not limited to the number of views, likes, shares, comments, viewer retention rate, and user engagement. For example, when users of the social media network view, like, share, and comment on the edited content item 116, the aggregation module 146 may track and record these interactions. The aggregation module 146 may also record evaluation metrics such as overall user engagement, which may be a combination of viewer retention rate and analysis data regarding likes, comments, shares, and number of views.
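As a rough illustration of how an overall engagement metric might combine viewer retention with likes, comments, shares, and views, consider the sketch below. The formula, the equal weights, and the names `EngagementStats` and `engagement_score` are assumptions; the disclosure says only that the metric may be a combination of these factors.

```python
from dataclasses import dataclass


@dataclass
class EngagementStats:
    views: int
    likes: int
    shares: int
    comments: int
    retention_rate: float  # fraction of the video watched on average, 0.0 to 1.0


def engagement_score(stats: EngagementStats,
                     w_interaction: float = 0.5,
                     w_retention: float = 0.5) -> float:
    """Hypothetical weighted blend of interaction rate and viewer retention."""
    if stats.views == 0:
        return 0.0
    interactions = stats.likes + stats.shares + stats.comments
    interaction_rate = min(1.0, interactions / stats.views)  # cap at 1.0
    return w_interaction * interaction_rate + w_retention * stats.retention_rate


score = engagement_score(
    EngagementStats(views=100, likes=20, shares=5, comments=5, retention_rate=0.6)
)
```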
[0022] Performance analysis data may be provided to a prompt refinement module 148 configured to update prompts in the prompt pool 128 and potentially update parameters or attributes of the prompt manager 126. Performance analysis data may further be provided to a language model refinement module 150 configured to update parameters of the language model 106. Thus, active and passive user engagement with the edited content item 116 on the social media network can inform the selection of prompts and natural language responses regarding user editing requests and other natural language queries, enabling their continuous refinement.
[0023] In addition to providing the edited content item 116 to user 104, the conversational agent 102 may output a natural language response 152 to the user based on the edited content item. The response 152 may, for example, describe the creation of the edited content item 116 or its availability to user 104 via the social network client 118, and may engage the user in a conversation about publishing the edited content item to the social media network or further refining the edited content item. In some examples, as described below, the response 152 may be filtered before being provided to user 104 via client 118.
[0024] As described above, various types of malicious inputs to language models are known to cause undesirable or unpredictable outputs. As one example, a user may prompt a language model to translate text from one language to another while asking the language model to generate an incorrect translation. As another example, a user may prompt a language model to perform an operation and arbitrarily repeat the operation multiple times. Other concerns may arise when the language model is prompted to perform a programmatic action, such as calling a service of a computing system implementing the language model or manipulating a file or other data stored in the computing system. In such cases, the output from the language model, and whether that output is restricted to only permitted actions, is particularly unpredictable compared with prompts that simply task the language model with processing input content, where the actions taken are not dictated by the user but are delegated to the language model or can otherwise be known before they are executed.
[0025] Computing system 100 addresses these issues through the various services described above and indicated at A, B, C, and D. Figure 2 schematically shows a pipeline 200 that implements services A-D to facilitate desired, permitted operations and outputs from language model 106, and that separates front-end dialogue operations from back-end tool and tool-command execution. Computing system 100 may implement, for example, aspects of pipeline 200.
[0026] As shown in Figures 1 and 2, service A is configured to filter edit requests 108 from user 104 to edit content item 109. Filtering edit requests 108 and other natural language requests from the user may include identifying malicious intent in requests and/or filtering out non-edit requests that are not intended for editing content items. Edit requests 108 may be filtered via service A to form filtered user input, and prompts to the language model 106 may be generated based on this filtered user input. Other types of filtering on user input may be performed, such as filtering that restricts user queries by time period (e.g., daily). In this way, irrelevant queries that are not intended for editing content, or queries that attempt to directly request backend execution of commands, may be omitted from prompts input to the language model 106.
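A very simple version of the service A input filter could be written as below. The keyword list, the regular expression for direct backend requests, and the function name `filter_user_input` are assumptions for illustration; the disclosure does not say how edit intent or malicious intent is detected, and a production filter would likely use a trained classifier.

```python
import re
from typing import Optional

# Hypothetical vocabulary that signals an editing request.
EDIT_KEYWORDS = {"trim", "cut", "crop", "filter", "speed",
                 "caption", "overlay", "transition", "edit"}

# Hypothetical pattern for attempts to request backend command execution directly.
BACKEND_PATTERN = re.compile(
    r"\b(run|execute|call|invoke)\b.*\b(command|backend|api)\b", re.IGNORECASE
)


def filter_user_input(text: str) -> Optional[str]:
    """Return the input if it looks like an edit request, otherwise None."""
    if BACKEND_PATTERN.search(text):
        return None                      # direct backend request: drop
    words = set(re.findall(r"[a-z]+", text.lower()))
    if not words & EDIT_KEYWORDS:
        return None                      # not an edit request: drop
    return text.strip()


kept = filter_user_input("Trim the video to ten seconds")
dropped = filter_user_input("What is the weather today?")
```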
[0027] As described above, the language model output from language model 106 may describe an editing operation for editing content items, where implementing the editing operation fulfills at least part of an editing request expressed in a natural language query from a user, or is otherwise related to such editing request. To fulfill at least part of an editing operation, various processes may be performed on the description of the editing operation, including filtering the editing operation, converting the editing operation into a tool command, or otherwise generating a tool command. Service B is configured to build a whitelist of tool commands that are permitted to be executed in the backend. For example, a set of tools 130 available in backend service 112 may be identified via tool pool 133 to accomplish a set of editing operations. Each tool may provide a corresponding set of tool commands that it can execute to accomplish the corresponding tool functionality. For a set of tool commands associated with a corresponding tool, Service B may determine whether the tool command is registered in the tool command whitelist for the corresponding tool. The tool command whitelist may include, for example, a subset of the overall tool command pool available to the tool. For example, as shown in Figure 2, whitelisted tool commands are passed to the backend service 112 that is called to execute the whitelisted tool commands, while tool commands that are not whitelisted are neither passed to the backend service nor executed. In this way, a set of known tools, tool commands, and editing operations that are permitted to be executed can be established in the backend.
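The whitelist check performed by service B amounts to a membership test per tool. The sketch below assumes tool commands are represented as (tool, command) pairs and uses hypothetical tool and command names; it mirrors the behavior described above, where only whitelisted commands are passed on to the backend.

```python
from typing import Dict, List, Set, Tuple

Command = Tuple[str, str]  # (tool name, tool command); a hypothetical representation

# Full command set per tool, and the permitted subset (both invented for illustration).
TOOL_COMMANDS: Dict[str, Set[str]] = {
    "video_store": {"save", "delete", "rename"},
    "sparkle_filter": {"apply", "remove"},
}
WHITELIST: Dict[str, Set[str]] = {
    "video_store": {"save"},
    "sparkle_filter": {"apply"},
}


def partition_commands(commands: List[Command],
                       whitelist: Dict[str, Set[str]]):
    """Split commands into those passed to the backend and those blocked."""
    allowed: List[Command] = []
    blocked: List[Command] = []
    for tool, command in commands:
        if command in whitelist.get(tool, set()):
            allowed.append((tool, command))
        else:
            blocked.append((tool, command))
    return allowed, blocked


allowed, blocked = partition_commands(
    [("sparkle_filter", "apply"), ("video_store", "save"), ("video_store", "delete")],
    WHITELIST,
)
```

Because unknown tools fall back to an empty set, a command for a tool with no whitelist entry is blocked by default.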
[0028] Tool commands may be selectively permitted and blocked based on other criteria. For example, a first tool command and a second tool command may be generated to edit a content item based on language model output. The first and second tool commands may be compared to the permission level associated with the user account of the user who requested the editing of the content item. If it is determined that the first tool command is permitted at that permission level (for example, via the planning / execution module 138), the first tool command may be executed in the backend service 112. Conversely, if it is determined that the second tool command is not permitted at that permission level, the execution of the second tool command may be blocked in the backend service 112. When selectively executing tool commands, other criteria, including but not limited to account type, user attributes, and subscription type, may also be evaluated.
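The permission-level comparison in paragraph [0028] can be illustrated with an ordered enumeration. The level names, the command names, and the required-level table are all hypothetical; the disclosure does not define specific permission levels.

```python
from enum import IntEnum
from typing import List


class PermissionLevel(IntEnum):
    BASIC = 1
    CREATOR = 2
    ADMIN = 3


# Hypothetical minimum level required to execute each tool command.
REQUIRED_LEVEL = {
    "apply_filter": PermissionLevel.BASIC,
    "delete_video": PermissionLevel.ADMIN,
}


def is_permitted(command: str, user_level: PermissionLevel) -> bool:
    """Unknown commands are denied; known ones need a sufficient level."""
    required = REQUIRED_LEVEL.get(command)
    return required is not None and user_level >= required


def select_executable(commands: List[str], user_level: PermissionLevel) -> List[str]:
    """Keep only the commands the user's permission level allows."""
    return [c for c in commands if is_permitted(c, user_level)]


executable = select_executable(["apply_filter", "delete_video"], PermissionLevel.BASIC)
```

Here the first command is executed and the second is blocked, matching the first and second tool commands in the example above.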
[0029] The provision of frontend and backend, and the separation of frontend functions (e.g., interaction) from backend functions (e.g., execution of tool commands), are demonstrated by service C. Such an architecture ensures that users interacting with the social media network are not given the ability to directly invoke backend services or execute tool commands. Instead, as described above, whitelisted tool commands may be passed from the frontend to the backend via a frontend interface 200 that can derive them from the language model output and execute the tool commands by calling tool functions. The frontend interface 200 may include, for example, a tool interface 132.
[0030] Figure 2 further illustrates the filtering of the natural language response output from the conversational agent 102 via service D, based on the edited content item 116. Here, filtering the response output from the conversational agent 102 generates a filtered natural language response 202 (e.g., response 152). Filtering via service D includes a programmatic automated review of the initial response from the conversational agent 102, and may include filtering of the initial response via either or both of sensitive word detection and intent detection. For example, sensitive words may be omitted from the filtered natural language response 202. In this way, the provision of inappropriate or unwanted content in output to users of social media networks can be avoided.
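Sensitive word detection in service D might be sketched as follows. The word list is invented, and dropping whole sentences that contain a sensitive word is one possible design choice rather than the disclosed method, which could equally mask individual words. Intent detection is not shown because it would require a classifier the disclosure does not describe.

```python
import re

# Hypothetical sensitive word list.
SENSITIVE_WORDS = {"ugly", "stupid"}


def filter_response(response: str) -> str:
    """Drop any sentence that contains a sensitive word."""
    sentences = re.split(r"(?<=[.!?])\s+", response.strip())
    kept = [
        s for s in sentences
        if not set(re.findall(r"[a-z']+", s.lower())) & SENSITIVE_WORDS
    ]
    return " ".join(kept)


filtered = filter_response("Sparkle filter added. That was a stupid idea.")
```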
[0031] In some examples, personalized output may be provided in response to natural language queries from the user. In such examples, prompts to the language model 106, devised based on the query, do not have to include personal data about the user. Instead, the output to the user may be personalized with data about the user that is made available in the backend. In some examples, video content items may be edited to include audio or graphical assets that can be recommended to the user based on a feature vector or other appropriate data structure representing the user's engagement with social media networks.
[0032] Figure 3 illustrates an example of interaction between a social media network user and the conversational agent 102, which takes place via a GUI presented by the social network client 118. In this example, the user uploads a video content item 300 to the social media network via client 118. The conversational agent 102 prompts the user with "Do you want to improve this video?". The user interacts with this prompt, and the conversational agent 102 further prompts the user with "Do you want to improve this video? Tell me how you would like it edited" (302). The user responds to the conversational agent 102 with a natural language query 304 that includes an edit request to edit the video content item 300, asking the conversational agent 102 to "apply a sparkle filter 100 times". Here, the user requests the addition of a graphical asset in the form of a sparkle filter. However, the edit request asks that the operation, namely the addition of the graphical asset, be repeated an arbitrary 100 times. Because this repetition would needlessly consume computing resources of the social media network, the repetitive portion of the edit request is filtered out of the natural language query 304 via service A (implemented, for example, by prompt manager 126). Next, a prompt to language model 106 is formed using the filtered natural language query (e.g., "apply sparkle filter," which does not request repetition of the operation), ultimately invoking the sparkle filter tool and its associated command to add the sparkle graphical asset 306, thereby forming the edited version 308 of the video content item 300. The conversational agent 102 formulates a natural language response 310 ("Sparkle filter added.") describing the addition of the graphical asset 306. The conversational agent 102 may then engage the user in a subsequent conversation on, for example, another edit, publishing the edited version 308, or another appropriate topic.
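Removing the repetition clause from a query like the one in Figure 3 could be approximated with a single regular expression. This is only a sketch: the pattern and the function name `strip_repetition` are assumptions, and it handles only the literal "N times" phrasing, whereas the disclosed filtering is not limited to any particular wording.

```python
import re

# Hypothetical pattern for an explicit repeat count such as "100 times".
_REPEAT_CLAUSE = re.compile(r"\s*\b\d+\s+times\b", re.IGNORECASE)


def strip_repetition(query: str) -> str:
    """Remove an explicit 'N times' clause from an edit request."""
    return _REPEAT_CLAUSE.sub("", query).strip()


filtered_query = strip_repetition("apply a sparkle filter 100 times")
```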
[0033] Figure 4 illustrates another example of interaction between a social media network user and the conversational agent 102 via the social network client 118. In this example, the user uploads a video content item 400 to the social media network via client 118. The conversational agent 102 prompts the user with "Do you want to improve this video?". The user interacts with this prompt, and the conversational agent 102 further prompts the user with "Do you want to improve this video? Tell me how you want it edited" (402). The user responds with a natural language query 404 that includes an edit request to edit the video content item 400, asking the conversational agent 102 to "apply a sparkle filter, save the video, and delete the original". Here, the user requests (1) the addition of a graphical asset in the form of a sparkle filter, (2) saving of the edited version of the video content item 400 with the graphical asset added, and (3) deletion of the original unedited video content item 400. Based on this request, a prompt is formed and input to the language model 106 to obtain language model output describing the editing operations that fulfill the editing request, and tools and associated tool commands that implement the editing operations when executed are identified and generated. However, via service B (implemented, for example, in the planning/execution module 138), it is determined that one or more tool commands associated with deleting a video content item are not registered in the whitelist 140 of permitted tool commands. Therefore, these non-whitelisted tool commands are not called or executed in the backend service 112. Conversely, the tool commands associated with applying the sparkle graphical asset 406 and saving the edited version 408 of the video content item 400 are determined to be registered in the whitelist 140 and are therefore executed. The conversational agent 102 formulates a natural language response 410 ("Added a sparkle filter and saved the video.") that describes adding the graphical asset 406 and saving the edited version 408.
[0034] Figure 5 illustrates another example of interaction between a social media network user and the conversational agent 102 via the social network client 118. In this example, the user uploads a video content item 500 to the social media network via client 118. The conversational agent 102 prompts the user with "Do you want to improve this video?". The user interacts with this prompt, and the conversational agent 102 further prompts the user with "Do you want to improve this video? Tell me how you want it edited" (502). The user responds with a natural language query 504 that includes an edit request to edit the video content item 500, asking the conversational agent 102 to "apply a sparkle filter". Here, the user requests the addition of a graphical asset in the form of a sparkle filter. This request triggers the addition of a sparkle graphical asset 506, creating an edited version 508 of the video content item 500. Based on the edited version 508, the conversational agent 102 formulates a natural language response 510 that describes the addition of the graphical asset 506. However, the natural language response 510 is a filtered response, filtered via service D (implemented, for example, by language model agent 134) before being presented to the user via the conversational agent 102. Specifically, sentiment detection performed as part of the natural language response filtering detects a negative sentiment in the unfiltered response ("Sparkle filter added. Sparkle doesn't really suit you."). Thus, the negative-sentiment portion of the unfiltered response ("Sparkle doesn't really suit you.") is removed by filtering to produce the filtered natural language response 510 ("Sparkle filter added."), which is output to the user via the conversational agent 102.
[0035] Figure 6 shows a flowchart illustrating an exemplary method 600 for implementing one or more operations on content items via an interactive interface. Method 600 may be implemented, for example, in a computing system 100.
[0036] In 602, method 600 includes running a front-end conversational agent configured to engage in interaction with users of a social media network using at least a language model. In 604, method 600 includes receiving user input that includes a natural language description of a request for an action on a content item. The request may include, for example, an edit request to edit a content item. In 606, method 600 includes filtering out, from the user input, non-edit requests that are not intended for editing a content item, to form filtered user input. In 608, method 600 includes generating a prompt for a language model based at least on the user input. In an example where filtered user input is formed, as shown in 610, method 600 includes generating the prompt based on the filtered user input.
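Steps 606 to 610 can be sketched as follows. This is an illustration under stated assumptions, not the disclosed filter: the keyword list `EDIT_KEYWORDS`, the sentence-level filtering, and the prompt wording are all hypothetical.

```python
import re

# Hypothetical vocabulary of edit-related terms (assumption for illustration).
EDIT_KEYWORDS = ("trim", "filter", "speed", "sticker", "music", "crop")


def filter_user_input(user_input: str) -> str:
    """Keep only sentences that look like edit requests (step 606)."""
    sentences = re.split(r"(?<=[.!?])\s+", user_input.strip())
    kept = [s for s in sentences if any(k in s.lower() for k in EDIT_KEYWORDS)]
    return " ".join(kept)


def build_prompt(filtered_input: str) -> str:
    """Build a language model prompt from the filtered input (steps 608-610).

    The prompt carries only the request text and no data about the user.
    """
    return (
        "List the editing operations needed to satisfy this request.\n"
        f"Request: {filtered_input}"
    )
```

A keyword match is a deliberately crude stand-in; it only shows where a non-edit request, such as an instruction aimed at the model itself, would be dropped before the prompt is built.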
[0037] In 612, method 600 includes inputting a prompt into a language model to generate a language model output that describes one or more operations for implementing an action. If the request is an edit request for editing a content item, the operations may include, for example, an edit operation for editing the content item. In 614, method 600 includes identifying one or more tools that can be called in a backend service based on one or more operations. If the operations include an edit operation for editing a content item, the tools may be callable for editing the content item. In 616, method 600 includes generating one or more tool commands via the language model for each of the one or more tools to implement the one or more operations. In 618, method 600 includes determining, for each of the one or more tool commands, whether the tool command is registered in a tool command whitelist for the corresponding tool. The tool command whitelist may include, for example, a subset of the overall tool command pool for the corresponding tool.
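The whitelist check in 618 can be sketched as a per-tool lookup. The tool names, command names, and the `ToolCommand` type below are hypothetical; the sketch only assumes, as stated above, that each tool's whitelist is a subset of that tool's overall command pool.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class ToolCommand:
    tool: str      # name of the backend tool (hypothetical)
    command: str   # command generated for that tool (hypothetical)


# Hypothetical whitelist: for each tool, the subset of commands that may run.
WHITELIST: Dict[str, Set[str]] = {
    "effects": {"add_filter"},
    "storage": {"save_video"},
}


def partition_commands(
    commands: Iterable[ToolCommand],
    whitelist: Dict[str, Set[str]] = WHITELIST,
) -> Tuple[List[ToolCommand], List[ToolCommand]]:
    """Split commands into (permitted, blocked) according to the whitelist."""
    permitted: List[ToolCommand] = []
    blocked: List[ToolCommand] = []
    for cmd in commands:
        # A tool with no whitelist entry permits nothing.
        if cmd.command in whitelist.get(cmd.tool, set()):
            permitted.append(cmd)
        else:
            blocked.append(cmd)
    return permitted, blocked
```

Defaulting to "blocked" for unknown tools or commands means that an unexpected language model output cannot reach the backend merely because nobody anticipated it.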
[0038] In 620, method 600 includes implementing one or more operations by calling a backend service of a social media network via a frontend interface to execute one or more commands. If the operation includes an edit operation, an edited version of the content item may be created by executing a command to perform the edit operation. As shown in 622, the backend service may be called to execute a whitelisted tool command. In 624, method 600 includes outputting the results of executing one or more commands. The results may include, for example, an edited version of the content item. In 626, method 600 includes outputting a natural language response from a conversational agent to the user based on the results. If the results include an edited version, the natural language response may be output based on the edited version. In 628, method 600 includes filtering the natural language response via either or both of sensitive word detection and intent detection.
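The flow from prompt generation through the natural language response (608 to 626) can be tied together in a short sketch. The language model and backend are passed in as plain callables so the example stays self-contained; their signatures, the prompt text, and the response wording are assumptions, not interfaces from the disclosure.

```python
from typing import Callable, Dict, List, Set, Tuple

Command = Tuple[str, str]  # (tool name, command name); both hypothetical


def run_request(
    user_input: str,
    language_model: Callable[[str], List[Command]],
    backend: Callable[[str, str], str],
    whitelist: Dict[str, Set[str]],
) -> Tuple[List[str], str]:
    """Run one request end to end and return (results, response)."""
    # 608: generate a prompt from the (already filtered) user input.
    prompt = f"Describe the tool commands needed for: {user_input}"
    # 612-616: the model output is assumed to be parsed into tool commands.
    commands = language_model(prompt)
    results: List[str] = []
    for tool, command in commands:
        # 618: only whitelisted commands reach the backend (620-622).
        if command in whitelist.get(tool, set()):
            results.append(backend(tool, command))
    # 624-626: output the results and a response based on them.
    if results:
        response = "Done: " + ", ".join(results) + "."
    else:
        response = "No permitted operation could be performed."
    return results, response
```

In use, `language_model` would wrap the actual model call and output parsing, and `backend` would wrap the call made through the frontend interface; response filtering (628) would then be applied to `response` before it is shown to the user.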
[0039] It is understood that the approaches described herein can be adapted to any appropriate type of operation on content within social media networks, including operations for editing content items and other non-editing operations. Examples of operations include, but are not limited to, publishing content items, viewing or consuming content items, sharing content items, and reporting content items (e.g., as inappropriate).
[0040] In some embodiments, the methods and processes described herein may be linked to a computing system of one or more computing devices. Specifically, such methods and processes may be implemented as computer application programs or services, application programming interfaces (APIs), libraries, and / or other computer program products.
[0041] Figure 7 schematically illustrates a non-limiting embodiment of a computing system 700 capable of implementing one or more of the methods and processes described above. The computing system 700 is shown in a simplified form. The computing system 700 may embody the computing system 100 shown in Figure 1, as described above. The computing system 700 may take the form of one or more personal computers, server computers, tablet computers, home entertainment computers, network computing devices, game devices, mobile computing devices, mobile communication devices (e.g., smartphones) and / or other computing devices, and wearable computing devices, such as smartwatches and head-mounted augmented reality devices.
[0042] The computing system 700 comprises a logical processor 702, a volatile memory 704, and a non-volatile storage device 706. The computing system 700 may optionally include a display subsystem 708, an input subsystem 710, a communication subsystem 712, and / or other components not shown in Figure 7.
[0043] The logical processor 702 includes one or more physical devices configured to execute instructions. For example, the logical processor may be configured to execute instructions that are part of one or more applications, programs, routines, libraries, objects, components, data structures, or other logical structures. Such instructions may be implemented to perform tasks, implement data types, transform the state of one or more components, achieve technical effects, or achieve desired results.
[0044] A logical processor may include one or more physical processors (hardware) configured to execute software instructions. Furthermore, or alternatively, a logical processor may include one or more hardware logic circuits or firmware devices configured to execute logic or firmware instructions implemented in hardware. The processors of the logical processor 702 may be single-core or multi-core, and the instructions executed on them may be configured for sequential, parallel, and / or distributed processing. Individual components of the logical processor may optionally be distributed across two or more separate devices located remotely and / or configured for coordinated processing. Embodiments of the logical processor may be virtualized and executed by computing devices connected to a remotely accessible network configured in a cloud computing setup. In such cases, it should be understood that these virtualized embodiments run on different physical logical processors on various different machines.
[0045] The non-volatile storage device 706 includes one or more physical devices configured to hold instructions executable by a logical processor in order to implement the methods and processes described herein. When such methods and processes are implemented, the state of the non-volatile storage device 706 may be transformed, for example, to hold different data.
[0046] The non-volatile storage device 706 may include removable and / or built-in physical devices. The non-volatile storage device 706 may include optical memory (e.g., CD, DVD, HD-DVD, etc.), semiconductor memory (e.g., ROM, EPROM, EEPROM, flash memory, etc.), and / or magnetic memory (e.g., hard disk drive, floppy disk drive, tape drive, MRAM, etc.), or other mass storage technology. The non-volatile storage device 706 may include non-volatile, dynamic, static, read / write, read-only, sequential access, position-addressable, file-addressable, and / or content-addressable devices. It will be understood that the non-volatile storage device 706 is configured to retain instructions even when power to the non-volatile storage device 706 is cut off.
[0047] The volatile memory 704 may include a physical device that provides random access memory. The volatile memory 704 is typically used by the logical processor 702 to temporarily store information during the processing of software instructions. It will be understood that if power to the volatile memory 704 is cut off, the volatile memory 704 will typically no longer store instructions.
[0048] Embodiments of the logical processor 702, volatile memory 704, and non-volatile storage device 706 may be integrated into one or more hardware logic components. Such hardware logic components may include, for example, field-programmable gate arrays (FPGAs), program- and application-specific integrated circuits (PASICs / ASICs), program- and application-specific standard products (PSSPs / ASSPs), systems on a chip (SOCs), and complex programmable logic devices (CPLDs).
[0049] The terms “module,” “program,” and “engine” may be used to describe an aspect of the computing system 700 that is typically implemented in software by a processor to perform a specific function using a portion of volatile memory, the function involving transformative processing that specially configures the processor to perform it. Thus, a module, program, or engine may be instantiated via the logical processor 702 executing instructions held by the non-volatile storage device 706, using a portion of the volatile memory 704. It will be understood that different modules, programs, and / or engines may be instantiated from the same application, service, code block, object, library, routine, API, function, etc. Similarly, the same module, program, and / or engine may be instantiated by different applications, services, code blocks, objects, routines, APIs, functions, etc. The terms “module,” “program,” and “engine” may encompass individual or groups of executable files, data files, libraries, drivers, scripts, database records, etc.
[0050] The display subsystem 708, if included, may be used to present a visual representation of the data held by the non-volatile storage device 706. This visual representation may take the form of a graphical user interface (GUI). Since the methods and processes described herein modify the data held by the non-volatile storage device and thereby transform the state of the non-volatile storage device, the state of the display subsystem 708 may also be transformed to visually represent the changes in the underlying data. The display subsystem 708 may include one or more display devices utilizing substantially any type of technology. Such display devices may be combined with the logical processor 702, volatile memory 704 and / or non-volatile storage device 706 within a shared enclosure, or such display devices may be peripheral display devices.
[0051] The input subsystem 710, if included, may comprise or interface with one or more user input devices, such as a keyboard, mouse, touchscreen, or game controller. In some embodiments, the input subsystem may comprise or interface with selected natural user input (NUI) components. These components may be integrated or peripheral, and the transmission and / or processing of input actions may be handled onboard or offboard. Exemplary NUI components may include a microphone for speech and / or voice recognition, an infrared camera, color camera, stereo camera, and / or depth camera for machine vision and / or gesture recognition, a head tracker, eye tracker, accelerometer and / or gyroscope, and / or any other suitable sensors for motion detection and / or intent recognition.
[0052] The communication subsystem 712, if included, may be configured to enable communication between the various computing devices described herein and with other devices. The communication subsystem 712 may include wired and / or wireless communication devices compatible with one or more different communication protocols. In a non-limiting example, the communication subsystem may be configured to communicate over a wireless telephone network or a wired or wireless local or wide area network. In some embodiments, the communication subsystem may allow the computing system 700 to send and receive messages with other devices over a network such as the Internet.
[0053] The following paragraphs provide further explanation of the subject matter of this disclosure. One example provides a computing system configured to implement a social media network, the computing system comprising one or more processors and a memory device containing instructions, the instructions being executable by the one or more processors to: receive user input, which includes a natural language description of a request for an action on a content item, from a front-end conversational agent configured to engage in interaction with users of the social media network using at least a language model; generate a prompt for the language model based at least on the user input; input the prompt into the language model to generate a language model output describing one or more operations for implementing the action; implement the one or more operations by calling a back-end service of the social media network via a front-end interface to execute one or more commands; and output the result of executing the one or more commands. In such an example, the request for the action on the content item may include a request for editing the content item, the one or more operations may include one or more editing operations for editing the content item, and the result of executing the one or more commands may include an edited version of the content item.
In such an example, the computing system may instead or further include instructions executable to identify, based on the one or more editing operations, one or more tools that can be called in the backend service to edit the content item, each of the one or more commands being a tool command for the corresponding tool among the one or more tools. In such an example, the computing system may instead or further include instructions executable to determine, for each of the one or more tool commands, whether the tool command is registered in a tool command whitelist for the corresponding tool, the tool command whitelist including a subset of tool commands for the corresponding tool, and, for each tool command in the tool command whitelist, to invoke the backend service to execute the tool command. In such an example, the computing system may instead or further include instructions executable to output a natural language response from the conversational agent to the user based on the edited version of the content item. In such an example, the computing system may instead or further include instructions executable to filter the natural language response via either or both of sensitive word detection and intent detection. In such an example, the computing system may instead or further include instructions executable to filter out, from the user input, non-edit requests that are not intended for editing the content item, to form filtered user input, and the instructions executable to generate the prompt for the language model may be executable to generate the prompt based on the filtered user input. In such an example, the content item may include video content, and the computing system may instead or further include instructions executable to share the edited version of the video content on the social media network in response to receiving user input to share the edited version of the video content.
In such an example, the prompt to the language model may instead or further not include data about the user. In such an example, the one or more operations and the one or more commands may instead or further be generated by the language model. In such an example, the one or more operations may instead or further include adding an asset to the content item, the asset including one or more of effects, filters, stickers, music, emojis, avatars, or text, and the computing system may instead or further include instructions executable to call the backend service to recommend the asset to the user based on a feature vector representing the user's level of engagement with the social media network. In such an example, the computing system may instead or further include instructions executable to generate, based on the language model output, a first tool command for operating on the content item and a second tool command for operating on the content item, to compare the first and second tool commands with the permission level associated with the user account of the user on the social media network, to execute the first tool command in the backend service if it is determined that the first tool command is permitted by the permission level, and to prevent the execution of the second tool command in the backend service if it is determined that the second tool command is not permitted by the permission level.
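The permission-level comparison described at the end of this paragraph can be sketched as follows. It assumes, purely for illustration, that permission levels are integers and that each command has a minimum required level; the table `REQUIRED_LEVEL`, the command names, and the `dispatch` function are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical minimum permission level needed to run each tool command.
REQUIRED_LEVEL: Dict[str, int] = {
    "add_filter": 1,
    "delete_account_videos": 3,
}


def dispatch(
    commands: Iterable[str],
    user_level: int,
    execute: Callable[[str], None],
) -> Tuple[List[str], List[str]]:
    """Execute permitted commands; return (executed, prevented)."""
    executed: List[str] = []
    prevented: List[str] = []
    for command in commands:
        required = REQUIRED_LEVEL.get(command)
        # Commands with no known requirement are prevented by default.
        if required is not None and required <= user_level:
            execute(command)
            executed.append(command)
        else:
            prevented.append(command)
    return executed, prevented
```

With a user at level 1, a first command `add_filter` would be executed while a second command `delete_account_videos` would be prevented, mirroring the first and second tool commands in the text.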
[0054] Another example provides a method for implementing a social media network, the method comprising: receiving user input from a front-end conversational agent configured to engage in interaction with users of the social media network using at least a language model, the user input including a natural language description of a request for an action on a content item; generating a prompt for the language model based at least on the user input; inputting the prompt into the language model to generate a language model output describing one or more operations for implementing the action; implementing the one or more operations by calling a back-end service of the social media network via a front-end interface to execute one or more commands; and outputting the result of executing the one or more commands. In such an example, the request for the action on the content item may include a request for editing the content item, the one or more operations may include one or more editing operations for editing the content item, and the result of executing the one or more commands may include an edited version of the content item. In such an example, the method may instead or further include identifying one or more tools that can be invoked in the backend service to edit the content item based on the one or more editing operations, each of the one or more commands being a tool command of the corresponding tool among the one or more tools.
In such an example, the method may instead or further include, for each of the one or more tool commands, determining whether the tool command is registered in a tool command whitelist for the corresponding tool, the tool command whitelist including a subset of tool commands for the corresponding tool, and, for each tool command in the tool command whitelist, invoking the backend service to execute the tool command. In such an example, the method may instead or further include outputting a natural language response from the conversational agent to the user based on the edited version of the content item. In such an example, the method may instead or further include generating a first tool command for operating on the content item and a second tool command for operating on the content item based on the language model output; comparing the first and second tool commands to the permission level associated with the user account of the user on the social media network; executing the first tool command in the backend service in response to a determination that the first tool command is permitted by the permission level; and preventing the execution of the second tool command in the backend service in response to a determination that the second tool command is not permitted by the permission level. Another example provides a non-transitory computer-readable medium comprising computer-readable instructions that, when executed by a computing device, cause the computing device to implement the method of the above example.
[0055] Another example provides a computing system configured to implement a social media network, the computing system comprising one or more processors and a memory device containing instructions, the instructions being executable by the one or more processors to: receive, via a user interface, user input from a front-end conversational agent configured to engage with users of the social media network via the user interface using at least a language model, the user input including a natural language description of a request for an action on a content item; filter the user input to form filtered user input; generate a prompt for the language model based at least on the filtered user input; input the prompt into the language model to generate a language model output describing one or more operations for implementing the action; implement the one or more operations by calling a back-end service of the social media network via a front-end interface to execute one or more commands; output the result of executing the one or more commands; devise a natural language response from the conversational agent to the user based on the result; filter the natural language response to form a filtered natural language response; and output the filtered natural language response via the user interface.
[0056] It will be understood that the configurations and / or approaches described herein are exemplary in nature and subject to numerous variations, and therefore these specific embodiments or examples should not be considered in a limiting sense. The specific routines or methods described herein may represent one or more of any number of processing strategies. Accordingly, the illustrated and / or described operations may be performed in the order illustrated and / or described, in another order, in parallel, or omitted. Similarly, the order of the processes described above may be changed.
[0057] The subject matter of this disclosure includes novel and non-obvious combinations and subcombinations of the various processes, systems, configurations, and other features, functions, operations, and / or characteristics disclosed herein, as well as all equivalents thereof.
Claims
1. A computing system configured to implement a social media network, the computing system comprising: one or more processors; and a memory device containing instructions, the instructions being executable by the one or more processors to: receive, from a front-end conversational agent configured to engage in interaction with users of the social media network using at least a language model, user input including a natural language description of a request for an action on a content item; generate a prompt for the language model based at least on the user input; input the prompt into the language model to generate a language model output that describes one or more operations for implementing the action; implement the one or more operations by calling a backend service of the social media network via a frontend interface to execute one or more commands; and output a result of executing the one or more commands.
2. The computing system according to claim 1, wherein the request for the action on the content item includes a request to edit the content item, the one or more operations include one or more editing operations for editing the content item, and the result of executing the one or more commands includes an edited version of the content item.
3. The computing system according to claim 2, further comprising instructions that can be executed to identify one or more tools that can be called in the backend service to edit the content item based on the one or more editing operations, each of the one or more commands being a tool command for a corresponding tool among the one or more tools.
4. The computing system according to claim 3, further comprising instructions executable to: determine, for each of the one or more tool commands, whether the tool command is registered in a tool command whitelist for the corresponding tool, the tool command whitelist including a subset of the tool commands for the corresponding tool; and, for each tool command in the tool command whitelist, call the backend service to execute the tool command.
5. The computing system according to claim 2, further comprising instructions executable to output a natural language response from the front-end conversational agent to the user based on the edited version of the content item.
6. The computing system according to claim 5, further comprising instructions that can be executed to filter the natural language response via either or both of sensitive word detection and intent detection.
7. The computing system according to claim 2, further comprising instructions executable to filter out non-editing requests that are not intended for editing the content items from the user input to form filtered user input, wherein the instructions executable to generate the prompt for the language model are executable to generate the prompt based on the filtered user input.
8. The computing system according to claim 2, wherein the content item includes video content, and further comprises an instruction executable to share the edited version of the video content on the social media network in response to receiving user input to share the edited version of the video content.
9. The computing system according to claim 1, wherein the prompt to the language model does not include data relating to the user.
10. The computing system according to claim 1, wherein the one or more operations and the one or more commands are generated by the language model.
11. The computing system according to claim 1, wherein the one or more operations include adding an asset to the content item, the asset including one or more of effects, filters, stickers, music, emojis, avatars, or text, the computing system further comprising instructions executable to call the backend service to recommend the asset to the user based on a feature vector representing the degree of the user's engagement with the social media network.
12. The computing system according to claim 1, further comprising instructions executable to: generate, based on the language model output, a first tool command for performing an operation on the content item and a second tool command for performing an operation on the content item; compare the first tool command and the second tool command with a permission level associated with a user account of the user on the social media network; execute the first tool command in the backend service in response to a determination that the first tool command is permitted by the permission level; and prevent execution of the second tool command in the backend service in response to a determination that the second tool command is not permitted by the permission level.
13. A method for implementing a social media network, the method comprising: receiving, from a front-end conversational agent configured to engage in interaction with users of the social media network using at least a language model, user input including a natural language description of a request for an action on a content item; generating a prompt for the language model based at least on the user input; inputting the prompt into the language model to generate a language model output that describes one or more operations for implementing the action; implementing the one or more operations by calling a backend service of the social media network via a frontend interface to execute one or more commands; and outputting a result of executing the one or more commands.
14. The method according to claim 13, wherein the request for the action on the content item includes a request to edit the content item, the one or more operations include one or more editing operations for editing the content item, and the result of executing the one or more commands includes an edited version of the content item.
15. The method according to claim 14, further comprising identifying one or more tools that can be invoked in the backend service to edit the content item based on the one or more editing operations, wherein each of the one or more commands is a tool command of the corresponding tool among the one or more tools.
16. The method according to claim 15, further comprising: determining, for each of the one or more tool commands, whether the tool command is registered in a tool command whitelist for the corresponding tool, the tool command whitelist including a subset of the tool commands for the corresponding tool; and, for each tool command in the tool command whitelist, calling the backend service to execute the tool command.
17. The method according to claim 14, further comprising outputting a natural language response to the user from the front-end conversational agent based on the edited version of the content item.
18. The method according to claim 13, further comprising: generating, based on the language model output, a first tool command for performing an operation on the content item and a second tool command for performing an operation on the content item; comparing the first tool command and the second tool command with a permission level associated with a user account of the user on the social media network; executing the first tool command in the backend service in response to a determination that the first tool command is permitted by the permission level; and preventing execution of the second tool command in the backend service in response to a determination that the second tool command is not permitted by the permission level.
19. A computing system configured to implement a social media network, the computing system comprising: one or more processors; and a memory device containing instructions, the instructions being executable by the one or more processors to: receive, from a front-end conversational agent configured to engage with users of the social media network via a user interface using at least a language model, user input including a natural language description of a request for an action on a content item; filter the user input to form filtered user input; generate a prompt for the language model based at least on the filtered user input; input the prompt into the language model to generate a language model output that describes one or more operations for implementing the action; implement the one or more operations by calling a backend service of the social media network via a frontend interface to execute one or more commands; output a result of executing the one or more commands; devise, based on the result, a natural language response from the front-end conversational agent to the user; filter the natural language response to form a filtered natural language response; and output the filtered natural language response via the user interface.
20. A non-transitory computer-readable medium comprising computer-readable instructions that, when executed by a computing device, cause the computing device to implement the method according to claim 13.