An information processing system

By acquiring the feature information of the information to be processed and extracting features from it, the system automatically determines the target processing information. This addresses the lack of intelligent data processing in existing technology and makes information processing more efficient and accurate.

CN122241165A · Pending · Publication Date: 2026-06-19 · SHENZHEN TCL HIGH TECH DEVELOPMENT CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
SHENZHEN TCL HIGH TECH DEVELOPMENT CO LTD
Filing Date
2024-12-17
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In existing technologies, data processing methods rely on manual selection by users and lack automatic perception and intelligent processing mechanisms for data characteristics.

Method used

The data acquisition module acquires the feature information of the information to be processed, the feature extraction module performs feature extraction on it to obtain the target feature information, and the information determination module determines the target processing information of the information to be processed based on the target feature information. Together, these modules automate the processing.

Benefits of technology

It improves the level of intelligence in data processing, reduces manual operations by users, and enhances the efficiency and accuracy of information processing.

✦ Generated by Eureka AI based on patent content.

Figure CN122241165A_ABST

Abstract

This application discloses an information processing system that extracts features from information to be processed to obtain target feature information, and determines the target processing information of the information to be processed based on the target feature information.

Description

Technical Field

[0001] This specification relates to the field of computer technology, specifically to an information processing system.

Background Technology

[0002] With the development of technology, users often need to dynamically process or optimize data according to their specific needs when using various types of data. For example, depending on the characteristics of the content, it may be necessary to adjust the data processing method to improve efficiency or meet personalized needs. However, in existing technologies, data processing methods usually rely on manual selection by users and lack automatic perception and intelligent processing mechanisms for data characteristics.

Summary of the Invention

[0003] This application provides an information processing system.

[0004] In a first aspect, this application provides a method comprising:

[0005] Obtaining the feature information to be processed from the information to be processed;

[0006] Performing feature extraction on the feature information to be processed to obtain target feature information;

[0007] Determining, based on the target feature information, the target processing information of the information to be processed.

[0008] In a second aspect, this application provides a system comprising:

[0009] The data acquisition module is used to acquire the feature information to be processed from the information to be processed;

[0010] The feature extraction module is used to extract features from the feature information to be processed to obtain target feature information;

[0011] The information determination module is used to determine the target processing information of the information to be processed based on the target feature information.
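The three modules can be pictured as a simple pipeline. The sketch below is a minimal illustration under stated assumptions, not the applicant's implementation: the class name `InformationProcessingSystem`, the dictionary layout of the feature information, and the injected callables are all hypothetical, and the toy usage (subtitle character counts as densities, `10 / density` as a speed) only demonstrates the data flow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representation of the "feature information to be processed":
# modality name -> raw items for that modality (e.g. subtitle lines).
FeatureInfo = Dict[str, list]


@dataclass
class InformationProcessingSystem:
    """Three-module pipeline; each module is an injected, hypothetical callable."""

    acquire: Callable[[object], FeatureInfo]         # data acquisition module
    extract: Callable[[FeatureInfo], List[float]]    # feature extraction module
    determine: Callable[[List[float]], List[float]]  # information determination module

    def run(self, information: object) -> List[float]:
        # Obtain the feature information to be processed
        to_process = self.acquire(information)
        # Extract target feature information (one value per data stream position)
        target_features = self.extract(to_process)
        # Determine target processing information from the target features
        return self.determine(target_features)


# Toy usage (hypothetical rules): character counts as densities, speed = 10 / density.
system = InformationProcessingSystem(
    acquire=lambda video: {"text": video["subtitles"]},
    extract=lambda info: [float(len(line)) for line in info["text"]],
    determine=lambda densities: [10.0 / d for d in densities],
)
result = system.run({"subtitles": ["hello", "0123456789"]})  # [2.0, 1.0]
```

Injecting the modules as callables keeps each one replaceable, which mirrors the application's statement that the modules may be deployed on a single device or a cluster.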

[0012] In a third aspect, this application provides an apparatus, the apparatus comprising:

[0013] One or more processors;

[0014] Memory; and

[0015] One or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors to implement the method described in any implementation of the first aspect.

[0016] In a fourth aspect, this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the operations of the method described in any implementation of the first aspect.

Attached Figure Description

[0017] To more clearly illustrate the technical solutions in the embodiments of this specification, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this specification. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0018] Figure 1 is a schematic diagram of an application scenario of the information processing system provided by some embodiments of this application;

[0019] Figure 2 is a flowchart of an information processing method provided in some embodiments of this application;

[0020] Figure 3 is a flowchart of determining target feature information provided in some embodiments of this application;

[0021] Figure 4 is a flowchart of determining target processing information provided in some embodiments of this application;

[0022] Figure 5 is a schematic flowchart of determining first feature information provided in some embodiments of this application;

[0023] Figure 6 is a schematic diagram of sliding window processing provided in some embodiments of this application;

[0024] Figure 7 illustrates an information processing system provided by some embodiments of this application;

[0025] Figure 8 is a schematic structural diagram of a computer device provided in some embodiments of this application.

Detailed Implementation

[0026] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0027] In the description of this application, it should be understood that the terms "center," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," and "outer," etc., indicating orientation or positional relationships based on the orientation or positional relationships shown in the accompanying drawings, are used only for the convenience of describing this application and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of this application. Furthermore, the terms "first," "second," and "third," etc., are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, features defined with "first," "second," "third," etc., may explicitly or implicitly include one or more of the stated features. In the description of this application, "a plurality of" means two or more, unless otherwise explicitly specified.

[0028] In this application, the term "exemplary" is used to mean "used as an example, illustration, or description." Any embodiment described as "exemplary" in this application is not necessarily to be construed as being more preferred or advantageous than other embodiments. The following description is provided to enable any person skilled in the art to make and use this application. Details are set forth in the following description for purposes of explanation. Those skilled in the art will recognize that this application can be practiced without these specific details. In other instances, well-known structures and processes are not described in detail to avoid obscuring the description of this application with unnecessary detail. Therefore, this application is not intended to be limited to the embodiments shown, but is to be accorded the broadest scope consistent with the principles and features disclosed in this application.

[0029] It should be noted that the methods provided in some embodiments of this application are executed on computer devices, and the objects processed by each computer device exist in the form of data or information; for example, a time is essentially time information. It is understood that any size, quantity, position, etc. mentioned in subsequent embodiments exists as corresponding data so that the computer device can process it. Specific details will not be elaborated here.

[0030] This application provides an information processing method, system, device, and storage medium, which will be described in detail below.

[0031] Figure 1 shows an application scenario of the information processing system provided by some embodiments of this application. The information processing method provided by this application can be applied, for example, in the application scenario shown in Figure 1. As shown in Figure 1, the application scenario may include user 101, terminal 102, and information processing system 103.

[0032] User 101 can be any user who obtains information to be processed from terminal 102. For example, user 101 can be a user who uses terminal 102 to play a video. As another example, user 101 can be a user who uses terminal 102 to play audio. As yet another example, user 101 can be a user who uses terminal 102 to display an image.

[0033] Terminal 102 can be a device that interacts with user 101. In some embodiments, terminal 102 may include hardware devices with data processing capabilities and the necessary application programs to drive the hardware devices. The application programs provide user 101 with the ability to interact with the outside world via a network and an interface. User 101 can obtain information to be processed through the interactive interface of terminal 102. For example, user 101 can watch or listen to played content through the interactive interface of terminal 102. User 101 can also adjust playback information, such as playback speed, through the interactive interface of terminal 102.

[0034] In some embodiments, terminal 102 may include a mobile device, tablet computer, laptop computer, built-in device of a motor vehicle, or similar devices, or any combination thereof. In some embodiments, the mobile device may include a smart home device, smart mobile device, virtual reality device, augmented reality device, or similar device, or any combination thereof. In some embodiments, the smart home device may include a smart TV, desktop computer, etc., or any combination thereof. In some embodiments, the smart mobile device may include a smartphone, personal digital assistant, gaming device, navigation device, etc., or any combination thereof. In some embodiments, the built-in device in the motor vehicle may include an in-vehicle computer, in-vehicle TV, etc.

[0035] The information processing system 103 can be a backend server providing services to the terminal 102. The information processing system 103 can communicate with the terminal 102. The information processing system 103 can receive user instructions or data through the terminal 102. The information processing system 103 can store data or instructions for executing the information processing method provided in this application, and can execute or be used to execute the data or instructions. For example, the information processing system 103 can acquire the feature information to be processed, perform feature extraction on the feature information to be processed to obtain target feature information, and determine the target processing information of the information to be processed based on the target feature information.

[0036] In some embodiments, the information processing system 103 may store information to be processed, a model for feature extraction, target feature information, and target processing information. In other embodiments, the information to be processed may also be stored in the terminal 102, and the information processing system 103 may obtain the information to be processed from the terminal 102 and perform subsequent processing.

[0037] In other embodiments, the information processing system 103 may include the terminal 102. For example, the computing device of the information processing system 103 and the terminal 102 may be the same device.

[0038] The information processing system 103 can be a single computing device or a cluster system composed of multiple computing devices. The data or instructions stored in the information processing system 103 for executing the information processing method described in this application can adopt any form of system architecture, for example, a layered architecture, event-driven architecture, microkernel architecture, microservice architecture, or cloud architecture.

[0039] It should be noted that the application scenario of the information processing system illustrated in Figure 1 is merely an example. The information processing system and scenario described in the embodiments of this application are intended to more clearly illustrate the technical solutions of the embodiments of this application and do not constitute a limitation on the technical solutions of this application. The numbers of users 101, terminals 102, and information processing systems 103 shown in Figure 1 are exemplary and can be any number depending on implementation needs. Those skilled in the art will recognize that, with the evolution of information processing systems and the emergence of new application scenarios, the technical solutions provided in this application are also applicable to similar technical problems.

[0040] Figure 2 is a flowchart of an information processing method provided in some embodiments of this application. Although a logical sequence is shown in the flowchart, in some cases the operations shown or described may be performed in a different order than that shown in the figure. The information processing method provided in this embodiment may include operations S201 to S203, as follows:

[0041] S201. Obtain the feature information to be processed from the information to be processed.

[0042] Optionally, the computer device can obtain the feature information to be processed from a local storage area, or it can obtain it from other devices via a network, Bluetooth, or other means. The information to be processed can be an information source used for analysis and processing, and it can be information of any form or modality. For example, the information to be processed can be any one or a combination of several types of data, such as text data, image data, and audio data. Image data refers to the set of grayscale values of each pixel represented numerically; image data can be static image data or dynamic video data.

[0043] The feature information to be processed includes multimodal information, which can be information extracted from the information to be processed. For example, the feature information to be processed in video format may include image information, audio information, and/or text information. Image information may include multiple video frame data, audio information may include audio data, and text information may include subtitle data and/or text identifiers contained in the image information. It is understood that some video format information may not include text information or audio information. Similarly, the feature information to be processed in audio format may include audio information, and the feature information to be processed in image format may include image information.

[0044] S202. Extract features from the feature information to be processed to obtain the target feature information.

[0045] The target feature information is the information obtained by feature extraction of the feature information to be processed. The target feature information includes at least one target feature value. Each target feature value corresponds to a data stream position of the information to be processed and represents the density of information contained at that data stream position (i.e., the information density). Information density refers to the proportion of target information in a unit of content or time. The target information can be a type of information identified manually or in advance according to predetermined rules.

[0046] For example, for information to be processed in video format, the feature information contained in the video can include text, image, and audio information. Target feature information can be obtained based on the text, image, and audio information in the video. This target feature information can include multiple target feature values. The data stream position refers to the position information of the corresponding data frame of the information to be processed. This position information is determined by the preceding and following frames of data; that is, it is a relative positional concept. This data stream position is used to characterize the relative position of the corresponding data frame among all sequentially arranged data frames. Each target feature value can characterize the information density at different points in time in the video.

[0047] Figure 3 is a flowchart of determining target feature information according to some embodiments of this application. In some embodiments, performing feature extraction on the feature information to be processed to obtain the target feature information may include operations S301 to S302:

[0048] S301. Extract features from the multimodal information to obtain the corresponding first feature information.

[0049] As mentioned earlier, the information to be processed can include multimodal information, which refers to information acquired or represented through multiple sensory modes or signal sources. A modality refers to a channel through which humans or devices perceive information, such as sensory modes like vision, hearing, and touch, as well as data representation forms like text, images, audio, and video. That the information to be processed includes multimodal information means it can include information from multiple different modalities. For example, the information to be processed can simultaneously contain text data and image data, or it can be information of any single modality. Multimodal information can include, but is not limited to, image information, audio information, and text information. Feature extraction from multimodal information can extract corresponding first feature information from one or more modalities.

[0050] In some embodiments, the first feature information can be used to characterize the information density of multimodal information. For example, feature extraction of image information can yield the corresponding image information density. Similarly, feature extraction of audio information can yield the corresponding audio information density. Furthermore, feature extraction of text information can yield the corresponding text information density. Taking video data as an example, the first feature information may include image information density, audio information density, and text information density.

[0051] Information density can represent the amount of information contained in a unit of time. In some embodiments, the first feature information can be represented using a first information density sequence, which includes one or more information density values arranged in chronological order; each information density value represents the information density within a certain length of time. For example, the first feature information corresponding to the j-th modality information can be represented as $\{a_1^j, a_2^j, \dots, a_k^j\}$, where $a_i^j$ represents the i-th information density value in the first information density sequence corresponding to the j-th modality information, and k represents the length of the first information density sequence.

[0052] How to extract features from multiple modalities and obtain the corresponding first feature information will be described in detail later.

[0053] S302. Based on the first feature information, obtain the target feature information.

[0054] As mentioned earlier, the first feature information can be used to characterize the information density of multimodal information. The target feature information can be represented using a target information density sequence. For example, the target information density sequence can be represented as $\{A_1, A_2, \dots, A_k\}$, where $A_i$ represents the i-th information density value in the target information density sequence, and k represents the length of the target information density sequence.

[0055] In some embodiments, obtaining the target feature information based on the first feature information may be achieved by weighted summation of multiple first information density sequences to obtain a target information density sequence. In other embodiments, it may be achieved by selecting a specific information density value from the multiple first information density sequences as a target feature value.
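Both fusion options can be sketched in a few lines. This is an illustration under assumptions, not the applicant's method: the per-modality weights are hypothetical inputs (the application does not say how they are chosen), all first sequences are assumed to share the same length k, and the selection rule shown (taking the per-position maximum) is one assumed way of "selecting a specific information density value".

```python
from typing import Dict, List, Sequence


def _common_length(first_sequences: Dict[str, Sequence[float]]) -> int:
    """Return k, requiring at least one sequence and equal lengths."""
    lengths = {len(seq) for seq in first_sequences.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all of the same length k")
    return lengths.pop()


def weighted_fusion(first_sequences: Dict[str, Sequence[float]],
                    weights: Dict[str, float]) -> List[float]:
    """Weighted summation: A_i = sum over modalities j of w_j * (i-th value of modality j)."""
    k = _common_length(first_sequences)
    return [sum(weights[m] * seq[i] for m, seq in first_sequences.items())
            for i in range(k)]


def selection_fusion(first_sequences: Dict[str, Sequence[float]]) -> List[float]:
    """Selection: keep one modality's value per position (assumed rule: the maximum)."""
    k = _common_length(first_sequences)
    return [max(seq[i] for seq in first_sequences.values()) for i in range(k)]


first = {"text": [10.0, 20.0, 30.0], "audio": [4.0, 8.0, 2.0], "image": [6.0, 6.0, 6.0]}
weights = {"text": 0.5, "audio": 0.25, "image": 0.25}
fused = weighted_fusion(first, weights)  # [7.5, 13.5, 17.0]
selected = selection_fusion(first)       # [10.0, 20.0, 30.0]
```

Weighted summation blends every modality into each target value, while selection lets a single dominant modality (for example, dense subtitles) decide the value at that position.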

[0056] How to obtain target feature information based on the first feature information will be described in detail later.

[0057] The information processing method provided in some embodiments of this application can extract features from multimodal information to obtain corresponding first feature information, and obtain target feature information based on the first feature information. By extracting features from multimodal information, the completeness of feature extraction is improved, which is beneficial to improving the accuracy of target feature information.

[0058] S203. Based on the target feature information, determine the target processing information of the information to be processed.

[0059] In this embodiment, the target processing information can be used to characterize how the information to be processed is to be processed at different data stream positions. For example, for information to be processed in video form, the target processing information can be the target playback speed of the video. In this embodiment, target feature information is obtained by extracting features from the feature information to be processed, and the target processing information of the information to be processed is determined based on the target feature information. In this way, the target processing information at each data stream position can be determined automatically based on the density of information contained at that position. For example, target feature information can be obtained by extracting features from one or more of the text information, image information, and audio information in the video, and the video playback speed can then be adjusted in real time and accurately based on the target feature information.

[0060] Figure 4 is a flowchart of determining target processing information according to some embodiments of this application. In some embodiments, determining the target processing information of the information to be processed based on the target feature information may include operations S401 to S402:

[0061] S401. Obtain baseline feature information.

[0062] The baseline feature information is used to characterize the expected value of the target feature value in the target feature information. The baseline feature information can be the user-acceptable information density obtained based on user usage habits. In some embodiments, the baseline feature information can be used to characterize the baseline information density. Taking video as an example, if the information density of the video during actual playback matches the baseline information density, the information processing efficiency of the information to be processed can be improved.

[0063] How to determine the baseline feature information will be described in detail later.

[0064] S402. Perform calculation on the target feature values in the target feature information and the baseline feature information to obtain the target processing information of the information to be processed.

[0065] The target feature information may include at least one target feature value. When the target feature information is represented by a target information density sequence, the at least one target feature value may correspond to at least one information density value in the target information density sequence. Calculation on the target feature values and the baseline feature information may then be calculation on the information density values in the target information density sequence and the baseline information density value.

[0066] In some embodiments, the target processing information may be a playback speed value, a playback speed sequence, or a playback speed value sequence of the information to be processed. The playback speed value represents a multiple of a playback speed reference value: the playback speed is the product of the playback speed reference value and the playback speed value, where the playback speed reference value is the default playback speed under normal playback conditions. The playback speed value sequence may be $\{E/A_1, E/A_2, \dots, E/A_k\}$, where E represents the baseline information density, $A_i$ represents the i-th information density value in the target information density sequence, and k represents the length of the target information density sequence.
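A minimal sketch of turning densities into playback speeds, assuming each playback speed value is the ratio of the baseline density E to the position's density $A_i$. Under that assumption, dense segments play slower and sparse ones faster, so the density experienced per unit of viewing time moves toward E. The function names are hypothetical, and rejecting non-positive densities is an added assumption, since E/0 is undefined.

```python
from typing import List, Sequence


def playback_speed_values(target_densities: Sequence[float],
                          baseline_density: float) -> List[float]:
    """Assumed rule: value_i = E / A_i for each target information density A_i."""
    if baseline_density <= 0:
        raise ValueError("baseline density E must be positive")
    if any(a <= 0 for a in target_densities):
        raise ValueError("information density values A_i must be positive")
    return [baseline_density / a for a in target_densities]


def playback_speeds(values: Sequence[float], reference_speed: float = 1.0) -> List[float]:
    """Playback speed = playback speed reference value x playback speed value."""
    return [reference_speed * v for v in values]


values = playback_speed_values([20.0, 10.0, 40.0], baseline_density=20.0)  # [1.0, 2.0, 0.5]
speeds = playback_speeds(values, reference_speed=1.0)                     # [1.0, 2.0, 0.5]
```

A deployed player would probably also clamp the result to the speeds it supports; the application does not specify such a limit, so it is left out here.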

[0067] The information processing method provided in some embodiments of this application can acquire baseline feature information and perform calculation on the target feature values in the target feature information and the baseline feature information to obtain the target processing information of the information to be processed. Because the baseline feature information reflects user habits (for example, an information density that conforms to those habits, determined from the user's historical settings), obtaining the target processing information based on it can improve the processing efficiency of the information to be processed while still satisfying user habits.

[0068] In summary, some embodiments of this application provide an information processing method that can acquire the feature information to be processed from the information to be processed; extract features from the feature information to be processed to obtain target feature information; and determine the target processing information of the information to be processed based on the target feature information. This method can improve the level of intelligence in determining the target processing information.

[0069] Next, we will describe in detail how to extract features from multimodal information to obtain the corresponding first feature information.

[0070] Figure 5 is a flowchart of determining first feature information according to some embodiments of this application. In some embodiments, operation S301, which extracts features from the multimodal information to obtain the corresponding first feature information, may include operations S501 to S502:

[0071] S501. Perform first feature extraction on the multimodal information to obtain a first feature representation corresponding to the multimodal information.

In some embodiments, the multimodal information includes one or more of text information, image information, or audio information. Taking the first feature extraction of text information to obtain the corresponding first feature representation as an example, operation S501 may include:

[0072] (1) Preprocess the text information to obtain the first preprocessed information corresponding to the text information.

[0073] Preprocessing the text information may include performing sliding window processing on it to obtain multiple window data corresponding to the text information; the first preprocessed information corresponding to the text information includes these window data. Figure 6 is a schematic diagram of sliding window processing provided by some embodiments of this application. As shown in Figure 6, a sliding window slides over and crops the text information, with each window containing a segment of data. The time interval between the starts of adjacent sliding windows is the sliding step size. To ensure that no text information is missed during cropping, the sliding step size can be set smaller than the length of the sliding window.
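The sliding window cropping described above can be sketched as follows, assuming the data is an indexable sequence (characters, timestamped subtitle tokens, audio samples) and that window length and step are measured in the same units. The function name and the choice to return `(start, segment)` pairs are hypothetical. With a step smaller than the window, adjacent windows overlap, which is what gives each window extra context.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sliding_windows(data: Sequence[T], window_length: int,
                    step: int) -> List[Tuple[int, Sequence[T]]]:
    """Crop `data` into (start, segment) windows; step <= window_length leaves no gaps."""
    if window_length <= 0 or step <= 0:
        raise ValueError("window_length and step must be positive")
    if step > window_length:
        raise ValueError("a step larger than the window would skip data")
    windows: List[Tuple[int, Sequence[T]]] = []
    start = 0
    while start < len(data):
        windows.append((start, data[start:start + window_length]))
        if start + window_length >= len(data):
            break  # the last window already reaches the end of the data
        start += step
    return windows


print(sliding_windows("abcdefg", window_length=4, step=2))
# [(0, 'abcd'), (2, 'cdef'), (4, 'efg')]
```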

[0074] (2) Input the first preprocessed information into the first processing model to obtain the first feature representation corresponding to the text information.

[0075] The first feature representation can be used to describe the content of the text information. In some embodiments, the first feature representation can be a description of the text information.

[0076] The first processing model can be a trained multimodal model. In some embodiments, a training dataset can be constructed to fine-tune the pre-trained model, wherein the pre-trained model can be a multimodal pre-trained model pre-trained on a massive dataset. The training dataset includes training sample pairs of multiple modalities, each training sample pair including a training sample and a text label corresponding to the training sample. The text label can be manually annotated descriptive text about the training sample, and can include the content contained in the training sample and the variations in the content.

[0077] Taking sliding window processing as an example of preprocessing, for image information, each training sample pair may include image window data and the corresponding text label. For audio information, each training sample pair may include audio window data and the corresponding text label. For text information, each training sample pair may include text window data and the corresponding text label.

[0078] The multimodal pre-trained model is fine-tuned using the constructed training dataset. Tuning methods include, but are not limited to, full-parameter tuning and partial-parameter tuning. Partial-parameter tuning can employ low-rank adaptation (LoRA) or its variants. Training stops when the tuning of the multimodal pre-trained model reaches a preset stopping condition, yielding the trained first processing model.
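For readers unfamiliar with LoRA, the sketch below shows the core idea on a single linear layer in plain Python. The application does not say which layers are adapted or which rank and scaling are used, so the matrices, `alpha`, and `rank` here are illustrative only. The frozen weight W is left untouched, and only the low-rank factors A and B would be trained. A real fine-tuning run would use a deep learning framework rather than list arithmetic.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, rank: int) -> Vector:
    """h = W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) are trained,
    i.e. r * (d_in + d_out) parameters instead of d_out * d_in.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen pre-trained weight (toy values)
A = [[1.0, 1.0]]              # rank r = 1
B_init = [[0.0], [0.0]]       # B starts at zero, so the adapted layer initially equals W
B = [[1.0], [2.0]]            # B after some (hypothetical) training
x = [1.0, 2.0]
h = lora_forward(W, A, B, x, alpha=1.0, rank=1)  # [4.0, 8.0]
```

Initializing B to zero means fine-tuning starts exactly from the pre-trained model's behaviour, which is why LoRA counts as partial-parameter tuning.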

[0079] Understandably, for information from other modalities, the corresponding first feature representation can be obtained using a similar method. For example, image information can be preprocessed to obtain second preprocessed information, and this second preprocessed information can be input into the first processing model to obtain the first feature representation of the image information. Similarly, audio information can be preprocessed to obtain third preprocessed information, and this third preprocessed information can be input into the first processing model to obtain the first feature representation of the audio information. How to obtain the corresponding first feature representation for information from other modalities will not be elaborated here.

[0080] The multiple window data of each modality are input into the first processing model in sequence, and the model outputs the descriptive text information corresponding to each window.

[0081] In some embodiments of this application, the information processing method involves extracting a first feature from multimodal information to obtain a first feature representation corresponding to the multimodal information. This may include preprocessing information from multiple modalities and obtaining the corresponding first feature representation based on the preprocessed information. The first feature representation may be descriptive text information. Preprocessing, such as sliding window processing, allows each window of data to contain more contextual information, which helps improve the accuracy of the generated descriptive text information.

[0082] S502. Perform second feature extraction on the first feature representation to obtain the first feature information.

[0083] In some embodiments, operation S502 may include:

[0084] (1) Determine multiple corresponding second feature information based on the first feature representation. Each second feature information includes at least one first feature value.

[0085] The first feature representation can be descriptive text information corresponding to at least one modality. From the descriptive text information, its information richness, or information density, can be determined. In some embodiments, the first feature value is the information density, which can be obtained from the number of characters in the descriptive text information. For example, the character count can be used directly as the information density value.

[0086] The number of characters in the descriptive text information corresponding to a window of data can be used to determine a first feature value, which then characterizes the information density of that window. The corresponding second feature information is determined from the multiple first feature values of each modality.

[0087] In some embodiments, the second feature information can be used to characterize the initial information density of the multimodal information, and can be represented as a second information density sequence. The second information density sequence may include one or more first feature values arranged in chronological order, each representing the information density of the corresponding window data. For example, the second feature information corresponding to the $j$-th modality can be represented as:

$$D_j = \{x_1^j, x_2^j, \ldots, x_k^j\}$$

where $x_i^j$ represents the $i$-th first feature value in the second information density sequence corresponding to the $j$-th modal data, and $k$ represents the length of the second information density sequence. $k$ is also the number of window data items corresponding to the $j$-th modal information.
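Assume, as in [0085] to [0087], that the information density of a window is simply the character count of its descriptive text. The second information density sequence of each modality could then be computed as below. The modality names and helper functions are hypothetical. Counting every Unicode code point, including spaces and punctuation, is also an assumption.

```python
from typing import Dict, List


def density_sequence(descriptions: List[str]) -> List[int]:
    """One first feature value per window: the character count of that
    window's descriptive text, kept in chronological order."""
    return [len(text) for text in descriptions]


def second_feature_info(per_modality: Dict[str, List[str]]) -> Dict[str, List[int]]:
    """Map each modality's window descriptions to its second information
    density sequence [x_1^j, ..., x_k^j]."""
    return {modality: density_sequence(texts) for modality, texts in per_modality.items()}


# Usage with hypothetical descriptions for two modalities.
sequences = second_feature_info({"text": ["ab", "abcd"], "audio": ["x"]})
```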

[0088] (2) Determine the corresponding first feature information based on the first feature value in each second feature information.

[0089] The multiple pieces of first feature information are used to determine the target feature information, which in turn determines the target processing information, such as the playback speed of the information to be processed. Therefore, each piece of first feature information should not fluctuate too frequently. Otherwise, the target processing information (for example, the playback speed) will also fluctuate too frequently, degrading the user experience.

[0090] In some embodiments, determining the corresponding first feature information based on the first feature value in each second feature information may include:

[0091] The first feature value in each second feature information is grouped to obtain multiple third feature information;

[0092] The first feature values in each third feature information are sequentially adjacent, and the difference between any two first feature values in each third feature information is less than a first threshold; and

[0093] The first feature value in each third feature information is fused to obtain multiple corresponding first feature information.

[0094] The first feature values in each second feature information are grouped to obtain multiple third feature information. For example, after feature grouping, the second feature information corresponding to the $j$-th modality data can be represented in the form:

$$D_j = \{(x_1^j, \ldots, x_a^j), (x_{a+1}^j, \ldots, x_b^j), \ldots, (x_c^j, \ldots, x_k^j)\}$$

where $x_i^j$ represents the $i$-th first feature value in the second information density sequence corresponding to the $j$-th modality data. $k$ represents the length of that sequence, which is also the number of window data items for the $j$-th modality data. The group boundaries $a < b < \cdots < c$ are illustrative. The first feature values within the same pair of parentheses form one third feature information. The difference between any two first feature values in each third feature information is less than the first threshold; in other words, the fluctuation within each third feature information is small.
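The grouping step in [0091] and [0092] does not specify a grouping procedure. One simple option, sketched here as an assumption, is a greedy left-to-right scan. Adjacent values stay in the same group while the group's spread (max minus min) remains below the first threshold. This is equivalent to every pair in the group differing by less than that threshold. The threshold value used below is hypothetical.

```python
from typing import List


def group_values(values: List[float], threshold: float) -> List[List[float]]:
    """Split a sequence into runs of adjacent values whose spread (max - min)
    stays below `threshold`, so any two values in a run differ by less than it.

    Greedy left-to-right scan; the grouping strategy is an assumption.
    """
    groups: List[List[float]] = []
    current: List[float] = []
    low = high = 0.0  # running min/max of `current`, valid only when it is non-empty
    for v in values:
        if current and max(high, v) - min(low, v) < threshold:
            current.append(v)
            low, high = min(low, v), max(high, v)
        else:
            if current:
                groups.append(current)
            current = [v]
            low = high = v
    if current:
        groups.append(current)
    return groups
```

For example, `group_values([10, 12, 11, 30, 31, 10], threshold=5)` returns `[[10, 12, 11], [30, 31], [10]]`.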

[0095] Fusing the first feature values in each third feature information can include averaging, or taking a weighted average of, those values. For example, after fusion using the average value, the resulting first feature information could be...

[0096] In other embodiments, the maximum value, or any single value, of the first feature values in each third feature information can be taken to obtain the corresponding first feature information.
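The fusion options in [0095] and [0096] (average, weighted average, maximum, or any single value) can be expressed as a pluggable reducer. This sketch also assumes that the fused value is written back to every window of its group. The resulting first feature information then keeps one value per data stream position. The text does not state this explicitly.

```python
from statistics import mean
from typing import Callable, List, Sequence


def fuse_groups(
    groups: List[List[float]],
    reducer: Callable[[Sequence[float]], float] = mean,
) -> List[float]:
    """Fuse each group (mean by default; `max` is another option) and repeat
    the fused value once per window, so sequences of different modalities
    stay aligned position by position (an assumption of this sketch)."""
    fused: List[float] = []
    for group in groups:
        value = reducer(group)
        fused.extend([value] * len(group))
    return fused


# Usage on hypothetical groups, e.g. produced by a grouping step.
smoothed = fuse_groups([[10, 12, 11], [30, 31], [10]])
peaks = fuse_groups([[10, 12, 11], [30, 31], [10]], max)
```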

[0097] Some embodiments of this application provide an information processing method that determines multiple corresponding second feature information based on a first feature representation, and obtains first feature information based on the second feature information. Furthermore, some embodiments of this application group the first feature values in the second feature information and then fuse the first feature values within each group. Through this fusion, the changes in the resulting first feature information become relatively gradual.

[0098] In summary, the information processing method provided in this application can extract first features from multimodal information to obtain corresponding first feature representations, and determine first feature information based on the first feature representations. The method can use the first feature representations, such as descriptive text information, to evaluate the richness of the information contained in the multimodal information.

[0099] Next, we will explain in detail how to obtain the target feature information based on the first feature information. Operation S302 may include:

[0100] (1) Determine the similarity between the first feature representations corresponding to the multimodal information.

[0101] For example, multimodal information can include text information, image information, and / or audio information. The similarity between multiple modalities can be determined by calculating the similarity between the first feature representations corresponding to the information of each modality. These first feature representations can be descriptive text information corresponding to the text, image, and / or audio information, respectively. The similarity between the first feature representations corresponding to the information of each modality can be obtained using semantic similarity algorithms.

[0102] (2) When the similarity is lower than the second threshold, the maximum value of at least one first feature value corresponding to each data stream position is determined as the target feature value corresponding to each data stream position.

[0103] The second threshold can be a preset similarity threshold. When the similarity is below the second threshold, the correlation between the information of the multiple modalities is considered low. In that case, the largest of the first feature values corresponding to the multiple modalities can be used as the target feature value. For example, for video information to be processed, a first information density sequence can be obtained for each modality, giving one information density value per modality at each time point. The maximum of these information density values at each time point is taken as the target feature value for that time point.

[0104] (3) When the similarity is greater than or equal to the second threshold, at least one first feature value corresponding to each data stream position is calculated and processed to obtain the target feature value corresponding to each data stream position.

[0105] If the similarity is greater than or equal to the second threshold, it can be considered that the information of multiple modalities is highly correlated, and the information of multiple modalities can be combined to determine the target feature value.

[0106] In some embodiments, the calculation on the first feature values corresponding to each data stream position can be a weighted summation of the multiple first information density sequences, which yields a target information density sequence:

$$d_i = \sum_{j=1}^{J} \beta_j \, d_i^j$$

where $d_i^j$ represents the $i$-th information density value in the $j$-th first information density sequence, and $\beta_j$ represents the weight coefficient corresponding to the $j$-th first information density sequence. $d_i$ represents the $i$-th value of the target information density sequence. $J$ represents the number of first information density sequences, which is also the number of first feature information.

[0107] In other embodiments, the calculation of at least one first feature value corresponding to each data stream position can be performed by averaging multiple first information density sequences to obtain the target information density sequence.

[0108] (4) Determine the target feature information based on the target feature value corresponding to each data stream location.

[0109] Once the target feature value corresponding to each data stream location is determined, the target feature information can be composed of multiple target feature values.
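Steps (1) to (4) can be combined as in the sketch below. It illustrates the steps under several assumptions and is not the applicant's implementation:

- [0101] only mentions a semantic similarity algorithm, so cosine similarity between embedding vectors of the descriptive texts is swapped in. How the embeddings are produced is left out.
- When there are more than two modalities, the similarity is averaged over all modality pairs.
- The sequences are aligned to the same data stream positions.
- The weights (the coefficients β_j of [0106]) and the threshold are hypothetical. Equal weights reproduce the averaging variant of [0107].

```python
import math
from itertools import combinations
from typing import List, Optional


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(embeddings: List[List[float]]) -> float:
    """Step (1): average cosine similarity over all pairs of per-modality embeddings.
    With a single modality there is nothing to compare; 1.0 is an assumed default."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 1.0
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


def target_feature_info(
    sequences: List[List[float]],
    similarity: float,
    threshold: float,
    weights: Optional[List[float]] = None,
) -> List[float]:
    """Steps (2)-(4): build the target information density sequence.

    similarity <  threshold: per-position maximum over the modalities.
    similarity >= threshold: per-position weighted sum of the modalities
                             (equal weights 1/J, i.e. the average, if none given).
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    if any(len(s) != len(sequences[0]) for s in sequences):
        raise ValueError("sequences must be aligned to the same data stream positions")
    columns = list(zip(*sequences))  # one tuple of J values per data stream position
    if similarity < threshold:
        return [max(col) for col in columns]
    if weights is None:
        weights = [1.0 / len(sequences)] * len(sequences)
    if len(weights) != len(sequences):
        raise ValueError("need one weight per sequence")
    return [sum(b * d for b, d in zip(weights, col)) for col in columns]


# Usage with hypothetical embeddings and density sequences for two modalities.
sim = mean_pairwise_similarity([[1.0, 0.0], [0.0, 1.0]])
target = target_feature_info([[1, 4], [3, 2]], similarity=sim, threshold=0.5)
```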

[0110] The information processing method provided in some embodiments of this application can flexibly determine the target feature value corresponding to each data stream position based on the similarity between the first feature representations corresponding to the multimodal information. This makes the determination of the target feature value more accurate.

[0111] Next, we will explain in detail how to determine the baseline feature information.

[0112] In some embodiments, the baseline feature information can be determined as follows:

[0113] Obtain the fourth feature information corresponding to the first reference data, wherein the fourth feature information includes at least one second feature value; perform calculation processing on the second feature values in the fourth feature information to obtain a third feature value; and perform calculation processing on the third feature value and the reference feature value to obtain the baseline feature information.

[0114] The first reference data can be historical data processed using a reference feature value, which can be a feature value determined based on user behavior. For example, the first reference data can be data for which a global speed-up value has been set. A global speed-up value means that the same speed-up value is used throughout the entire duration of the first reference data. The reference feature value can be a preset playback parameter of the first reference data, for example, the global speed-up value set for it. The fourth feature information corresponding to the first reference data can be determined according to the method for obtaining target feature information described above, which will not be repeated here. The calculation on the second feature values in the fourth feature information can include averaging them to obtain the third feature value. The calculation on the third feature value and the global speed-up value can consist of multiplying the two to obtain the baseline feature information.
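The first way of determining the baseline feature information in [0114] is fully specified: average, then multiply by the global speed-up value. It therefore translates directly into a short function. The function and parameter names are illustrative.

```python
from statistics import mean
from typing import List


def baseline_from_global_speed(second_feature_values: List[float], global_speed: float) -> float:
    """Baseline feature information as described in [0114].

    The second feature values of the first reference data are averaged to give
    the third feature value, which is then multiplied by the global speed-up value.
    """
    if not second_feature_values:
        raise ValueError("need at least one second feature value")
    third_value = mean(second_feature_values)
    return third_value * global_speed


# Usage: reference data averaging 30 characters per window, watched at a global 1.5x speed-up.
baseline = baseline_from_global_speed([20, 30, 40], 1.5)
```

In this example the average density is 30, so the baseline information density is 30 × 1.5 = 45.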

[0115] In some embodiments, the third feature value may be the average information density value of the first reference data, and the baseline feature information may be the baseline information density.

[0116] In other embodiments, the baseline feature information can also be determined as follows:

[0117] The fifth feature information corresponding to the second reference data and the first processing information corresponding to the second reference data are obtained. The fifth feature information and the first processing information are then analyzed and processed to obtain the baseline feature information.

[0118] The second reference data can be historical data processed using the first processing information, which is processing information determined based on user behavior. For example, the second reference data could be video data whose playback speed was adjusted during playback. The fifth feature information corresponding to the second reference data can be determined using the method for obtaining target feature information described above, and will not be repeated here.

[0119] The fifth feature information may include at least one fourth feature value and a first time value corresponding to each fourth feature value. The first processing information may include at least one control parameter and a second time value corresponding to each control parameter. In some embodiments, the control parameter may be a playback speed value.

[0120] The fifth feature information and the first processing information are analyzed and processed to obtain the baseline feature information, including:

[0121] The fifth feature value is obtained by calculating and processing at least one fourth feature value and the first time value corresponding to each fourth feature value; the sixth feature value is obtained by calculating and processing at least one control parameter and the second time value corresponding to each control parameter; and the baseline feature information is obtained by calculating and processing the fifth feature value and the sixth feature value.

[0122] In some embodiments, the fifth feature information and the first processing information can be analyzed and processed by a formula in which $S_i$ represents the $i$-th fourth feature value, and $t_i$ represents the first time value corresponding to the $i$-th fourth feature value. $K_i$ represents the $i$-th playback speed value, and $R_i$ represents the second time value corresponding to the $i$-th playback speed value. $m$ represents the number of fourth feature values, and $n$ represents the number of playback speed values.
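The formula of [0122] cannot be recovered from this text. Purely as an assumption, by analogy with the average-times-speed rule of [0114], the sketch below makes three choices:

- The fifth feature value is the time-weighted mean density.
- The sixth feature value is the time-weighted mean playback speed.
- The baseline feature information is their product.

It should not be read as the applicant's formula. All names are hypothetical.

```python
from typing import List


def time_weighted_mean(values: List[float], durations: List[float]) -> float:
    """Average `values`, weighting each by how long it applied."""
    if len(values) != len(durations):
        raise ValueError("each value needs exactly one duration")
    total_time = sum(durations)
    if total_time <= 0:
        raise ValueError("total duration must be positive")
    return sum(v * t for v, t in zip(values, durations)) / total_time


def baseline_from_history(
    densities: List[float],      # fourth feature values
    density_times: List[float],  # first time values
    speeds: List[float],         # playback speed values (control parameters)
    speed_times: List[float],    # second time values
) -> float:
    """Assumed combination: (time-weighted mean density) x (time-weighted mean speed)."""
    fifth_value = time_weighted_mean(densities, density_times)
    sixth_value = time_weighted_mean(speeds, speed_times)
    return fifth_value * sixth_value


# Usage with hypothetical viewing history: two density segments, two speed settings.
baseline = baseline_from_history([10, 20], [1, 3], [1.0, 2.0], [2, 2])
```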

[0123] In some embodiments, the fourth feature values may form the information density sequence of the second reference data, and the baseline feature information may be the baseline information density.

[0124] The information processing methods provided in some embodiments of this application can adaptively adjust the baseline feature information based on user behavior, so as to better meet the user's viewing needs.

[0125] Figure 7 illustrates an information processing system provided by some embodiments of this application. The information processing system 700 includes:

[0126] The data acquisition module 710 is used to acquire the feature information to be processed from the information to be processed.

[0127] The feature extraction module 720 is used to extract features from the feature information to be processed, and obtain the target feature information.

[0128] The information determination module 730 is used to determine the target processing information of the information to be processed based on the target feature information.
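The module structure of system 700 can be pictured as a three-stage pipeline. The class below is a hypothetical sketch in which each module is injected as a callable. The stub stages in the usage lines are placeholders, not the processing described in the application.

```python
from typing import Any, Callable


class InformationProcessingSystem:
    """Hypothetical wiring of modules 710, 720 and 730 into one pipeline.

    Each module is passed in as a callable, so the real acquisition, feature
    extraction and determination logic can be swapped in without changing the flow.
    """

    def __init__(
        self,
        acquire: Callable[[Any], Any],    # data acquisition module 710
        extract: Callable[[Any], Any],    # feature extraction module 720
        determine: Callable[[Any], Any],  # information determination module 730
    ) -> None:
        self.acquire = acquire
        self.extract = extract
        self.determine = determine

    def process(self, information: Any) -> Any:
        """Run acquisition, feature extraction and determination in order."""
        features_to_process = self.acquire(information)
        target_features = self.extract(features_to_process)
        return self.determine(target_features)


# Usage with placeholder stages (not the processing described in the application).
system = InformationProcessingSystem(
    acquire=lambda text: text.split(),
    extract=lambda words: [len(w) for w in words],
    determine=lambda lengths: max(lengths),
)
result = system.process("a bcd ef")
```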

[0129] The information processing system provided in some embodiments of this application can extract features from the feature information to be processed to obtain target feature information, and then determine the target processing information of the information to be processed based on the target feature information, thereby reducing manual operation by the user and improving the level of intelligence in determining the target processing information.

[0130] In some embodiments, the feature information to be processed includes multimodal information; the feature extraction module 720 extracts features from the feature information to be processed to obtain target feature information, including:

[0131] Feature extraction is performed on the multimodal information to obtain the corresponding first feature information;

[0132] Based on the first feature information, the target feature information is obtained.

[0133] In some embodiments, the feature extraction module 720 performs feature extraction on the multimodal information to obtain corresponding first feature information, including:

[0134] The first feature is extracted from the multimodal information to obtain the first feature representation corresponding to the multimodal information;

[0135] The first feature representation is subjected to second feature extraction to obtain the first feature information.

[0136] In some embodiments, the multimodal information includes at least one of text information, image information, or audio information;

[0137] The feature extraction module 720 performs first feature extraction on the multimodal information to obtain a first feature representation corresponding to the multimodal information, including:

[0138] The text information is preprocessed to obtain first preprocessed information corresponding to the text information, and the first preprocessed information is input into a first processing model to obtain a first feature representation corresponding to the text information; and / or

[0139] The image information is preprocessed to obtain second preprocessed information corresponding to the image information, and the second preprocessed information is input into the first processing model to obtain the first feature representation corresponding to the image information; and / or

[0140] The audio information is preprocessed to obtain the third preprocessed information corresponding to the audio information, and the third preprocessed information is input into the first processing model to obtain the first feature representation corresponding to the audio information.

[0141] In some embodiments, the feature extraction module 720 performs second feature extraction on the first feature representation to obtain first feature information, including:

[0142] Multiple second feature information corresponding to the first feature representation are determined, wherein each second feature information includes at least one first feature value;

[0143] The corresponding first feature information is determined based on the first feature value in each second feature information.

[0144] In some embodiments, the feature extraction module 720 determines the corresponding first feature information based on the first feature value in each second feature information, including:

[0145] The first feature values in each second feature information are grouped to obtain multiple third feature information; the first feature values in each third feature information are sequentially adjacent, and the difference between any two first feature values in each third feature information is less than a first threshold.

[0146] The first feature value in each third feature information is fused to obtain the corresponding first feature information.

[0147] In some embodiments, the first feature information includes at least one first feature information corresponding to the multimodal information, and each first feature information includes a plurality of first feature values corresponding to a plurality of data stream locations;

[0148] The feature extraction module 720 obtains target feature information based on the first feature information, including:

[0149] Determine the similarity between the first feature representations corresponding to multimodal information;

[0150] If the similarity is below the second threshold, determine the maximum value among at least one first feature value corresponding to each data stream location as the target feature value corresponding to each data stream location, and / or

[0151] If the similarity is greater than or equal to the second threshold, at least one first feature value corresponding to each data stream position is calculated to obtain the target feature value corresponding to each data stream position; and

[0152] Target feature information is determined based on the target feature value corresponding to each data stream location.

[0153] In some embodiments, the target feature information includes at least one feature information, and the information determination module 730 determines the target processing information of the information to be processed based on the target feature information, including:

[0154] Obtain baseline feature information, which characterizes the expected value of the target feature values in the target feature information;

[0155] The target feature values in the target feature information and the baseline feature information are calculated and processed to obtain the target processing information of the information to be processed.
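[0155] does not specify how the target feature values and the baseline are turned into target processing information. One rule consistent with [0114], stated here as an assumption, keeps the consumed information rate (density times playback speed) at the baseline. This gives a per-position speed equal to the baseline divided by the density. The clamping range is a hypothetical safeguard.

```python
from typing import List


def playback_speeds(
    target_values: List[float],
    baseline: float,
    min_speed: float = 0.5,  # hypothetical lower bound
    max_speed: float = 3.0,  # hypothetical upper bound
) -> List[float]:
    """Assumed rule: speed_i = baseline / d_i, clamped to [min_speed, max_speed].

    Denser positions (larger d_i) play slower and sparser ones faster, so that
    d_i * speed_i stays at the baseline wherever the clamp is not active.
    """
    speeds = []
    for density in target_values:
        # A position with no described content is played at the fastest allowed speed.
        raw = baseline / density if density > 0 else max_speed
        speeds.append(min(max(raw, min_speed), max_speed))
    return speeds


# Usage with a hypothetical target information density sequence and baseline.
speeds = playback_speeds([45, 30, 90, 0, 10], baseline=45.0)
```

Under this rule, when the baseline is computed as in [0114], a position whose density equals the historical average is played at the historical global speed-up value.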

[0156] In some embodiments, the baseline feature information is determined as follows:

[0157] Obtain the fourth feature information corresponding to the first reference data, wherein the fourth feature information includes at least one second feature value, the first reference data is historical data processed using the reference feature value, and the reference feature value is a feature value determined based on user behavior;

[0158] The second feature value in the fourth feature information is calculated and processed to obtain the third feature value;

[0159] The third feature value and the reference feature value are calculated and processed to obtain the baseline feature information.

[0160] In some embodiments, the baseline feature information is determined as follows:

[0161] Obtain the fifth feature information corresponding to the second reference data and the first processing information corresponding to the second reference data, wherein the second reference data is historical data processed using the first processing information, and the first processing information is processing information determined based on user behavior;

[0162] The fifth feature information and the first processing information are analyzed and processed to obtain the baseline feature information.

[0163] In some embodiments, the fifth feature information includes at least one fourth feature value and a first time value corresponding to each fourth feature value, and the first processing information includes at least one control parameter and a second time value corresponding to each control parameter.

[0164] The information determination module 730 analyzes and processes the fifth feature information and the first processing information to obtain the baseline feature information, including:

[0165] The fifth feature value is obtained by calculating and processing at least one fourth feature value and the first time value corresponding to each fourth feature value.

[0166] The sixth feature value is obtained by calculating and processing at least one control parameter and the second time value corresponding to each control parameter.

[0167] The fifth and sixth feature values are calculated and processed to obtain the baseline feature information.

[0168] Some embodiments of this application provide a computer device that integrates any of the information processing systems provided in some embodiments of this application. The computer device includes:

[0169] One or more processors;

[0170] Memory; and

[0171] One or more applications, wherein the applications are stored in the memory and configured to be executed by the processor to implement the above-described information processing method embodiments.

[0172] Some embodiments of this application provide a computer device that integrates any of the information processing systems provided in some embodiments of this application. Figure 8 illustrates the structure of a computer device provided in some embodiments of this application. As shown in Figure 8:

[0173] The computer device may include components such as a processor 801 with one or more processing cores, a memory 802 with one or more computer-readable storage media, a power supply 803, and an input unit 804. Those skilled in the art will understand that the computer device structure shown in Figure 8 does not constitute a limitation on the computer device. The device may include more or fewer components than shown, combine certain components, or use a different component arrangement. Specifically:

[0174] The processor 801 is the control center of the computer device. It connects various parts of the computer device via various interfaces and lines. By running or executing software programs and / or modules stored in the memory 802, and by calling data stored in the memory 802, it performs various functions of the computer device and processes data, thereby providing overall monitoring of the computer device. Optionally, the processor 801 may include one or more processing cores; preferably, the processor 801 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, and applications, and the modem processor mainly handles wireless communication. It is understood that the modem processor may not be integrated into the processor 801.

[0175] The memory 802 can be used to store software programs and modules. The processor 801 executes various functional applications and information processing by running the software programs and modules stored in the memory 802. The memory 802 may mainly include a program storage area and a data storage area. The program storage area may store the operating system and the application programs required for at least one function (such as a sound playback function or an image playback function). The data storage area may store data created through use of the computer device. In addition, the memory 802 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 802 may also include a memory controller to provide the processor 801 with access to the memory 802.

[0176] The computer device also includes a power supply 803 that supplies power to the various components. Preferably, the power supply 803 can be logically connected to the processor 801 through a power management system, which then handles functions such as charging, discharging, and power consumption management. The power supply 803 may also include one or more DC or AC power supplies, recharging systems, power fault detection circuits, power converters or inverters, power status indicators, and any other such components.

[0177] The computer device may also include an input unit 804, which can be used to receive input digital or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.

[0178] Although not shown, the computer device may also include a display unit, etc., which will not be described in detail here. In this embodiment, the processor 801 in the computer device can load the executable files corresponding to the processes of one or more application programs into the memory 802 according to the following instructions, and the processor 801 runs the application programs stored in the memory 802 to realize various functions, as follows:

[0179] Obtain the unprocessed feature information of the information to be processed;

[0180] Feature extraction is performed on the feature information to be processed to obtain the target feature information;

[0181] Based on the target feature information, determine the target processing information of the information to be processed.

[0182] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be performed by instructions, or by instructions controlling related hardware. These instructions can be stored in a computer-readable storage medium and loaded and executed by a processor.

[0183] Therefore, embodiments of this application provide a computer-readable storage medium, which may include: read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disk, etc. A computer program is stored thereon, and the computer program is loaded by a processor to execute the steps in any of the information processing methods provided in embodiments of this application. For example, the computer program loaded by the processor can execute the following steps:

[0184] Obtain the unprocessed feature information of the information to be processed;

[0185] Feature extraction is performed on the feature information to be processed to obtain the target feature information;

[0186] Based on the target feature information, determine the target processing information of the information to be processed.

[0187] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the detailed descriptions of other embodiments above, which will not be repeated here.

[0188] In practice, each of the above units or structures can be implemented as an independent entity or can be arbitrarily combined to be implemented as the same or several entities. For the specific implementation of each of the above units or structures, please refer to the previous method embodiments, which will not be repeated here.

[0189] For details on the implementation of each of the above operations, please refer to the previous examples, which will not be repeated here.

[0190] The above provides a detailed description of an information processing method, system, computer device, and storage medium provided in the embodiments of this application. Specific examples have been used to illustrate the principles and implementation methods of this application. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of this application. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this application. Therefore, the content of this specification should not be construed as a limitation of this application.

Claims

1. A method, characterized in that it comprises: obtaining the unprocessed feature information of the information to be processed; performing feature extraction on the feature information to be processed to obtain target feature information; and determining, based on the target feature information, the target processing information of the information to be processed.

2. The method as described in claim 1, characterized in that, The feature information to be processed includes multimodal information; The step of extracting features from the feature information to be processed to obtain target feature information includes: Feature extraction is performed on the multimodal information to obtain the corresponding first feature information; Based on the first feature information, the target feature information is obtained.

3. The method according to claim 2, characterized in that performing feature extraction on the multimodal information to obtain the corresponding first feature information comprises: performing first feature extraction on the multimodal information to obtain a first feature representation corresponding to the multimodal information; and performing second feature extraction on the first feature representation to obtain the first feature information.

4. The method according to claim 3, characterized in that the multimodal information comprises at least one of text information, image information, or audio information, and performing first feature extraction on the multimodal information to obtain the first feature representation corresponding to the multimodal information comprises: preprocessing the text information to obtain first preprocessed information corresponding to the text information, and inputting the first preprocessed information into a first processing model to obtain a first feature representation corresponding to the text information; and/or preprocessing the image information to obtain second preprocessed information corresponding to the image information, and inputting the second preprocessed information into the first processing model to obtain a first feature representation corresponding to the image information; and/or preprocessing the audio information to obtain third preprocessed information corresponding to the audio information, and inputting the third preprocessed information into the first processing model to obtain a first feature representation corresponding to the audio information.

5. The method according to claim 3, characterized in that performing second feature extraction on the first feature representation to obtain the first feature information comprises: determining, based on the first feature representation, a plurality of corresponding pieces of second feature information, wherein each piece of second feature information comprises at least one first feature value; and determining the corresponding first feature information based on the first feature values in each piece of second feature information.

6. The method according to claim 5, characterized in that determining the corresponding first feature information based on the first feature values in each piece of second feature information comprises: grouping the first feature values in each piece of second feature information to obtain a plurality of pieces of third feature information, wherein the first feature values in each piece of third feature information are sequentially adjacent, and the difference between any two first feature values in the same piece of third feature information is less than a first threshold; and fusing the first feature values in each piece of third feature information to obtain the corresponding first feature information.

7. The method according to claim 3, characterized in that the first feature information comprises at least one piece of first feature information corresponding to the multimodal information, and each piece of first feature information comprises a plurality of first feature values corresponding to a plurality of data stream positions; and obtaining the target feature information based on the first feature information comprises: determining a similarity between the first feature representations corresponding to the multimodal information; when the similarity is lower than a second threshold, determining the maximum of the at least one first feature value corresponding to each data stream position as the target feature value corresponding to that data stream position; and/or, when the similarity is greater than or equal to the second threshold, performing a calculation on the at least one first feature value corresponding to each data stream position to obtain the target feature value corresponding to that data stream position; and determining the target feature information based on the target feature value corresponding to each data stream position.

8. The method according to claim 1, characterized in that the target feature information comprises at least one piece of feature information, and determining the target processing information of the information to be processed based on the target feature information comprises: obtaining baseline feature information, which characterizes an expected value of the target feature value in the target feature information; and performing a calculation on the target feature value in the target feature information and the baseline feature information to obtain the target processing information of the information to be processed.

9. The method according to claim 8, characterized in that the baseline feature information is determined as follows: obtaining fourth feature information corresponding to first reference data, wherein the fourth feature information comprises at least one second feature value, the first reference data is historical data processed using a reference feature value, and the reference feature value is a feature value determined based on user behavior; performing a calculation on the second feature values in the fourth feature information to obtain a third feature value; and performing a calculation on the third feature value and the reference feature value to obtain the baseline feature information.

10. The method according to claim 8, characterized in that the baseline feature information is determined as follows: obtaining fifth feature information corresponding to second reference data and first processing information corresponding to the second reference data, wherein the second reference data is historical data processed using the first processing information, and the first processing information is processing information determined based on user behavior; and analyzing the fifth feature information and the first processing information to obtain the baseline feature information.

11. The method according to claim 10, characterized in that the fifth feature information comprises at least one fourth feature value and a first time value corresponding to each fourth feature value, and the first processing information comprises at least one control parameter and a second time value corresponding to each control parameter; and analyzing the fifth feature information and the first processing information to obtain the baseline feature information comprises: performing a calculation on the at least one fourth feature value and the first time value corresponding to each fourth feature value to obtain a fifth feature value; performing a calculation on the at least one control parameter and the second time value corresponding to each control parameter to obtain a sixth feature value; and performing a calculation on the fifth feature value and the sixth feature value to obtain the baseline feature information.

12. A system, characterized in that the system comprises: a data acquisition module configured to acquire feature information to be processed from the information to be processed; a feature extraction module configured to perform feature extraction on the feature information to be processed to obtain target feature information; and an information determination module configured to determine target processing information of the information to be processed based on the target feature information;
wherein the feature information to be processed comprises multimodal information, and the feature extraction module performing feature extraction on the feature information to be processed to obtain the target feature information comprises: performing feature extraction on the multimodal information to obtain corresponding first feature information; and obtaining the target feature information based on the first feature information;
wherein the feature extraction module performing feature extraction on the multimodal information to obtain the corresponding first feature information comprises: performing first feature extraction on the multimodal information to obtain a first feature representation corresponding to the multimodal information; and performing second feature extraction on the first feature representation to obtain the first feature information;
wherein the multimodal information comprises at least one of text information, image information, or audio information, and the feature extraction module performing first feature extraction on the multimodal information to obtain the first feature representation corresponding to the multimodal information comprises: preprocessing the text information to obtain first preprocessed information corresponding to the text information, and inputting the first preprocessed information into a first processing model to obtain a first feature representation corresponding to the text information; and/or preprocessing the image information to obtain second preprocessed information corresponding to the image information, and inputting the second preprocessed information into the first processing model to obtain a first feature representation corresponding to the image information; and/or preprocessing the audio information to obtain third preprocessed information corresponding to the audio information, and inputting the third preprocessed information into the first processing model to obtain a first feature representation corresponding to the audio information;
wherein the feature extraction module performing second feature extraction on the first feature representation to obtain the first feature information comprises: determining, based on the first feature representation, a plurality of corresponding pieces of second feature information, wherein each piece of second feature information comprises at least one first feature value; and determining the corresponding first feature information based on the first feature values in each piece of second feature information;
wherein the feature extraction module determining the corresponding first feature information based on the first feature values in each piece of second feature information comprises: grouping the first feature values in each piece of second feature information to obtain a plurality of pieces of third feature information, wherein the first feature values in each piece of third feature information are sequentially adjacent, and the difference between any two first feature values in the same piece of third feature information is less than a first threshold; and fusing the first feature values in each piece of third feature information to obtain the corresponding first feature information;
wherein the first feature information comprises at least one piece of first feature information corresponding to the multimodal information, and each piece of first feature information comprises a plurality of first feature values corresponding to a plurality of data stream positions; and the feature extraction module obtaining the target feature information based on the first feature information comprises: determining a similarity between the first feature representations corresponding to the multimodal information; when the similarity is lower than a second threshold, determining the maximum of the at least one first feature value corresponding to each data stream position as the target feature value corresponding to that data stream position; and/or, when the similarity is greater than or equal to the second threshold, performing a calculation on the at least one first feature value corresponding to each data stream position to obtain the target feature value corresponding to that data stream position; and determining the target feature information based on the target feature value corresponding to each data stream position;
wherein the target feature information comprises at least one piece of feature information, and the information determination module determining the target processing information of the information to be processed based on the target feature information comprises: obtaining baseline feature information, which characterizes an expected value of the target feature value in the target feature information; and performing a calculation on the target feature value in the target feature information and the baseline feature information to obtain the target processing information of the information to be processed;
wherein the baseline feature information is determined as follows: obtaining fourth feature information corresponding to first reference data, wherein the fourth feature information comprises at least one second feature value, the first reference data is historical data processed using a reference feature value, and the reference feature value is a feature value determined based on user behavior; performing a calculation on the second feature values in the fourth feature information to obtain a third feature value; and performing a calculation on the third feature value and the reference feature value to obtain the baseline feature information;
wherein the baseline feature information is determined as follows: obtaining fifth feature information corresponding to second reference data and first processing information corresponding to the second reference data, wherein the second reference data is historical data processed using the first processing information, and the first processing information is processing information determined based on user behavior; and analyzing the fifth feature information and the first processing information to obtain the baseline feature information;
wherein the fifth feature information comprises at least one fourth feature value and a first time value corresponding to each fourth feature value, and the first processing information comprises at least one control parameter and a second time value corresponding to each control parameter; and the information determination module analyzing the fifth feature information and the first processing information to obtain the baseline feature information comprises: performing a calculation on the at least one fourth feature value and the first time value corresponding to each fourth feature value to obtain a fifth feature value; performing a calculation on the at least one control parameter and the second time value corresponding to each control parameter to obtain a sixth feature value; and performing a calculation on the fifth feature value and the sixth feature value to obtain the baseline feature information.

13. A device, characterized in that the device comprises: one or more processors; a memory; and one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors to implement the method according to any one of claims 1 to 11.

14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the operations of the method according to any one of claims 1 to 11.
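Claim 1 describes three chained stages: acquiring the feature information to be processed, extracting target feature information from it, and determining target processing information. Claim 12 assigns these stages to the data acquisition, feature extraction, and information determination modules. The Python sketch below only illustrates how the stages chain together. The class name `InformationProcessingPipeline` and the toy stage functions are hypothetical, because the claims do not specify what each stage computes.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class InformationProcessingPipeline:
    """Hypothetical three-stage pipeline mirroring claim 1.

    The stage callables are placeholders; the claims leave their internals open.
    """

    acquire: Callable[[Any], List[float]]          # data acquisition: feature information to be processed
    extract: Callable[[List[float]], List[float]]  # feature extraction: target feature information
    determine: Callable[[List[float]], Any]        # information determination: target processing information

    def run(self, information: Any) -> Any:
        pending = self.acquire(information)
        target_features = self.extract(pending)
        return self.determine(target_features)


# Toy stages, purely illustrative: word lengths -> normalised by the maximum -> summed.
pipeline = InformationProcessingPipeline(
    acquire=lambda text: [float(len(word)) for word in text.split()],
    extract=lambda values: [v / max(values) for v in values],
    determine=lambda features: sum(features),
)
result = pipeline.run("a bb cccc")
```

With the toy stages, the input `"a bb cccc"` yields word lengths [1, 2, 4]. These normalise to [0.25, 0.5, 1.0] and sum to 1.75. Keeping each stage as an injected callable mirrors the claims' separation into independent modules.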
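The grouping and fusion step of claim 6 can be sketched under several assumptions. The first feature values are treated as one ordered sequence. A greedy left-to-right scan forms the groups, which is only one possible strategy. "Fusing" is taken to be the arithmetic mean. The function names are hypothetical.

```python
from typing import List, Sequence


def group_adjacent(values: Sequence[float], first_threshold: float) -> List[List[float]]:
    """Split an ordered sequence into runs of sequentially adjacent values.

    A value joins the current run only if the run's spread (max - min) stays
    below first_threshold, which is equivalent to every pairwise difference in
    the run being less than the threshold.
    """
    groups: List[List[float]] = []
    current: List[float] = []
    low = high = 0.0
    for value in values:
        if current and max(high, value) - min(low, value) < first_threshold:
            current.append(value)
            low, high = min(low, value), max(high, value)
        else:
            if current:
                groups.append(current)
            current = [value]
            low = high = value
    if current:
        groups.append(current)
    return groups


def fuse_groups(groups: Sequence[Sequence[float]]) -> List[float]:
    """Fuse each run into a single value; the arithmetic mean is an assumed fusion rule."""
    return [sum(group) / len(group) for group in groups]


groups = group_adjacent([1.0, 1.2, 1.1, 5.0, 5.3, 9.0], first_threshold=0.5)
first_feature_information = fuse_groups(groups)
```

With a first threshold of 0.5, the example values fall into three runs: [1.0, 1.2, 1.1], [5.0, 5.3], and [9.0]. Their fused values are approximately 1.1, 5.15, and 9.0. Tracking only the running minimum and maximum is sufficient, because the largest pairwise difference within a run equals its spread.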
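Claim 7 switches between two per-position fusion rules, depending on how similar the first feature representations of the modalities are. The sketch below makes three assumptions, none of which is stated in the claims. It measures similarity with cosine similarity. It takes the minimum over all pairs of modalities as "the similarity". It uses the mean for the "calculated" branch. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuse_by_similarity(
    representations: Sequence[Sequence[float]],
    feature_infos: Sequence[Sequence[float]],
    second_threshold: float,
) -> List[float]:
    """Fuse first feature values across modalities, one data stream position at a time."""
    # Assumed: the cross-modal similarity is the minimum pairwise cosine similarity.
    pairs = list(combinations(representations, 2))
    similarity = min(cosine_similarity(a, b) for a, b in pairs) if pairs else 1.0
    target: List[float] = []
    # zip(*feature_infos) yields the values of all modalities at one position.
    for position_values in zip(*feature_infos):
        if similarity < second_threshold:
            target.append(max(position_values))  # dissimilar modalities: keep the maximum
        else:
            target.append(sum(position_values) / len(position_values))  # assumed: mean
    return target
```

As a usage example, consider orthogonal representations such as `[1.0, 0.0]` and `[0.0, 1.0]`, whose similarity is 0. With a threshold of 0.8, the per-position maximum is kept. Nearly parallel representations fall back to the mean instead.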
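Claim 8 states only that the target feature values and the baseline feature information are "calculated" to obtain the target processing information. As an illustration, the sketch below assumes the processing information is a per-position correction: the expected (baseline) value minus the observed (target) value. The function name `target_processing_info` is hypothetical.

```python
from typing import List, Sequence


def target_processing_info(
    target_values: Sequence[float], baseline_values: Sequence[float]
) -> List[float]:
    """Hypothetical rule: per-position correction = expected (baseline) - observed (target)."""
    if len(target_values) != len(baseline_values):
        raise ValueError("target and baseline feature values must align position by position")
    return [expected - observed for observed, expected in zip(target_values, baseline_values)]


corrections = target_processing_info([3.0, 5.0, 4.0], [4.0, 4.0, 4.0])
```

Under this assumed rule, the example yields corrections of [1.0, -1.0, 0.0]. A position that already matches its expected value needs no adjustment.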
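One hypothetical reading of claim 9 is that the third feature value summarises the historical data, here as its mean. On this reading, the baseline blends that summary with the reference feature value determined from user behaviour. Both the mean and the blending parameter `weight` are assumptions made for this sketch, and the function name is hypothetical.

```python
from statistics import fmean
from typing import Sequence


def baseline_from_reference(
    second_values: Sequence[float], reference_value: float, weight: float = 0.5
) -> float:
    """Hypothetical baseline: convex blend of the user-determined reference value
    and the mean of the historical second feature values."""
    if not second_values:
        raise ValueError("at least one second feature value is required")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    third_value = fmean(second_values)  # assumed: the "calculation" is the arithmetic mean
    return weight * reference_value + (1.0 - weight) * third_value


baseline = baseline_from_reference([2.0, 4.0, 6.0], reference_value=8.0)
```

With the default weight, the history mean of 4.0 and the reference value of 8.0 give a baseline of 6.0. A weight of 1.0 would trust the user's reference value alone.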
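Claim 11 pairs each fourth feature value and each control parameter with a time value. The sketch below assumes each time value is a duration during which the value or parameter was in effect. It summarises each series with a duration-weighted mean, giving the fifth and sixth feature values. It then combines the two with a hypothetical convex weight. Reading the time values as timestamps with recency decay would be an equally plausible interpretation.

```python
from typing import Sequence


def duration_weighted_mean(values: Sequence[float], durations: Sequence[float]) -> float:
    """Average of values weighted by how long each was in effect (assumed meaning of the time value)."""
    if not values or len(values) != len(durations):
        raise ValueError("need at least one value and exactly one duration per value")
    total = sum(durations)
    if total <= 0:
        raise ValueError("durations must sum to a positive number")
    return sum(v * d for v, d in zip(values, durations)) / total


def baseline_from_history(
    fourth_values: Sequence[float],
    first_times: Sequence[float],
    control_params: Sequence[float],
    second_times: Sequence[float],
    weight: float = 0.5,
) -> float:
    """Hypothetical baseline built from the fifth and sixth feature values."""
    fifth_value = duration_weighted_mean(fourth_values, first_times)
    sixth_value = duration_weighted_mean(control_params, second_times)
    # Assumed combination: a convex blend of the two summaries.
    return weight * fifth_value + (1.0 - weight) * sixth_value


baseline = baseline_from_history([10.0, 20.0], [1.0, 3.0], [0.25, 0.75], [3.0, 1.0])
```

In the example, the fifth feature value is (10 × 1 + 20 × 3) / 4 = 17.5 and the sixth is (0.25 × 3 + 0.75 × 1) / 4 = 0.375. With equal weighting, the baseline is 8.9375.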