system

The system addresses the challenge of superficial data in regional strategies by analyzing image and audio data, performing correlation analysis, and providing real-time visualization to formulate effective regional revitalization strategies.

JP2026103620A · Pending · Publication Date: 2026-06-24 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-12
Publication Date
2026-06-24

Smart Images

  • Figure 2026103620000001_ABST

Abstract

[Problem] To provide a system. [Solution] A system that includes: means for analyzing image data collected from observation devices to count the number of specific objects within a region; means for converting audio data obtained from interviews into text data and classifying that text data based on emotion; means for performing correlation analysis across multiple datasets to identify relationships between specific factors; means for visualizing data and displaying results in real time; means for providing local information in real time based on the user's location information; and means for proposing regional revitalization strategies to users.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] Conventional regional revitalization strategies rely on statistical data published by national and local governments, which often fails to reflect the actual situation of a region accurately. As a result, they cannot capture the deeper needs of a region that such superficial data misses, which makes it difficult to formulate effective regional strategies. A more realistic and detailed method of regional analysis is therefore needed.

Means for Solving the Problems

[0005] This invention provides a system that includes means for analyzing image data obtained from observation devices to count the number of specific objects in a region, means for converting audio data obtained from interviews into text data and classifying it based on emotion, and means for performing correlation analysis between datasets to identify the relationships between factors, thereby making it possible to grasp actual regional trends. The system further supports accurate strategic planning through means for visualizing data and displaying results in real time and means for proposing regional revitalization strategies. In this way, effective strategies can be derived from the actual situation of the region.

[0006] An "observation device" is a device used to photograph or record specific objects or situations within a region.

[0007] "Image data" refers to digital data of visual information acquired using cameras or observation devices.

[0008] "Means of counting" refers to techniques or devices for analyzing image data and counting the number of specific objects or elements contained within it.

[0009] "Interviewing" is the process of obtaining opinions and information from subjects in audio format and using that information for analysis.

[0010] "Audio data" refers to audio information, in digital format, obtained through interviews or other means.

[0011] "Text data" refers to digital data obtained by converting audio data into written text.

[0012] "Methods of classification based on emotion" refer to techniques or algorithms for analyzing text data and classifying its content as positive, negative, or neutral.

[0013] Correlation analysis is an analytical method that statistically examines the relationships between multiple datasets and shows the strength of their mutual influence.
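The specification does not say which correlation coefficient is used. As an illustrative assumption, the most common choice for two numeric datasets $x$ and $y$ with $n$ paired observations (for example, hourly pedestrian counts and hourly commercial-facility usage) is the Pearson coefficient:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r_{xy} \le 1
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little linear relationship. A strong coefficient shows association, not that one factor causes the other.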

[0014] "Real-time data visualization" refers to the process of immediately visualizing and presenting collected or analyzed data in an understandable manner.

[0015] "Regional revitalization strategy" refers to specific plans and policies implemented to improve the economy, culture, living environment, and other aspects of a local community.

Brief Explanation of Drawings

[0016] [Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map on which multiple emotions are mapped.
[Figure 10] An emotion map on which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when the emotion engine is combined.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when the emotion engine is combined.

Mode for Carrying Out the Invention

[0017] Hereinafter, an example of an embodiment of the system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0018] First, the terms used in the following description will be explained.

[0019] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as the "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0020] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory that temporarily stores information and is used by the processor as a work memory.

[0021] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0022] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like, and manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0023] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" may mean A alone, B alone, or a combination of A and B. In this specification, the same interpretation applies when three or more items are linked by "and/or."

[0024] [First Embodiment]

[0025] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0026] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0027] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0028] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0029] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0030] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0031] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0032] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0033] As shown in Figure 2, in the data processing device 12, specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes it on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0034] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0035] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with the specific processing program 56 in the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes it on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0036] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0037] This invention relates to a system that acquires observational and interview data from various data sources, analyzes it to evaluate the local situation, and provides effective revitalization strategies.

[0038] This system is centered around a server that collects data from multiple observation devices. These observation devices include cameras installed in specific locations, such as train station plazas or residential areas, which periodically photograph traffic conditions and record them as image data. Opinions that users gather from taxi drivers and others are also sent to the server as audio data. The server receives this audio data and automatically converts it into text.

[0039] Next, the server analyzes the acquired image data to identify specific objects and elements. For example, it counts the number of people and bicycles from an image of the area in front of a train station to understand the flow of people during peak hours. Image recognition technology is used for this purpose. The server also uses natural language processing technology to analyze the sentiment of text data converted from speech, classifying it as positive, negative, or neutral. This makes it possible to accurately understand and analyze various voices within the community.

[0040] Furthermore, the server performs correlation analysis between different datasets. For example, it identifies the relationship between data on the use of commercial facilities around train stations and data on the transportation use of local residents. Such analysis results play a crucial role in understanding regional characteristics and formulating effective strategies.

[0041] The analysis results are immediately visualized and provided to the user via their terminal. The dashboard, generated in real time, lets users intuitively understand the local situation. For example, they can view graphs showing weekend fluctuations in traffic and pedestrian flow.

[0042] As a concrete example, suppose an analysis of pedestrian traffic in front of a train station in a regional city reveals a significant difference between weekdays and weekends. In this case, the user (e.g., a local government official) can formulate a policy to strengthen weekend transportation infrastructure. This makes it possible to develop investment plans and public policies that accurately reflect the local situation.
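The weekday/weekend comparison described above can be sketched as follows. This is a minimal illustration, assuming daily pedestrian totals are already available per date; the function name and the counts are hypothetical. It reports averages and a ratio only; judging whether a difference is statistically significant would need a proper test, which is not shown.

```python
from datetime import date
from statistics import mean


def weekday_weekend_means(daily_counts):
    """Split daily pedestrian counts into weekday and weekend averages.

    daily_counts maps a datetime.date to that day's total count
    (hypothetical input format). Assumes both weekdays and weekend
    days are present; statistics.mean raises on an empty list.
    """
    weekday = [c for d, c in daily_counts.items() if d.weekday() < 5]
    weekend = [c for d, c in daily_counts.items() if d.weekday() >= 5]
    return mean(weekday), mean(weekend)


# Hypothetical counts for one week; 2024-12-02 was a Monday.
counts = {date(2024, 12, 2 + i): c
          for i, c in enumerate([5200, 5100, 5300, 5000, 5600, 8900, 9300])}
wd, we = weekday_weekend_means(counts)
print(f"weekday avg {wd:.0f}, weekend avg {we:.0f}, ratio {we / wd:.2f}")
```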

[0043] Thus, the system of the present invention can grasp the characteristics of a region in detail through multifaceted analysis of collected data and propose appropriate revitalization strategies.

[0044] The following describes the processing flow.

[0045] Step 1:

[0046] The server collects image data in real time from observation devices. Image data is acquired by cameras installed in specific areas such as in front of train stations and residential areas, and then transferred to the server.

[0047] Step 2:

[0048] The server receives audio data from users who have conducted interviews. This audio data is collected from people such as taxi drivers and restaurant owners, and the server performs speech recognition to convert it into text data.

[0049] Step 3:

[0050] The server applies image recognition technology to the collected image data to detect and count specific objects (e.g., people, cars, bicycles, etc.). This allows for the quantification of pedestrian flow and traffic volume in specific areas.
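The counting part of Step 3 can be sketched independently of the detector. This sketch assumes the image recognition model (for example, one built with OpenCV or TensorFlow) has already produced a list of labeled detections with confidence scores; the `Detection` type, the target classes and the 0.5 threshold are hypothetical choices, not taken from the specification.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One object detected in a frame (hypothetical format)."""
    label: str         # class name, e.g. "person", "car", "bicycle"
    confidence: float  # detector score in [0, 1]


TARGET_LABELS = {"person", "car", "bicycle"}  # classes of interest (assumed)


def count_objects(detections, min_confidence=0.5, targets=TARGET_LABELS):
    """Count confident detections of each target class in a single frame."""
    counts = Counter(
        d.label for d in detections
        if d.label in targets and d.confidence >= min_confidence
    )
    # Report zero explicitly for target classes that were not seen.
    return {label: counts.get(label, 0) for label in sorted(targets)}


frame = [
    Detection("person", 0.91),
    Detection("person", 0.78),
    Detection("bicycle", 0.66),
    Detection("person", 0.32),  # below the threshold, ignored
    Detection("dog", 0.88),     # not a target class, ignored
]
print(count_objects(frame))
```

Summing these per-frame counts over time windows gives the pedestrian-flow and traffic-volume figures used in later steps.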

[0051] Step 4:

[0052] The server analyzes the text data converted by speech recognition using natural language processing (NLP) and classifies the sentiment. The text is classified as positive, negative, or neutral, and this data is used to capture the sentiments of local residents.
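The specification classifies sentiment with natural language processing but does not disclose a model. As a plainly simplified stand-in, the sketch below uses a toy keyword lexicon to show the three-way positive/negative/neutral output. The word lists and function name are hypothetical; a practical system would use a trained model or a curated lexicon such as NLTK's VADER.

```python
import re

# Hypothetical toy lexicon, for illustration only.
POSITIVE_WORDS = {"convenient", "lively", "good", "great", "improved"}
NEGATIVE_WORDS = {"empty", "congested", "inconvenient", "bad", "closed", "declining"}


def classify_sentiment(text: str) -> str:
    """Label text 'positive', 'negative' or 'neutral' by net lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = (sum(t in POSITIVE_WORDS for t in tokens)
             - sum(t in NEGATIVE_WORDS for t in tokens))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_sentiment("The station area is lively and convenient on weekends."))
print(classify_sentiment("The shopping street is empty and many shops are closed."))
print(classify_sentiment("I usually drive passengers to the hospital."))
```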

[0053] Step 5:

[0054] The server performs correlation analysis between different datasets. It analyzes interactions within a region, such as showing the relationship between local transportation usage data and commercial facility usage data.

[0055] Step 6:

[0056] The server generates the analysis results as a visual dashboard and sends it to the terminal. The terminal provides a user-friendly interface, allowing the results to be viewed in real time.
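The specification does not define how the dashboard is transmitted to the terminal. A minimal sketch, assuming the server serializes chart-ready data as JSON and the terminal renders it, could look like this; the schema, field names and sample values are all hypothetical.

```python
import json
from datetime import datetime, timezone


def build_dashboard_payload(area, hourly_counts, sentiment_counts, correlations):
    """Bundle analysis results into a JSON document (hypothetical schema)."""
    hours = sorted(hourly_counts)
    payload = {
        "area": area,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "charts": [
            {"type": "line", "title": "Pedestrians per hour",
             "x": hours, "y": [hourly_counts[h] for h in hours]},
            {"type": "pie", "title": "Interview sentiment",
             "labels": list(sentiment_counts),
             "values": list(sentiment_counts.values())},
        ],
        # Tuple keys are not valid JSON keys, so pairs become objects.
        "correlations": [{"a": a, "b": b, "r": round(r, 3)}
                         for (a, b), r in correlations.items()],
    }
    return json.dumps(payload, ensure_ascii=False)


doc = build_dashboard_payload(
    "Station front plaza",
    {8: 310, 9: 260, 17: 340, 18: 295},
    {"positive": 12, "negative": 5, "neutral": 8},
    {("pedestrians", "shop_visits"): 0.8712},
)
print(doc)
```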

[0057] Step 7:

[0058] Users can review the dashboard provided on their device and develop strategies based on local conditions and trends. For example, they can consider ways to improve transportation infrastructure during specific time periods.

[0059] (Example 1)

[0060] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0061] In modern local communities, it is crucial to understand the situation from multiple perspectives and formulate appropriate measures in order to revitalize the region. However, traditional methods have presented challenges in efficiently and comprehensively analyzing the vast amount of information obtained from multiple data sources and providing that information to users intuitively and immediately.

[0062] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0063] In this invention, the server includes means for analyzing image information collected from observation devices to measure the number of specific objects within a region, means for converting acquired audio information into text information and classifying that text information based on emotion, and means for performing correlation analysis between multiple information sets to identify relationships between specific elements. This makes it possible to analyze the situation in a region from multiple perspectives, visualize it immediately, and propose revitalization strategies.

[0064] An "observation device" is a device installed in a specific location to monitor the situation within that area, and is used to collect image information and other data.

[0065] "Image information" refers to visual information collected by observation devices that shows objects and conditions within a region.

[0066] "Audio information" refers to data collected from users and local residents in the form of spoken language.

[0067] "Textual information" refers to data in text format that is obtained by converting audio information using speech recognition technology.

[0068] "Classification based on emotion" is the process of analyzing textual information and determining whether its content is positive, negative, or neutral.

[0069] Correlation analysis is the process of analyzing the relationships and patterns between elements in multiple sets of information using statistical methods.

[0070] "Visualization" is a method of displaying data and analysis results in a format that can be intuitively understood, using images, graphs, charts, and other visual aids.

[0071] A "prompt" is text used to give specific instructions or questions to a generative AI model.

[0072] A "generative AI model" is a system that uses artificial intelligence technology to generate information and suggestions in a human-understandable format based on a given prompt.

[0073] This invention is a system for comprehensively understanding the situation in a local community and formulating more effective revitalization strategies. The system mainly consists of a server, observation devices, and user terminals.

[0074] The server analyzes image information collected from observation devices. These observation devices are installed in specific areas and include cameras that capture various situations within the area. The accumulated image information is analyzed on the server using image recognition technologies (such as OpenCV and TensorFlow®) to count specific objects or elements, such as people or bicycles.

[0075] Simultaneously, the server converts the audio information obtained from the user into text information using speech recognition software (such as Google® Cloud Speech-to-Text). This text information is then subjected to sentiment analysis using natural language processing libraries (such as NLTK or spaCy) and classified into three categories: positive, negative, and neutral.

[0076] Furthermore, the server performs correlation analysis using image information, text information, and other collected data. To clarify the relationships between the data, it uses the Python pandas library to calculate correlation coefficients between the numerical variables.

[0077] The analysis results are visualized in a dashboard format using visualization libraries such as Plotly and Matplotlib. The terminal provides this information to the user in real time, offering a means to intuitively understand the current situation and trends in the region.

[0078] As a concrete example, consider an analysis of pedestrian traffic in front of train stations in regional cities, comparing weekday and weekend traffic trends. Based on this information, local government officials (the users) can formulate improvement measures for weekend transportation infrastructure.

[0079] To make even better use of this system, users input prompts into a generative AI model. An example prompt is: "Analyze the flow of people in front of train stations in regional cities and propose infrastructure development plans that take into account the differences in traffic volume between weekdays and weekends." Based on such prompts, the AI generates suggestions grounded in the data.

[0080] Therefore, this system is extremely effective in gaining a deeper understanding of regional characteristics and formulating concrete and feasible revitalization policies.

[0081] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0082] Step 1:

[0083] The server receives image information from the observation device. A camera is used as the observation device, and image data captured by the camera within the area is input. The server acquires this image data and identifies objects using image recognition technology (such as OpenCV or TensorFlow). Specifically, it measures the number of people and bicycles in the image and outputs the results as numerical data. This data is used as basic information to understand the traffic volume and pedestrian flow in the area.

[0084] Step 2:

[0085] Users provide the server with audio information they have gathered from local residents and taxi drivers. The server receives this audio data as input and converts it into text using speech recognition software (such as Google Cloud Speech-to-Text). The converted text is then analyzed with natural language processing libraries (such as NLTK and spaCy) to determine the sentiment it contains. This allows the text to be classified as positive, negative, or neutral, providing a concrete understanding of the voices of local residents.

[0086] Step 3:

[0087] The server takes the numerical data obtained from the images and the data derived from the text as input and performs correlation analysis between multiple datasets. It uses the Python pandas library to calculate correlation coefficients between these variables. For example, it can output how strongly the utilization rate of commercial facilities around a train station is associated with traffic volume during a specific time period. This correlation analysis provides a concrete understanding of how regional economic activity and traffic trends are related.
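In pandas, this step is typically `DataFrame.corr()`, whose default method is Pearson. The dependency-free sketch below reproduces the same Pearson calculation so it runs without pandas; the variable names and hourly figures are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # A constant series has zero variance and makes r undefined
    # (ZeroDivisionError here).
    return cov / sqrt(var_x * var_y)


def correlation_matrix(series):
    """Pairwise Pearson coefficients for a dict of name -> values."""
    names = list(series)
    return {(a, b): pearson(series[a], series[b]) for a in names for b in names}


# Hypothetical hourly figures for one afternoon in front of a station.
data = {
    "pedestrians": [120, 180, 260, 310, 240],     # from image counts
    "shop_visits": [40, 55, 80, 95, 70],          # commercial facility usage
    "positive_share": [0.30, 0.35, 0.50, 0.55, 0.45],  # from interview sentiment
}
m = correlation_matrix(data)
print(round(m[("pedestrians", "shop_visits")], 3))
```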

[0088] Step 4:

[0089] The server visualizes the analysis results. Using visualization libraries such as Plotly and Matplotlib, it generates output in dashboard format. This dashboard is provided to the user via the terminal and displays the analysis results as graphs and charts, which the user can use to examine the current situation in detail and understand trends.

[0090] Step 5:

[0091] The user inputs prompts into the generative AI model. For example, a prompt such as "Analyze the flow of people in front of train stations in regional cities and propose infrastructure development plans that take into account the differences in traffic volume between weekdays and weekends" might be used. Based on such prompts, the AI generates and outputs data-driven proposals. This provides concrete insights that can contribute to the formulation of regional revitalization strategies.
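One way to keep the model's proposals tied to the data is to embed key analysis figures in the prompt. The sketch below shows only that prompt-building step under an assumed template; the function, template wording and figures are hypothetical, and the actual model call is omitted because the specification does not name a model or API.

```python
def build_prompt(area, weekday_avg, weekend_avg, peak_hour, question):
    """Embed analysis figures in a prompt for a generative AI model (hypothetical template)."""
    context = (
        f"Area: {area}\n"
        f"Average daily pedestrians (weekdays): {weekday_avg:.0f}\n"
        f"Average daily pedestrians (weekends): {weekend_avg:.0f}\n"
        f"Busiest hour: {peak_hour:02d}:00\n"
    )
    return f"Use only the data below when answering.\n\n{context}\nTask: {question}"


prompt = build_prompt(
    "Station front, regional city", 5240, 9100, 17,
    "Analyze the flow of people in front of the station and propose infrastructure "
    "development plans that take into account the differences in traffic volume "
    "between weekdays and weekends.",
)
print(prompt)
```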

[0092] (Application Example 1)

[0093] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0094] Currently, there is a lack of efficient means to collect and analyze local data and provide citizens with the information necessary for regional revitalization in real time. As a result, it takes time to understand the local situation, which can delay the planning and implementation of effective revitalization strategies. Furthermore, there is a challenge in that it is difficult to integrate information from diverse data sources, and useful information is not being appropriately provided to users.

[0095] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0096] In this invention, the server includes means for analyzing image data collected from observation devices to count the number of specific objects in a region, means for converting audio data obtained from interviews into text data and classifying the text data based on emotion, means for performing correlation analysis between multiple datasets to identify relationships between specific factors, and means for providing regional information in real time based on the user's location information. This enables a rapid and comprehensive understanding of the regional situation, allowing for timely information provision to users and proposals for regional revitalization strategies.

[0097] An "observation device" is a device installed at a specific location within a region to collect image data.

[0098] "Image data analysis" refers to a method of processing image data obtained from observation equipment to count the number of specific objects or people.

[0099] "Audio data conversion" is the process of automatically converting audio data obtained from interviews into text data.

[0100] "Sentiment analysis" is a technique that analyzes text data and classifies its content based on emotions such as positive, negative, and neutral.

[0101] Correlation analysis is a technique that identifies relationships between multiple datasets and reveals correlations between specific factors.

[0102] "Real-time visualization" is a technology that instantly displays acquired data visually, allowing users to intuitively understand the situation.

[0103] A "regional revitalization strategy" is a set of specific proposals and plans based on acquired data to revitalize the local economy and living environment.

[0104] "Location-based local information provision" is a function that provides the latest information related to the surrounding area in real time, based on the user's current location.

[0105] The system implementing this invention performs data collection, analysis, visualization, and information provision. The server analyzes image data acquired from observation devices to count the number of specific objects within a region, using image recognition software such as OpenCV and TensorFlow. The server also converts voice data obtained from taxi drivers and others into text data using speech recognition (such as Google Cloud Speech-to-Text) and performs sentiment analysis on that text (for example, with the Google Cloud Natural Language API). This analysis classifies the data as positive, negative, or neutral.

[0106] The collected data is stored in MongoDB, and correlation and time-series analyses are performed using Python scripts. This analysis, based on data from observation devices and user location information, is particularly useful for predicting pedestrian flow and congestion levels within a region. Furthermore, these results are visualized in real-time on a smartphone application using React Native.

[0107] Users with a terminal can receive location-based information on nearby traffic conditions and events. For example, the weekend peak times for pedestrian traffic in a region can be predicted, and notifications can be sent to help users choose appropriate modes of transportation. This allows users to avoid congestion in their area and travel efficiently.
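Location-based provision needs some way to decide which items are "nearby". A minimal sketch, assuming each event or traffic item carries latitude/longitude and using the haversine great-circle distance with a hypothetical 1 km radius, could look like this; the item list and coordinates are illustrative only.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearby_items(user_lat, user_lon, items, radius_km=1.0):
    """Return items within radius_km of the user, nearest first."""
    scored = [(haversine_km(user_lat, user_lon, it["lat"], it["lon"]), it) for it in items]
    # Sort on distance only so the dicts themselves are never compared.
    return [it for dist, it in sorted(scored, key=lambda p: p[0]) if dist <= radius_km]


items = [  # hypothetical items
    {"name": "Weekend market", "lat": 35.6812, "lon": 139.7671},
    {"name": "Road works", "lat": 35.6900, "lon": 139.7000},
    {"name": "Congestion: station bus stop", "lat": 35.6805, "lon": 139.7660},
]
for it in nearby_items(35.6810, 139.7665, items):
    print(it["name"])
```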

[0108] As a concrete example, a prompt message for a generative AI model might be in the format of, "Please provide a summary of the traffic situation in front of XX Station yesterday. What time of day was the most congested?" Such prompt messages allow users to obtain specific analytical results and predictive information based on past data.

[0109] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0110] Step 1:

[0111] The server acquires image data from observation equipment. This includes video from cameras covering a specific area within the region. The server receives this image data as input and uses OpenCV and TensorFlow to identify and count objects within the image. The output is data indicating the number of specific objects.

[0112] Step 2:

[0113] The server collects audio data from interviews. Conversations that users hold with people such as taxi drivers are sent to the server as audio data. This audio data is converted to text using speech recognition (such as Google Cloud Speech-to-Text). The converted text data is then used as input for sentiment analysis (for example, with the Google Cloud Natural Language API), and the results are output as positive, negative, or neutral.

[0114] Step 3:

[0115] The server aggregates historical and current data stored in MongoDB and performs correlation analysis and time-series analysis using Python. The input for this analysis is numerical data obtained from the image data and sentiment data obtained from the text data. The analysis results are used to evaluate the movement of people and the impact of events within a region, and recurring patterns and predictions are output.
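The specification does not name a time-series method. As a deliberately simple, swapped-in baseline, the sketch below predicts the busiest hour by averaging historical counts per (weekday/weekend, hour) slot; the function names and observations are hypothetical. A production system might use a dedicated forecasting model instead.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_profile(observations):
    """Average count per (is_weekend, hour) slot from (timestamp, count) pairs."""
    slots = defaultdict(list)
    for ts, count in observations:
        slots[(ts.weekday() >= 5, ts.hour)].append(count)
    return {slot: mean(values) for slot, values in slots.items()}


def predicted_peak_hour(profile, weekend):
    """Hour with the highest historical average for weekdays or weekends."""
    candidates = {hour: avg for (is_weekend, hour), avg in profile.items()
                  if is_weekend == weekend}
    return max(candidates, key=candidates.get)


# Hypothetical observations: two Saturdays (Dec 7 and 14) and one Monday (Dec 9).
obs = [
    (datetime(2024, 12, 7, 10), 420), (datetime(2024, 12, 7, 14), 610),
    (datetime(2024, 12, 14, 10), 450), (datetime(2024, 12, 14, 14), 580),
    (datetime(2024, 12, 9, 8), 530), (datetime(2024, 12, 9, 14), 210),
]
profile = hourly_profile(obs)
print(predicted_peak_hour(profile, weekend=True))
```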

[0116] Step 4:

[0117] The smartphone, acting as the terminal, receives real-time analysis results from the server. The user's location information is sent to the server as input, and regional information and congestion predictions are displayed on the user's terminal. The visualized results are provided to the user using an application built with React Native.

[0118] Step 5:

[0119] The user inputs a prompt message into the generative AI model within the app. For example, suppose the user sends the text "Please tell me about the current traffic conditions." The server receives this prompt message, generates an answer based on past data and analysis results, and outputs it to the user.

[0120] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing based on the estimated emotions.

[0121] This invention relates to a system that combines data collection from observation devices, text conversion of audio data, correlation analysis between multiple data sets, data visualization, and strategy proposal with an emotion engine that recognizes user emotions.

[0122] This system is server-centric and collects image data from various locations in the region through observation devices. Users provide the system with audio recordings of what they have heard from people such as taxi drivers and restaurant owners, and the server converts that audio data into text.

[0123] The server uses image recognition technology to identify specific objects in the image data and count them. The resulting counts serve as base data for calculating local pedestrian flow, traffic flow, and related metrics.

[0124] In addition, the server utilizes an emotion engine to analyze the user's emotions from the converted text data. The emotion engine uses natural language processing techniques to analyze keywords and context within the text and recognize emotional states such as positive, negative, and neutral.

[0125] The server also uses this data to perform correlation analysis between regional datasets. This reveals how specific regional factors influence user sentiment and regional revitalization. The results are displayed visually on the device. Users can use the provided visual dashboard to intuitively understand the situation in their region.

[0126] Furthermore, the system uses time-series analysis to predict future regional trends. The information generated in real time serves as an important tool for users when considering future strategies.

[0127] As a concrete example, suppose a user uses a server to analyze the emotions of young people in a specific region and detects that positive emotions increase on weekends. Based on this, the server can suggest increasing the number of entertainment and cultural events in that region. In this way, the present invention provides a practical system that integrates regional data with user emotion information to support effective regional revitalization strategies.
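The reasoning in this example can be expressed as a simple rule: compare the share of positive sentiment on weekends with that on weekdays and suggest more weekend events when the gap is large enough. The weekday/weekend split, the 0.1 (10 percentage point) threshold, the sample data, and the suggestion text are hypothetical choices for illustration. The source leaves the decision logic unspecified.

```python
from datetime import date
from typing import Optional


def positive_share(labels: list[str]) -> float:
    """Fraction of sentiment labels that are "positive" (0.0 for an empty list)."""
    return labels.count("positive") / len(labels) if labels else 0.0


def weekend_event_suggestion(observations: list[tuple[date, str]],
                             min_gap: float = 0.1) -> Optional[str]:
    """Suggest more weekend events if weekends are clearly more positive than weekdays."""
    weekend = [label for day, label in observations if day.weekday() >= 5]  # Sat, Sun
    weekday = [label for day, label in observations if day.weekday() < 5]
    gap = positive_share(weekend) - positive_share(weekday)
    if gap >= min_gap:
        return (f"Positive share is {gap * 100:.0f} percentage points higher on weekends; "
                "consider more weekend entertainment and cultural events.")
    return None


observations = [
    (date(2024, 12, 14), "positive"),  # Saturday
    (date(2024, 12, 14), "positive"),
    (date(2024, 12, 15), "neutral"),   # Sunday
    (date(2024, 12, 16), "negative"),  # Monday
    (date(2024, 12, 16), "positive"),
    (date(2024, 12, 17), "neutral"),   # Tuesday
]
print(weekend_event_suggestion(observations))
```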

[0128] The following describes the processing flow.

[0129] Step 1:

[0130] The server collects image data from multiple observation devices within a region. These devices are installed at specific locations and automatically capture images according to a pre-set schedule, then transfer them to the server.

[0131] Step 2:

[0132] Users conduct interviews with taxi drivers and local restaurant owners and send the audio recordings to the server. The server then uses speech recognition technology to convert the received audio data into text.

[0133] Step 3:

[0134] The server applies image recognition algorithms to the image data to automatically count the number of specific objects (e.g., people, cars, bicycles). The results are stored as digital records and used as indicators of activity in the area.

[0135] Step 4:

[0136] The server uses a sentiment engine to analyze text data. Natural language processing techniques are used to analyze keywords and context, classifying the user's sentiment as positive, negative, or neutral.

[0137] Step 5:

[0138] The server performs correlation analysis between the collected datasets. This identifies the relationship between sentiment data and local activities and factors. This analysis is important for extracting specific factors that influence users' sentiment in a given region.

[0139] Step 6:

[0140] The server visualizes the analysis results and displays them on the terminal in real time. Using the generated dashboard, the user can then visually check local conditions and sentiment trends on the terminal.

[0141] Step 7:

[0142] Based on information obtained from their devices, users consider and implement measures that are appropriate to the characteristics of the region and the emotional state of the users. For example, in a region where positive emotions are prevalent on weekends, they might suggest holding events or gatherings.

[0143] (Example 2)

[0144] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0145] In modern cities and regions, increasing population density and complex transportation systems make it difficult to formulate appropriate regional revitalization strategies. Furthermore, because data is scattered across many sources, it is difficult to integrate and analyze it effectively and to develop concrete proposals based on regional characteristics and trends. There is currently demand for optimal strategic proposals that take temporal trends and people's emotions into account, but achieving this in real time is challenging.

[0146] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0147] In this invention, the server includes means for analyzing video information collected from observation equipment and counting the number of objects within a predetermined area; means for converting audio information into text information and classifying that text information based on emotional states; and means for analyzing the correlation between multiple sets of information and understanding the relationships between specific factors. This enables the integrated analysis of regional data and the formulation of concrete and effective regional revitalization strategies through real-time emotional analysis and information visualization.

[0148] "Observation equipment" refers to devices used to collect images and videos from various locations within a region.

[0149] "Video information" refers to visual data captured by observation equipment, including photographs and videos.

[0150] "Counting the quantity of objects" is the process of analyzing video information to identify the number of specific objects or people.

[0151] "Audio information" refers to data obtained from sound, including human conversations and ambient sounds.

[0152] "Converting to text information" is the process of converting audio information into text format.

[0153] "Classifying based on emotional state" is the process of analyzing textual information to identify whether its content corresponds to a positive, negative, or neutral emotion.

[0154] An "information set" refers to a diverse collection of data, which includes both visual and textual information.

[0155] "Analyzing correlations" is the process of analyzing the relationships between sets of information and clarifying the relationships between specific factors.

[0156] A "regional revitalization strategy" refers to specific plans and proposals aimed at regional development and improving the lives of residents.

[0157] In the embodiment of the invention, a server-centered system is used. This system enables detailed data collection, analysis, visualization, and strategic proposals.

[0158] The server first collects video information from multiple locations in the region using observation equipment, which includes high-resolution cameras and other sensors. The collected video information is processed by analysis software that uses image recognition libraries (e.g., TensorFlow and OpenCV) to count specific objects. The resulting data is used to analyze urban traffic volume and to understand pedestrian flow.

[0159] To further enrich the local information, users input audio data acquired on site into the system. The server converts this audio data into text using speech recognition software (e.g., a speech recognition API). The text data is then processed by a sentiment analysis engine and classified by emotional state, using software that employs natural language processing techniques (e.g., NLTK or Transformers).

[0160] Furthermore, the server integrates and analyzes these different datasets. In particular, statistical software is used to analyze correlations and uncover relationships between specific factors. The results are visualized immediately on the terminal using dedicated dashboards (e.g., Tableau or Power BI), making it easier for users to understand regional trends in real time.

[0161] For strategic proposals, time-series analysis models are used to predict future regional changes. This allows users to obtain foundational information for strategic decisions in areas such as transportation, urban development, and social activities.

[0162] As a concrete example, when a user uses this system to analyze pedestrian traffic in a commercial district on holidays, the system can identify, from data trends, congestion on specific streets on weekends. As a suggestion, the server can recommend holding pedestrian-priority events in this area. An example of a prompt for the generative AI model is, "Based on past data, identify the main causes of negative sentiment in the area and suggest ways to improve it." In this way, the present invention provides a practical method for revitalizing local communities through complex data analysis.

[0163] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0164] Step 1:

[0165] The server receives video information collected from observation equipment. This video information is collected from various locations in the region. The server receives this information as input, uses image recognition technology to identify specific objects, and counts their quantities. Specifically, the server uses TensorFlow to detect objects such as people and cars, and outputs the results to a database.

[0166] Step 2:

[0167] The user acquires audio information and uploads it to the system. The server receives this audio information as input and converts it into text using speech recognition software. The server then outputs the resulting text information as input data for the subsequent sentiment analysis.

[0168] Step 3:

[0169] The server uses the acquired text information as input to activate its sentiment analysis engine. This engine uses natural language processing techniques to classify emotional states (positive, negative, neutral) from the text information. Specifically, the server extracts emotional keywords, determines the emotional category based on these keywords, and outputs the result.
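The keyword-based classification in Step 3 can be sketched as follows. This is a deliberately simple stand-in, assuming small hand-made English keyword lists and a majority rule. The source mentions NLP libraries such as NLTK or Transformers for the actual engine, and the word lists and function names here are hypothetical.

```python
import re

# Tiny illustrative lexicons (hypothetical); a real system would use a trained
# model or a much larger dictionary.
POSITIVE_WORDS = {"good", "great", "lively", "fun", "convenient"}
NEGATIVE_WORDS = {"bad", "closed", "empty", "inconvenient", "worse"}


def extract_emotion_keywords(text: str) -> tuple[list[str], list[str]]:
    """Return the positive and negative keywords found in the text, in order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive = [t for t in tokens if t in POSITIVE_WORDS]
    negative = [t for t in tokens if t in NEGATIVE_WORDS]
    return positive, negative


def classify_sentiment(text: str) -> str:
    """Classify text as positive, negative, or neutral by keyword majority."""
    positive, negative = extract_emotion_keywords(text)
    if len(positive) > len(negative):
        return "positive"
    if len(negative) > len(positive):
        return "negative"
    return "neutral"  # no keywords, or a tie


print(classify_sentiment("The shopping street was lively and the food was great."))  # positive
print(classify_sentiment("Many shops are closed and the area feels empty."))         # negative
print(classify_sentiment("I drove a customer to the station."))                       # neutral
```

Keyword counting ignores context such as negation ("not good"), which is the main reason the source points to NLP libraries for the real engine.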

[0170] Step 4:

[0171] The server integrates all datasets and analyzes their correlations. The input consists of multiple datasets, and the output shows the relationships between specific factors. The server uses statistical methods to analyze this and reveal specific patterns and trends.

[0172] Step 5:

[0173] The server uses the analysis results to generate a dataset for visualization on the terminal and outputs it to a visualization tool. Specifically, the server provides data to Tableau or Power BI, and displays the results visually using graphs and charts.
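For the hand-off in Step 5, one simple option is to write the visualization dataset as a CSV file, a format that both Tableau and Power BI can import. The column names and rows below are hypothetical, and a production system might instead use a database connection or the tools' own publishing features.

```python
import csv
import io


def to_visualization_csv(rows: list[dict[str, object]], fieldnames: list[str]) -> str:
    """Serialize analysis rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # values are converted with str(); commas get quoted
    return buffer.getvalue()


rows = [
    {"area": "Station front", "hour": 8, "people": 130, "positive_share": 0.62},
    {"area": "Shopping street", "hour": 18, "people": 160, "positive_share": 0.48},
]
print(to_visualization_csv(rows, ["area", "hour", "people", "positive_share"]))
```

To write a file instead of a string, open it with `open(path, "w", newline="", encoding="utf-8")` and pass the file object to `csv.DictWriter`.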

[0174] Step 6:

[0175] The server performs time series analysis to predict future regional trends. This uses historical data as input and generates predictions of future trends as output. Specifically, the server applies a time series model to create data showing, for example, the expected increase or decrease in traffic volume in the following month.
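The source does not say which time-series model is applied in Step 6. As a minimal stand-in, the sketch below fits a straight-line trend to monthly traffic volumes by ordinary least squares and extrapolates it. Seasonal or ARIMA-type models would be more realistic, and the data and function name are hypothetical.

```python
def linear_trend_forecast(history: list[float], steps_ahead: int = 1) -> float:
    """Fit y = a + b*t by ordinary least squares (t = 0..n-1) and extrapolate."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var                  # b: average change per month
    intercept = y_mean - slope * t_mean  # a: fitted value at t = 0
    return intercept + slope * (n - 1 + steps_ahead)


monthly_volume = [1000, 1100, 1200, 1300]
next_month = linear_trend_forecast(monthly_volume)
change = next_month - monthly_volume[-1]
print(next_month, change)  # 1400.0 100.0
```

A positive `change` would be reported as an expected increase in traffic volume for the following month, and a negative one as a decrease.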

[0176] Step 7:

[0177] To propose regional development plans to the user, the server uses the generated information to construct specific strategies. In this process, the generative AI model outputs strategic proposals based on prompts. An example of a prompt is, "Predict the expected emotional trends of people at the next event and propose the optimal marketing strategy."

[0178] (Application Example 2)

[0179] Next, Application Example 2 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0180] In modern urban environments, it is essential to understand residents' sentiments and local trends in real time and to develop effective strategies based on that understanding. However, traditional methods lacked systems capable of instantly analyzing vast amounts of data and presenting the results visually, which made rapid decision-making difficult. In particular, proposals for regional revitalization were often not intuitive, making it hard for users to grasp their importance and act on them.

[0181] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0182] In this invention, the server includes means for analyzing image information collected from observation equipment to count the number of specific objects within a region, means for converting audio information obtained through interviews into text information and classifying that text information based on emotion, and means for providing, through a computer system, an intuitive interface that lets users understand proposed policies and take new actions. This enables real-time understanding of the local situation and quick, clear decision-making.

[0183] "Observation equipment" refers to devices that collect image information from various locations in a region.

[0184] "Analysis" is the process of examining collected information and extracting specific patterns or meaningful data.

[0185] An "object" refers to a thing or event that is specifically identified as the subject of observation.

[0186] "Audio information" refers to data collected from the voices of one or more people.

[0187] "Textual information" refers to data that represents audio information as text.

[0188] "Emotion-based classification" is a process that uses natural language processing techniques to analyze text data and categorize it into emotional states such as positive, negative, and neutral.

[0189] An "interface" refers to the display screen or input method that a user uses to interact with a computer system.

[0190] A "computer system" is a system consisting of a series of devices and software that process and analyze data.

[0191] "Immediate" means that there is virtually no delay between the collection of information and its display.

[0192] "Visualization" refers to displaying data as graphs or diagrams to make information easier to understand intuitively.

[0193] A "regional revitalization policy" is a plan or strategy that outlines how to make a specific region more vibrant.

[0194] "Users" refers to those who operate this system and receive data and suggestion information.

[0195] The server is configured as follows to implement this invention. First, multiple cameras and sensors are installed in the area as observation equipment to collect image data in real time. This data is analyzed using the open-source image processing library OpenCV to detect specific objects and count their numbers. The results of this image analysis are used to understand the flow of people and traffic volume within the area.

[0196] Simultaneously, the audio information collected from the user is converted into text using a cloud-based speech recognition API. The converted text is then subjected to sentiment analysis using a natural language processing library (such as NLTK). This sentiment analysis determines whether the user's statements are positive, negative, or neutral.

[0197] The server performs correlation analysis between these multiple datasets to identify specific causal relationships. The results of this analysis are visualized on the computer system in a way that is intuitively understandable to users and displayed on the user's terminal in real time. Based on this information, users can consider regional revitalization strategies. Users are provided with an intuitive interface and assistance in easily formulating new action plans.

[0198] As a concrete example, this system could be used during a music event in a certain city to detect positive emotional tendencies among a large number of residents, and this information could be used to propose strategies for further improving the event the following year.

[0199] An example of a prompt using a generative AI model is: "Analyze the impact of local music events on residents' emotions and generate specific strategic proposals."

[0200] In this way, the system enables efficient analysis of complex data and provides users with useful information for regional revitalization.

[0201] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0202] Step 1:

[0203] The server collects regional image data in real time from observation equipment. The input is raw image data from cameras and sensors, and the output is image data converted into an analyzable format. To achieve this, resolution adjustment and format conversion are performed at the initial stage.

[0204] Step 2:

[0205] The server uses OpenCV to analyze the collected image data, detect specific objects, and count their numbers. The input is the transformed image data obtained in step 1, and the output is data about the type and number of objects. Specifically, it applies an object detection algorithm and updates the database with the recognized objects.
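The database update at the end of Step 2 might look like the following. SQLite (from the Python standard library) is used only to keep the sketch self-contained; the table layout, camera identifiers, and function names are hypothetical, and the labels are assumed to come from the OpenCV-based detector. Rewriting all rows for a frame inside one transaction means that re-processing a frame replaces its counts instead of duplicating them.

```python
import sqlite3
from collections import Counter


def init_db(conn: sqlite3.Connection) -> None:
    """Create the per-frame object count table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS object_counts ("
        " camera_id TEXT NOT NULL,"
        " captured_at TEXT NOT NULL,"
        " object_type TEXT NOT NULL,"
        " quantity INTEGER NOT NULL,"
        " PRIMARY KEY (camera_id, captured_at, object_type))"
    )


def record_detections(conn: sqlite3.Connection, camera_id: str,
                      captured_at: str, labels: list[str]) -> None:
    """Replace the stored counts for one frame with counts derived from its labels."""
    counts = Counter(labels)
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "DELETE FROM object_counts WHERE camera_id = ? AND captured_at = ?",
            (camera_id, captured_at),
        )
        conn.executemany(
            "INSERT INTO object_counts VALUES (?, ?, ?, ?)",
            [(camera_id, captured_at, obj, n) for obj, n in sorted(counts.items())],
        )


conn = sqlite3.connect(":memory:")
init_db(conn)
record_detections(conn, "cam-01", "2024-12-14T18:00", ["person", "person", "car"])
print(conn.execute(
    "SELECT object_type, quantity FROM object_counts ORDER BY object_type"
).fetchall())  # [('car', 1), ('person', 2)]
```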

[0206] Step 3:

[0207] The server converts the voice information provided by the user into text using a speech recognition API. The input is voice data, and the output is the converted string data. Specifically, the speech recognition API analyzes the frequency components of the voice and generates the corresponding string.

[0208] Step 4:

[0209] The server performs sentiment analysis on text data using natural language processing techniques. The input is the text data generated in step 3, and the output is an emotional state such as positive, negative, or neutral. In this step, keywords within the text are analyzed and an emotional score is calculated.

[0210] Step 5:

[0211] The server investigates the correlation between image analysis data and sentiment data from text, and identifies the relationships. The input is the dataset obtained in steps 2 and 4, and the output is the analysis results showing causal relationships. Specifically, a statistical model is used to calculate the correlation between the data.

[0212] Step 6:

[0213] The terminal visualizes the analysis results sent from the server and displays them to the user. The input is the analysis data obtained in step 5, and the output is a graphical display on the user interface. In this step, graphs and dashboards are generated based on the data and presented in a way that the user can intuitively understand.

[0214] Step 7:

[0215] Users formulate and implement regional revitalization policies through the provided interface. The input is visualized analysis results, and the output is a new action plan. In the final step, the system supports users in navigating the interface, formulating the new plan, and putting it into practice.

[0216] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0217] The data generation model 58 is so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions in the prompts and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0218] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0219] [Second Embodiment]

[0220] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0221] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0222] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0223] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0224] The microphone 238 receives instructions given by voice from the user 20. The microphone 238 captures the voice of the user 20, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0225] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0226] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0227] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0228] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0229] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0230] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0231] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0232] This invention relates to a system that acquires observational and interview data from various data sources, analyzes it to evaluate the local situation, and provides effective revitalization strategies.

[0233] This system is centered around a server that collects data from multiple observation devices. These observation devices include cameras installed in specific locations, such as train station plazas or residential areas, which periodically photograph traffic conditions and record them as image data. Opinions gathered by users from taxi drivers and others are also sent to the server. The server receives this audio data and automatically converts it into text information.

[0234] Next, the server analyzes the acquired image data to identify specific objects and elements. For example, it counts the number of people and bicycles from an image of the area in front of a train station to understand the flow of people during peak hours. Image recognition technology is used for this purpose. The server also uses natural language processing technology to analyze the sentiment of text data converted from speech, classifying it as positive, negative, or neutral. This makes it possible to accurately understand and analyze various voices within the community.

[0235] Furthermore, the server performs correlation analysis between different datasets. For example, it identifies the relationship between data on the use of commercial facilities around train stations and data on the transportation use of local residents. Such analysis results play a crucial role in understanding regional characteristics and formulating effective strategies.

[0236] The analysis results are immediately visualized and provided to the user via their device. The real-time generated dashboard allows users to intuitively understand the local situation. For example, they can view graphs showing weekend traffic fluctuations and pedestrian flow.

[0237] As a concrete example, suppose an analysis of pedestrian traffic in front of a train station in a regional city reveals a significant difference between weekdays and weekends. In this case, the user (local government official) can then formulate a policy to strengthen weekend transportation infrastructure. This makes it possible to develop investment plans and public policies that accurately reflect the local situation.

[0238] Thus, the system of the present invention can grasp the characteristics of a region in detail through multifaceted analysis of collected data and propose appropriate revitalization strategies.

[0239] The following describes the processing flow.

[0240] Step 1:

[0241] The server collects image data in real time from the observation equipment. The image data is acquired by cameras installed in specific areas such as in front of train stations and residential areas, and then transferred to the server.

[0242] Step 2:

[0243] The server receives audio data from users who have conducted interviews. This audio data is collected from people such as taxi drivers and restaurant owners, and the server performs speech recognition to convert it into text data.

[0244] Step 3:

[0245] The server applies image recognition technology to the collected image data to detect and count specific objects (e.g., people, cars, bicycles, etc.). This allows for the quantification of pedestrian flow and traffic volume in specific areas.

[0246] Step 4:

[0247] The server analyzes the text data converted by speech recognition using natural language processing (NLP) and classifies the sentiment. The text is classified as positive, negative, or neutral, and this data is used to capture the sentiments of local residents.

[0248] Step 5:

[0249] The server performs correlation analysis between different datasets. It analyzes interactions within a region, such as showing the relationship between local transportation usage data and commercial facility usage data.

[0250] Step 6:

[0251] The server generates the analysis results as a visual dashboard and sends it to the terminal. The terminal provides a user-friendly interface, allowing the results to be viewed in real time.

[0252] Step 7:

[0253] Users can review the dashboard provided on their device and develop strategies based on local conditions and trends. For example, they can consider ways to improve transportation infrastructure during specific time periods.

[0254] (Example 1)

[0255] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0256] In modern local communities, it is crucial to understand the situation from multiple perspectives and formulate appropriate measures in order to revitalize the region. However, traditional methods have presented challenges in efficiently and comprehensively analyzing the vast amount of information obtained from multiple data sources and providing that information to users intuitively and immediately.

[0257] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0258] In this invention, the server includes means for analyzing image information collected from observation devices to measure the number of specific objects within a region, means for converting acquired audio information into text information and classifying that text information based on emotion, and means for performing correlation analysis between multiple information sets to identify relationships between specific elements. This makes it possible to analyze the situation in a region from multiple perspectives, visualize it immediately, and propose revitalization strategies.

[0259] An "observation device" is a device installed in a specific location to monitor the situation within that area, and is used to collect image information and other data.

[0260] "Image information" refers to visual information collected by observation devices that shows objects and conditions within a region.

[0261] "Audio information" refers to data collected from users and local residents in the form of spoken language.

[0262] "Textual information" refers to data in text format that is obtained by converting audio information using speech recognition technology.

[0263] "Classification based on emotion" is the process of analyzing textual information and determining whether its content is positive, negative, or neutral.

[0264] "Correlation analysis" is the process of analyzing the relationships and patterns between elements in multiple sets of information using statistical methods.

[0265] "Visualization" is a method of displaying data and analysis results in a format that can be intuitively understood, using images, graphs, charts, and other visual aids.

[0266] A "prompt" is text used to give specific instructions or questions to a generative AI model.

[0267] A "generative AI model" is a system that uses artificial intelligence technology to generate information and suggestions in a human-understandable format based on a given prompt.

[0268] This invention is a system for comprehensively understanding the situation in a local community and formulating more effective revitalization strategies. The system mainly consists of a server, observation devices, and user terminals.

[0269] The server analyzes image information collected from observation devices. These observation devices are installed in specific areas and include cameras that capture various scenes within the area. The accumulated image information is then analyzed on the server using image recognition technologies (such as OpenCV and TensorFlow) to measure specific objects or elements, such as the number of people or bicycles.

[0270] Simultaneously, the server converts the audio information obtained from the user into text information. The audio data is converted into text format using speech recognition software (such as Google Cloud Speech-to-Text). This text information is then subjected to sentiment analysis using natural language processing technology (such as NLTK or SpaCy) and classified into three categories: positive, negative, and neutral.

[0271] Furthermore, the server performs correlation analysis using image information, text information, and other collected data. To clarify the relationships between the data, it uses the Python pandas library to calculate correlation coefficients between the numerical variables.

[0272] The analysis results are visualized in a dashboard format using visualization libraries such as Plotly and Matplotlib. The terminal provides this information to the user in real time, offering a means to intuitively understand the current situation and trends in the region.

[0273] As a concrete example, consider an analysis of pedestrian traffic in front of train stations in regional cities, comparing weekday and weekend traffic trends. Based on this information, local government officials (the users) can formulate improvement measures for weekend transportation infrastructure.
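The weekday/weekend comparison in this example reduces to splitting daily counts by day of the week and comparing the averages. The following is a minimal sketch, assuming the server already holds one aggregated pedestrian count per calendar day; the function name and the sample counts are hypothetical and only illustrate the arithmetic.

```python
from datetime import date
from statistics import mean


def weekday_weekend_means(daily_counts: dict[date, int]) -> tuple[float, float]:
    """Return (weekday mean, weekend mean) of daily pedestrian counts."""
    # date.weekday(): Monday == 0 ... Sunday == 6, so 5 and 6 are the weekend.
    weekday = [n for d, n in daily_counts.items() if d.weekday() < 5]
    weekend = [n for d, n in daily_counts.items() if d.weekday() >= 5]
    return mean(weekday), mean(weekend)


if __name__ == "__main__":
    # Made-up counts for one week; 2024-12-02 is a Monday.
    counts = {date(2024, 12, 2 + i): n
              for i, n in enumerate([100, 110, 90, 100, 100, 300, 260])}
    wd, we = weekday_weekend_means(counts)
    print(f"weekday mean={wd:.1f}, weekend mean={we:.1f}, ratio={we / wd:.2f}")
```

A ratio well above 1 is the kind of signal that would prompt officials to look at weekend transportation capacity.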

[0274] To make even better use of this system, users input prompts into a generative AI model. An example prompt is: "Analyze the flow of people in front of train stations in regional cities and propose infrastructure development plans that take into account the differences in traffic volume between weekdays and weekends." Based on such a prompt, the AI generates data-based proposals.

[0275] Therefore, this system is extremely effective in gaining a deeper understanding of regional characteristics and formulating concrete and feasible revitalization policies.

[0276] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0277] Step 1:

[0278] The server receives image information from the observation device. A camera is used as the observation device, and image data captured by the camera within the area is input. The server acquires this image data and identifies objects using image recognition technology (such as OpenCV or TensorFlow). Specifically, it measures the number of people and bicycles in the image and outputs the results as numerical data. This data is used as basic information to understand the traffic volume and pedestrian flow in the area.
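Turning detector output into the "number of people and bicycles" is a filtering and counting step. The sketch below assumes that an object detector (for example, a TensorFlow model or an OpenCV DNN model, neither of which is shown) has already produced a class label and a confidence score per detected object; the `Detection` type, the class names, and the 0.5 threshold are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Detection(NamedTuple):
    label: str          # class name produced by the detector, e.g. "person"
    confidence: float   # detector score in [0, 1]


def count_objects(detections: Iterable[Detection],
                  targets: tuple[str, ...] = ("person", "bicycle"),
                  min_confidence: float = 0.5) -> dict[str, int]:
    """Count confident detections of each target class in one frame."""
    counts = Counter(d.label for d in detections
                     if d.label in targets and d.confidence >= min_confidence)
    # Report every target class, including classes with zero detections.
    return {label: counts[label] for label in targets}
```

Discarding low-confidence detections before counting keeps uncertain boxes from inflating the numerical data used in later steps.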

[0279] Step 2:

[0280] Users provide audio information they have gathered from local residents and taxi drivers to a server. The server receives this audio data as input and converts it into text using speech recognition software (such as Google Cloud Speech-to-Text). The converted text is then subjected to natural language processing technologies (such as NLTK and SpaCy) to analyze the sentiment contained within it. This allows the text to be classified as positive, negative, or neutral, providing a means to gain a concrete understanding of the voices of local residents.
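The three-way classification into positive, negative, and neutral can be illustrated with a keyword-balance rule. This is a minimal sketch under stated assumptions: the tiny English lexicon is hypothetical, and a deployed system would more likely use a trained model or a curated lexicon (NLTK ships one, VADER) rather than this hand-written word list.

```python
import re

# Hypothetical, deliberately tiny lexicon for illustration only.
POSITIVE = {"good", "great", "convenient", "lively", "clean"}
NEGATIVE = {"bad", "crowded", "dirty", "inconvenient", "empty"}


def classify_sentiment(text: str) -> str:
    """Classify text as 'positive', 'negative', or 'neutral' by keyword balance."""
    words = re.findall(r"[a-z']+", text.lower())
    # +1 for each positive keyword, -1 for each negative keyword.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, a transcribed remark such as "Too crowded, and the bus stop is inconvenient." contains two negative keywords and is classified as negative.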

[0281] Step 3:

[0282] The server takes as input the numerical data obtained from the images and the data derived from the text, and performs correlation analysis across the multiple datasets. It uses the Python pandas library to calculate correlation coefficients between these variables. For example, it can output how strongly the utilization rate of commercial facilities around a train station is associated with traffic volume during a specific time period. This correlation analysis provides a concrete understanding of how regional economic activity and traffic trends are related.
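The correlation coefficient referred to here is most commonly the Pearson coefficient, which is what pandas' `DataFrame.corr()` computes by default. The sketch below writes that formula out in plain Python so the computation is visible; the hourly facility-utilization and station-traffic values are made up for illustration.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Made-up hourly values for one afternoon.
    facility_use = [0.42, 0.55, 0.61, 0.70]
    station_traffic = [820, 940, 1010, 1150]
    print(f"r = {pearson(facility_use, station_traffic):.2f}")
```

With pandas, the same value is obtained from `df["facility_use"].corr(df["station_traffic"])`. A coefficient near +1 or -1 indicates a strong linear association, not by itself a causal effect.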

[0283] Step 4:

[0284] The server executes a procedure to visualize the analysis results. Using a visualization library such as Plotly or Matplotlib, it generates output in the form of a dashboard. This dashboard is provided to the user through the terminal and displays the analysis results as graphs and charts, which the user can use to analyze the current situation concretely and grasp trends.

[0285] Step 5:

[0286] The user inputs a prompt into the generative AI model. For example, a prompt such as "Analyze the pedestrian flow in front of the station in local cities and propose an infrastructure improvement plan considering the difference in traffic volume between weekdays and weekends" is used. Based on this prompt, the AI generates and outputs a data-based proposal. In this way, specific findings that contribute to formulating a regional revitalization strategy can be obtained.
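For the generated proposal to be "data-based", the analysis results must reach the model together with the instruction. One simple way to do that, sketched below, is to prepend a short summary of the computed figures to the user's instruction. The function name, the summary wording, and the figures in the example are hypothetical; the source does not specify how results are passed to the model.

```python
def build_prompt(area: str, weekday_mean: float, weekend_mean: float,
                 sentiment_counts: dict[str, int]) -> str:
    """Prefix the user's instruction with a summary of the analysis results."""
    summary = (
        f"Average daily pedestrians in front of {area}: "
        f"weekdays {weekday_mean:.0f}, weekends {weekend_mean:.0f}.\n"
        "Resident comments by sentiment: "
        + ", ".join(f"{k} {v}" for k, v in sorted(sentiment_counts.items()))
        + ".\n"
    )
    instruction = (
        "Analyze the pedestrian flow in front of the station and propose an "
        "infrastructure improvement plan considering the difference in traffic "
        "volume between weekdays and weekends."
    )
    return summary + instruction


if __name__ == "__main__":
    print(build_prompt("XX Station", 100, 280,
                       {"positive": 3, "negative": 5, "neutral": 2}))
```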

[0287] (Application Example 1)

[0288] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".

[0289] Currently, there is a lack of means to efficiently accumulate and analyze regional data and provide citizens with the information necessary for regional revitalization in real time. As a result, it takes time to grasp the regional situation, and the formulation and implementation of effective revitalization strategies may be delayed. In addition, there is a problem that it is difficult to integrate information from various data sources and useful information is not appropriately provided to users.

[0290] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following respective means.

[0291] In this invention, the server includes means for analyzing image data collected from observation devices to count the number of specific objects in a region, means for converting audio data obtained from listening into text data and classifying the text data based on emotion, means for performing correlation analysis between multiple datasets to identify relationships between specific factors, and means for providing regional information in real time based on the user's location information. This enables a rapid and comprehensive understanding of the regional situation, allowing for timely information provision to users and proposals for regional revitalization strategies.

[0292] An "observation device" is a device installed at a specific location within a region to collect image data.

[0293] "Image data analysis" refers to a method of processing image data obtained from observation devices to count the number of specific objects or people.

[0294] "Audio data conversion" is the process of automatically converting audio data obtained from listening into text data.

[0295] "Sentiment analysis" is a technique that analyzes text data and classifies its content based on emotions such as positive, negative, and neutral.

[0296] Correlation analysis is a technique that identifies relationships between multiple datasets and reveals correlations between specific factors.

[0297] "Real-time visualization" is a technology that instantly displays acquired data visually, allowing users to intuitively understand the situation.

[0298] A "regional revitalization strategy" is a set of specific proposals and plans based on acquired data to revitalize the local economy and living environment.

[0299] "Location-based local information provision" is a function that provides the latest information related to the surrounding area in real time, based on the user's current location.

[0300] The system implementing this invention performs data collection, analysis, visualization, and information provision. The server analyzes image data acquired from observation devices and performs calculations to identify the number of specific objects within a region, using image recognition software such as OpenCV and TensorFlow. The server also converts voice data obtained from taxi drivers and others into text data and performs sentiment analysis on that text using the Google Cloud Natural Language API, classifying it as positive, negative, or neutral.

[0301] The collected data is stored in MongoDB, and correlation and time-series analyses are performed using Python scripts. This analysis, based on data from observation devices and user location information, is particularly useful for predicting pedestrian flow and congestion levels within a region. Furthermore, these results are visualized in real-time on a smartphone application using React Native.

[0302] Users carrying a terminal can receive location-based information on nearby traffic conditions and events. For example, the peak times of pedestrian traffic in a region on weekends can be predicted, and notifications can be sent to help users choose appropriate modes of transportation. This allows users to avoid congestion and travel efficiently.
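Providing information "based on the user's location" requires deciding which items are near the user. The sketch below assumes that events and traffic points are stored with latitude and longitude and uses the haversine great-circle distance with a fixed radius; the 1 km radius, the item names, and the coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi, dlmb = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearby(user_lat: float, user_lon: float,
           items: list[tuple[str, float, float]],
           radius_km: float = 1.0) -> list[tuple[float, str]]:
    """Return (distance_km, name) for items within radius_km, nearest first."""
    hits = []
    for name, lat, lon in items:
        d = haversine_km(user_lat, user_lon, lat, lon)
        if d <= radius_km:
            hits.append((d, name))
    return sorted(hits)
```

At city scale a planar approximation would also work; the haversine form is used here because it stays accurate without any projection setup.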

[0303] As a concrete example, a prompt message for a generative AI model might be in the format of, "Please provide a summary of the traffic situation in front of XX Station yesterday. What time of day was the most congested?" Such prompt messages allow users to obtain specific analytical results and predictive information based on past data.

[0304] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0305] Step 1:

[0306] The server acquires image data from the observation device. This includes video from cameras covering specific areas within the region. The server receives this image data as input, and uses OpenCV and TensorFlow to identify and count the objects in the image. As output, data indicating the number of specific objects is obtained.

[0307] Step 2:

[0308] The server collects voice data obtained from listening. Conversations between the user and a taxi driver or the like are sent to the server as voice data. This voice data is converted into text, and sentiment analysis is performed on the converted text data using the Google Cloud Natural Language API, which outputs results classified as positive, negative, or neutral.

[0309] Step 3:

[0310] The server aggregates past and current data stored in MongoDB, and performs correlation analysis and time series analysis using Python. The inputs to this analysis are the counts obtained from the image data and the sentiment data obtained from the text data. From the analysis results, the movements of people within the region and the impact of events are evaluated, and regularities and predictions are output.
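One of the "regularities" this time-series step can extract is the typical hourly profile of pedestrian counts, from which a congestion peak can be predicted. The sketch below is a deliberately naive baseline, assuming timestamped counts are already loaded from the database: it averages past counts per hour of day, separately for weekdays and weekends, and takes the hour with the highest average as the predicted peak. The sample records are made up.

```python
from collections import defaultdict
from datetime import datetime


def hourly_profile(records: list[tuple[datetime, int]], weekend: bool) -> dict[int, float]:
    """Average count per hour of day, using only weekend (or only weekday) records."""
    sums: dict[int, float] = defaultdict(float)
    n: dict[int, int] = defaultdict(int)
    for ts, count in records:
        if (ts.weekday() >= 5) == weekend:
            sums[ts.hour] += count
            n[ts.hour] += 1
    return {h: sums[h] / n[h] for h in sums}


def predicted_peak_hour(records: list[tuple[datetime, int]], weekend: bool) -> int:
    """Hour of day with the highest historical average count."""
    profile = hourly_profile(records, weekend)
    return max(profile, key=profile.get)


if __name__ == "__main__":
    # Made-up counts: two Saturdays and one Monday.
    records = [(datetime(2024, 12, 7, 10), 50), (datetime(2024, 12, 7, 14), 120),
               (datetime(2024, 12, 14, 10), 70), (datetime(2024, 12, 14, 14), 100),
               (datetime(2024, 12, 9, 8), 200)]
    print("weekend peak hour:", predicted_peak_hour(records, weekend=True))
```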

[0311] Step 4:

[0312] The smartphone, which is the terminal, receives the real-time analysis results from the server. The user's location information is sent to the server as input, and regional information and congestion predictions are displayed on the user terminal. Using an application developed with React Native, the visualized results are provided to the user.

[0313] Step 5:

[0314] The user inputs a prompt into the generative AI model within the app. For example, suppose the user sends the text "Please tell me about the current traffic conditions." The server receives this prompt, generates an answer based on past data and analysis results, and outputs it to the user.

[0315] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using those emotions.

[0316] This invention relates to a system that combines data collection from observation devices, text conversion of audio data, correlation analysis between multiple data sets, data visualization, and strategy proposal with an emotion engine that recognizes user emotions.

[0317] This system is server-centric and collects image data from various locations in the region through observation devices. Users provide the system with audio data they have heard from people such as taxi drivers and restaurant owners, and the server is responsible for converting that audio data into text.

[0318] The server uses image recognition technology to identify specific objects from image data and count their number. The identified data is then used as foundational data for specific calculations of local pedestrian and traffic flow and other related data.

[0319] In addition, the server utilizes an emotion engine to analyze the user's emotions from the converted text data. The emotion engine uses natural language processing techniques to analyze keywords and context within the text and recognize emotional states such as positive, negative, and neutral.
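Analyzing "keywords and context" means that the same keyword can carry opposite polarity depending on nearby words. A minimal illustration of one such context rule, negation, is sketched below: a keyword's polarity is flipped when a negator appears within the two preceding tokens. The lexicons, the negator list, and the window size are hypothetical, and the emotion identification model 59 is not represented here.

```python
import re

# Hypothetical word lists for illustration only.
POSITIVE = {"good", "fun", "lively", "convenient"}
NEGATIVE = {"bad", "boring", "crowded", "inconvenient"}
NEGATORS = {"not", "never", "no", "hardly"}


def emotion_label(text: str, window: int = 2) -> str:
    """Keyword polarity with a simple negation rule over the preceding tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        # Flip polarity if a negator occurs shortly before the keyword.
        if polarity and any(t in NEGATORS for t in tokens[max(0, i - window):i]):
            polarity = -polarity
        score += polarity
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

Without the negation rule, "The festival was not fun" would be scored as positive; with it, the remark is correctly classified as negative.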

[0320] The server also uses this data to perform correlation analysis between regional datasets. This reveals how specific regional factors influence user sentiment and regional revitalization. The results are displayed visually on the device. Users can use the provided visual dashboard to intuitively understand the situation in their region.

[0321] Furthermore, the system uses time-series analysis to predict future regional trends. The information generated in real time serves as an important tool for users when considering future strategies.

[0322] As a concrete example, suppose a user uses a server to analyze the emotions of young people in a specific region and detects that positive emotions increase on weekends. Based on this, the server can suggest increasing the number of entertainment and cultural events in that region. In this way, the present invention provides a practical system that integrates regional data with user emotion information to support effective regional revitalization strategies.
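The weekend example can be expressed as a simple rule over classified comments: compare the share of positive comments on weekdays and on weekends, and emit a suggestion when the weekend share is clearly higher. The sketch below assumes the comments have already been labelled by the emotion engine; the 15-percentage-point threshold and the suggestion text are hypothetical.

```python
from typing import Optional


def positive_share(labels: list[str]) -> float:
    """Fraction of comments labelled 'positive' (0.0 for an empty list)."""
    return labels.count("positive") / len(labels) if labels else 0.0


def weekend_event_suggestion(weekday_labels: list[str], weekend_labels: list[str],
                             min_increase: float = 0.15) -> Optional[str]:
    """Suggest more weekend events when the weekend positive share clearly exceeds the weekday share."""
    wd = positive_share(weekday_labels)
    we = positive_share(weekend_labels)
    if we - wd >= min_increase:
        return (f"Positive share rises from {wd:.0%} on weekdays to {we:.0%} on weekends; "
                "consider adding entertainment and cultural events on weekends.")
    return None
```

A fixed threshold keeps the rule transparent; a statistical test on the two proportions would be the natural refinement when sample sizes are small.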

[0323] The following describes the processing flow.

[0324] Step 1:

[0325] The server collects image data from multiple observation devices within a region. These devices are installed at specific locations and automatically capture images according to a pre-set schedule, then transfer them to the server.

[0326] Step 2:

[0327] Users conduct interviews with taxi drivers and local restaurant owners and send the audio recordings to the server. The server then uses speech recognition technology to convert the received audio data into text.

[0328] Step 3:

[0329] The server applies image recognition algorithms to the image data to automatically count the number of specific objects (e.g., people, cars, bicycles). The results are stored as digital records and used as indicators of activity in the area.

[0330] Step 4:

[0331] The server uses the emotion engine to analyze the text data. Natural language processing techniques are used to analyze keywords and context, classifying the user's sentiment as positive, negative, or neutral.

[0332] Step 5:

[0333] The server performs correlation analysis between the collected datasets. This identifies the relationship between sentiment data and local activities and factors. This analysis is important for extracting specific factors that influence users' sentiment in a given region.

[0334] Step 6:

[0335] The server visualizes the analysis results and displays them on the terminal in real time. The user's terminal can then use the generated dashboard to visually check the local situation and sentiment trends.

[0336] Step 7:

[0337] Based on information obtained from their devices, users consider and implement measures that are appropriate to the characteristics of the region and the emotional state of the users. For example, in a region where positive emotions are prevalent on weekends, they might suggest holding events or gatherings.

[0338] (Example 2)

[0339] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0340] In modern cities and regions, increasing population density and complex transportation systems make it difficult to formulate appropriate regional revitalization strategies. Furthermore, because relevant data are scattered across many sources, they cannot be integrated and analyzed effectively to develop concrete proposals based on regional characteristics and trends. There is demand for strategic proposals that take temporal trends and people's emotions into account, but producing them in real time is challenging.

[0341] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0342] In this invention, the server includes means for analyzing video information collected from observation equipment and counting the number of objects within a predetermined area; means for converting audio information into text information and classifying that text information based on emotional states; and means for analyzing the correlation between multiple sets of information and understanding the relationships between specific factors. This enables the integrated analysis of regional data and the formulation of concrete and effective regional revitalization strategies through real-time emotional analysis and information visualization.

[0343] "Observation equipment" refers to devices used to collect images and videos from various locations within a region.

[0344] "Visual information" refers to visual data captured by observation equipment, and this includes photographs and videos.

[0345] "Counting the quantity of objects" is the process of analyzing video information to identify the number of specific objects or people.

[0346] "Audio information" refers to data obtained from sound, including human conversations and ambient sounds.

[0347] "Converting to text information" is the process of converting audio information into text format.

[0348] "Classifying based on emotional state" is the process of analyzing textual information to identify whether its content corresponds to a positive, negative, or neutral emotion.

[0349] An "information set" refers to a diverse group of collected data, including both visual and textual information.

[0350] "Analyzing correlations" is the process of analyzing the relationships between sets of information and clarifying the relationships between specific factors.

[0351] A "regional revitalization strategy" refers to specific plans and proposals aimed at regional development and improving the lives of residents.

[0352] In the embodiment for carrying out the invention, a server-centered system is used. This system enables detailed data collection, analysis, visualization, and strategic proposals.

[0353] The server first collects video information from multiple locations in the region using observation equipment, which includes high-resolution cameras and other sensors. The collected video information is processed by analysis software implementing image recognition algorithms (e.g., built with TensorFlow and OpenCV) to count specific objects. The resulting data is used to analyze urban traffic volume and understand pedestrian flow.

[0354] To further enhance local information, users input audio data acquired from the site into the system. This audio data is converted into text by a server using speech recognition software (e.g., a speech recognition API). This text data is then processed through a sentiment analysis engine and classified based on emotional states. This process utilizes software employing natural language processing techniques (e.g., NLTK or Transformers).

[0355] Furthermore, the server integrates and analyzes these different datasets. Statistical software is used, particularly to analyze correlations and uncover the relationships between specific factors. The results of this analysis are provided through immediate data visualization on the terminal, displayed using dedicated dashboards (e.g., Tableau or Power BI). This makes it easier for users to understand regional trends in real time.

[0356] For strategic proposals, time-series analysis models are used to predict future regional changes. This gives users foundational information for strategic decisions in areas such as transportation, urban development, and social activities.

[0357] As a concrete example, when a user uses this system to analyze pedestrian traffic in a commercial district on holidays, the system can identify congestion on specific streets on weekends from the data trends. The server can then recommend holding pedestrian-priority events in that area. An example of a prompt for the generative AI model is: "Based on past data, identify the main causes of negative sentiment in the area and suggest ways to improve it." In this way, the present invention provides a practical method for revitalizing local communities through complex data analysis.

[0358] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0359] Step 1:

[0360] The server receives video information collected from observation equipment. This video information is collected from various locations in the region. The server receives this information as input, uses image recognition technology to identify specific objects, and counts their quantities. Specifically, the server uses TensorFlow to detect objects such as people and cars, and outputs the results to a database.

[0361] Step 2:

[0362] The user acquires audio information and uploads it to the system. The server receives this audio information as input and converts it into text using speech recognition software. The server outputs the resulting text information as the input data for sentiment analysis.

[0363] Step 3:

[0364] The server uses the acquired text information as input to activate its sentiment analysis engine. This engine uses natural language processing techniques to classify emotional states (positive, negative, neutral) from the text information. Specifically, the server extracts emotional keywords, determines the emotional category based on these keywords, and outputs the result.

[0365] Step 4:

[0366] The server integrates all datasets and analyzes their correlations. The input consists of multiple datasets, and the output shows the relationships between specific factors. The server uses statistical methods to analyze this and reveal specific patterns and trends.

[0367] Step 5:

[0368] The server uses the analysis results to generate a dataset for visualization on the terminal and outputs it to a visualization tool. Specifically, the server provides data to Tableau or Power BI, and displays the results visually using graphs and charts.
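The source does not say how the server hands data to Tableau or Power BI. A flat CSV table is one simple format that both tools can import, so the sketch below shows that option only; the column names and the sample row are hypothetical, and a live database connection or an API would be equally plausible.

```python
import csv
import io


def to_csv(rows: list[dict[str, object]], fieldnames: list[str]) -> str:
    """Serialize analysis rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical row: one area, one day, object count and positive-comment share.
    rows = [{"date": "2024-12-07", "area": "Station East",
             "people": 120, "positive_share": 0.75}]
    print(to_csv(rows, ["date", "area", "people", "positive_share"]), end="")
```

Keeping one row per area and day (a "long" table) lets the BI tool do the grouping and charting itself.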

[0369] Step 6:

[0370] The server performs time series analysis to predict future regional trends. This uses historical data as input and generates predictions of future trends as output. Specifically, the server applies a time series model to create data showing, for example, the expected increase or decrease in traffic volume in the following month.
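The source does not name the time-series model. The simplest model that can express an "expected increase or decrease in traffic volume in the following month" is a least-squares linear trend over past monthly totals, sketched below as an illustration; seasonal models such as exponential smoothing or ARIMA would usually fit real traffic data better.

```python
def linear_trend_forecast(values: list[float], steps: int = 1) -> list[float]:
    """Fit y = a + b*t by least squares (t = 0..n-1) and extrapolate `steps` periods ahead."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    b = sxy / sxx            # slope: change in volume per month
    a = y_mean - b * t_mean  # intercept at t = 0
    return [a + b * (n + k) for k in range(steps)]
```

For monthly volumes of 100, 110, 120, and 130, the fitted slope is 10 per month and the forecast for the following month is 140.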

[0371] Step 7:

[0372] To propose regional development plans to the user, the server uses the generated information to construct specific strategies. In this process, the generative AI model outputs strategic proposals based on prompts. An example of a prompt is: "Predict the expected emotional trends of people at the next event and propose the optimal marketing strategy."

[0373] (Application Example 2)

[0374] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart glasses 214 as the "terminal".

[0375] In modern urban environments, it is essential to understand residents' sentiments and local trends in real time and to develop effective strategies based on that understanding. However, traditional methods lacked systems capable of instantly analyzing vast amounts of data and presenting the results visually, which made rapid decision-making difficult. In particular, proposals for regional revitalization were often presented in a non-intuitive form, making it hard for users to grasp their importance and act on them.

[0376] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0377] In this invention, the server includes means for analyzing image information collected from observation equipment to count the number of specific objects within a region, means for converting audio information obtained from listening into text information and classifying that text information based on emotion, and means for using a computer system that provides an intuitive interface for users to understand proposed policies and take new actions. This enables real-time understanding of the local situation and quick, clear decision-making.

[0378] "Observation equipment" refers to devices that collect image information from various locations in a region.

[0379] "Analysis" is the process of examining collected information and extracting specific patterns or meaningful data.

[0380] An "object" refers to a thing or event that is specifically identified as the subject of observation.

[0381] "Audio information" refers to data collected from the voices of one or more people.

[0382] "Textual information" refers to data that represents audio information as text.

[0383] "Emotion-based classification" is a process that uses natural language processing techniques to analyze text data and categorize it into emotional states such as positive, negative, and neutral.

[0384] An "interface" refers to the display screen or input method that a user uses to interact with a computer system.

[0385] A "computer system" is a system consisting of a series of devices and software that process and analyze data.

[0386] "Immediate" refers to a situation where there is virtually no time required from information collection to display.

[0387] "Visualization" refers to displaying data as graphs or diagrams to make information easier to understand intuitively.

[0388] A "regional revitalization policy" is a plan or strategy that outlines how to make a specific region more vibrant.

[0389] "Users" refers to those who operate this system and receive data and suggestion information.

[0390] The server is configured as follows to implement this invention. First, multiple cameras and sensors are installed in the area as observation equipment to collect image data in real time. This data is analyzed using the open-source image processing library OpenCV to detect specific objects and count their numbers. The results of this image analysis are used to understand the flow of people and traffic volume within the area.

[0391] Simultaneously, the audio information collected from the user is converted into text using a cloud-based speech recognition API. The converted text is then subjected to sentiment analysis using a natural language processing library (such as NLTK). This sentiment analysis determines whether the user's statements are positive, negative, or neutral.

[0392] The server performs correlation analysis between these multiple datasets to identify specific causal relationships. The results of this analysis are visualized on the computer system in a way that is intuitively understandable to users and displayed on the user's terminal in real time. Based on this information, users can consider regional revitalization strategies. Users are provided with an intuitive interface and assistance in easily formulating new action plans.

[0393] As a concrete example, this system could be used during a music event in a certain city to detect positive emotional tendencies among a large number of residents, and this information could be used to propose strategies for further improving the event the following year.

[0394] An example of a prompt using a generative AI model is: "Analyze the impact of local music events on residents' emotions and generate specific strategic proposals."

[0395] In this way, the system enables efficient analysis of complex data and provides users with effective information for regional revitalization.

[0396] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0397] Step 1:

[0398] The server collects regional image data in real time from observation equipment. The input is raw image data from cameras and sensors, and the output is image data converted into an analyzable format. To achieve this, image resolution and format conversion are performed in the initial stages.
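Resolution conversion is typically done with OpenCV's `cv2.resize`, where the `interpolation=cv2.INTER_NEAREST` flag selects nearest-neighbour sampling. To keep the example self-contained, the sketch below implements the same nearest-neighbour rule in plain Python on a grayscale image stored as rows of pixel values; the representation and the target sizes are assumptions for illustration.

```python
def resize_nearest(image: list[list[int]], out_h: int, out_w: int) -> list[list[int]]:
    """Resize a grayscale image (list of pixel rows) with nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    # Each output pixel copies the input pixel whose index scales down to it.
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]
```

Nearest-neighbour sampling is fast and introduces no new pixel values; for downscaling camera frames before detection, area or bilinear interpolation usually preserves small objects better.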

[0399] Step 2:

[0400] The server uses OpenCV to analyze the collected image data, detect specific objects, and count their numbers. The input is the transformed image data obtained in step 1, and the output is data about the type and number of objects. Specifically, it applies an object detection algorithm and updates the database with the recognized objects.

[0401] Step 3:

[0402] The server converts the voice information provided by the user into text using a speech recognition API. The input is voice data, and the output is the converted string data. Specifically, the speech recognition API analyzes the frequency components of the voice and generates the corresponding string.

[0403] Step 4:

[0404] The server performs sentiment analysis on text data using natural language processing techniques. The input is the text data generated in step 3, and the output is an emotional state such as positive, negative, or neutral. In this step, keywords within the text are analyzed and an emotional score is calculated.

[0405] Step 5:

[0406] The server investigates the correlation between image analysis data and sentiment data from text, and identifies the relationships. The input is the dataset obtained in steps 2 and 4, and the output is the analysis results showing causal relationships. Specifically, a statistical model is used to calculate the correlation between the data.

[0407] Step 6:

[0408] The terminal visualizes the analysis results sent from the server and displays them to the user. The input is the analysis data obtained in step 5, and the output is a graphical display on the user interface. In this step, graphs and dashboards are generated based on the data and presented in a way that the user can intuitively understand.

[0409] Step 7:

[0410] Users formulate and implement regional revitalization policies through the provided interface. The input is visualized analysis results, and the output is a new action plan. In the final step, the system supports users in navigating the interface, formulating the new plan, and putting it into practice.

[0411] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0412] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0413] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0414] [Third Embodiment]

[0415] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0416] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0417] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0418] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0419] The microphone 238 receives voice from the user 20, including instructions from the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0420] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0421] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0422] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0423] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0424] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0425] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0426] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0427] This invention relates to a system that acquires observational and interview data from various data sources, analyzes it to evaluate the local situation, and provides effective revitalization strategies.

[0428] This system is centered on a server that collects data from multiple observation devices. These observation devices include cameras installed at specific locations, such as train station plazas or residential areas, which periodically photograph traffic conditions and record them as image data. Opinions that users gather by interviewing taxi drivers and others are also sent to the server as audio data. The server receives this audio data and automatically converts it into text information.

[0429] Next, the server analyzes the acquired image data to identify specific objects and elements. For example, it counts the number of people and bicycles from an image of the area in front of a train station to understand the flow of people during peak hours. Image recognition technology is used for this purpose. The server also uses natural language processing technology to analyze the sentiment of text data converted from speech, classifying it as positive, negative, or neutral. This makes it possible to accurately understand and analyze various voices within the community.
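Once a count has been obtained for each captured image, the peak hours can be found by summing the counts by hour of day. The following Python sketch assumes a hypothetical record format of (ISO 8601 timestamp, person count) pairs, one per image. It is an illustration only, not the implementation described in this document.

```python
from collections import defaultdict
from datetime import datetime


def hourly_totals(records):
    """Sum per-image person counts into hour-of-day buckets (0-23)."""
    totals = defaultdict(int)
    for timestamp, count in records:
        totals[datetime.fromisoformat(timestamp).hour] += count
    return dict(totals)


def peak_hour(records):
    """Return the hour of day with the largest total count."""
    totals = hourly_totals(records)
    return max(totals, key=totals.get)


# Hypothetical per-image counts from a camera in front of a station.
records = [
    ("2024-12-02T07:15:00", 40),
    ("2024-12-02T07:45:00", 55),
    ("2024-12-02T08:10:00", 120),
    ("2024-12-02T12:00:00", 30),
]
print(peak_hour(records))  # 8
```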

[0430] Furthermore, the server performs correlation analysis between different datasets. For example, it identifies the relationship between data on the use of commercial facilities around train stations and data on the transportation use of local residents. Such analysis results play a crucial role in understanding regional characteristics and formulating effective strategies.

[0431] The analysis results are immediately visualized and provided to the user via the terminal. The dashboard, generated in real time, allows users to intuitively understand the local situation. For example, users can view graphs showing fluctuations in weekend traffic and pedestrian flow.

[0432] As a concrete example, suppose an analysis of pedestrian traffic in front of a train station in a regional city reveals a significant difference between weekdays and weekends. In this case, the user (a local government official) can formulate a policy to strengthen weekend transportation infrastructure. This makes it possible to develop investment plans and public policies that accurately reflect the local situation.
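The weekday/weekend comparison in this example can be sketched in Python as follows. The sketch assumes that daily pedestrian totals keyed by date have already been derived from the image counts. The data values are hypothetical.

```python
from datetime import date
from statistics import mean


def weekday_weekend_means(daily_counts):
    """Average daily counts on weekdays (Mon-Fri) and weekends (Sat-Sun).

    Raises statistics.StatisticsError if either group is empty.
    """
    weekday = [n for day, n in daily_counts.items() if day.weekday() < 5]
    weekend = [n for day, n in daily_counts.items() if day.weekday() >= 5]
    return mean(weekday), mean(weekend)


# Hypothetical daily pedestrian totals in front of one station.
daily_counts = {
    date(2024, 12, 2): 1000,  # Monday
    date(2024, 12, 3): 1200,  # Tuesday
    date(2024, 12, 7): 3000,  # Saturday
    date(2024, 12, 8): 2600,  # Sunday
}
weekday_avg, weekend_avg = weekday_weekend_means(daily_counts)
print(weekday_avg, weekend_avg, round(weekend_avg / weekday_avg, 2))
```

A large weekend-to-weekday ratio is the kind of signal that would prompt the policy described above.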

[0433] Thus, the system of the present invention can grasp the characteristics of a region in detail through multifaceted analysis of collected data and propose appropriate revitalization strategies.

[0434] The following describes the processing flow.

[0435] Step 1:

[0436] The server collects image data in real time from the observation equipment. The image data is acquired by cameras installed in specific areas such as in front of train stations and residential areas, and then transferred to the server.

[0437] Step 2:

[0438] The server receives audio data from users who have conducted interviews. This audio data is collected from people such as taxi drivers and restaurant owners, and the server performs speech recognition to convert it into text data.

[0439] Step 3:

[0440] The server applies image recognition technology to the collected image data to detect and count specific objects (e.g., people, cars, and bicycles). This allows the pedestrian flow and traffic volume in specific areas to be quantified.
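Suppose an object detector (for example, a TensorFlow-based model) has already returned a class label and confidence score for each detected object. The counting part of this step then reduces to filtering and tallying. The `Detection` structure, the target classes, and the 0.5 confidence threshold in this Python sketch are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Detection(NamedTuple):
    label: str    # class name reported by the detector, e.g. "person"
    score: float  # detector confidence in [0, 1]


# Hypothetical set of object classes the regional analysis cares about.
TARGET_LABELS = frozenset({"person", "car", "bicycle"})


def count_objects(detections: Iterable[Detection], min_score: float = 0.5) -> Counter:
    """Count target-class detections whose confidence meets the threshold."""
    return Counter(
        d.label
        for d in detections
        if d.label in TARGET_LABELS and d.score >= min_score
    )


frame = [
    Detection("person", 0.92),
    Detection("person", 0.41),   # below threshold, ignored
    Detection("bicycle", 0.80),
    Detection("dog", 0.95),      # not a target class, ignored
]
print(count_objects(frame))
```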

[0441] Step 4:

[0442] The server analyzes the text data converted by speech recognition using natural language processing (NLP) and classifies the sentiment. The text is classified as positive, negative, or neutral, and this data is used to capture the sentiments of local residents.

[0443] Step 5:

[0444] The server performs correlation analysis between different datasets. It analyzes interactions within a region, for example, the relationship between local transportation usage data and commercial facility usage data.

[0445] Step 6:

[0446] The server generates the analysis results as a visual dashboard and sends it to the terminal. The terminal provides a user-friendly interface, allowing the results to be viewed in real time.

[0447] Step 7:

[0448] Users can review the dashboard provided on their device and develop strategies based on local conditions and trends. For example, they can consider ways to improve transportation infrastructure during specific time periods.

[0449] (Example 1)

[0450] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0451] In modern local communities, it is crucial to understand the situation from multiple perspectives and formulate appropriate measures in order to revitalize the region. However, traditional methods have presented challenges in efficiently and comprehensively analyzing the vast amount of information obtained from multiple data sources and providing that information to users intuitively and immediately.

[0452] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0453] In this invention, the server includes means for analyzing image information collected from observation devices to measure the number of specific objects within a region, means for converting acquired audio information into text information and classifying that text information based on emotion, and means for performing correlation analysis between multiple information sets to identify relationships between specific elements. This makes it possible to analyze the situation in a region from multiple perspectives, visualize it immediately, and propose revitalization strategies.

[0454] An "observation device" is a device installed in a specific location to monitor the situation within that area, and is used to collect image information and other data.

[0455] "Image information" refers to visual information collected by observation devices that shows objects and conditions within a region.

[0456] "Audio information" refers to data collected from users and local residents in the form of spoken language.

[0457] "Textual information" refers to data in text format that is obtained by converting audio information using speech recognition technology.

[0458] "Classification based on emotion" is the process of analyzing textual information and determining whether its content is positive, negative, or neutral.

[0459] "Correlation analysis" is the process of analyzing the relationships and patterns between elements in multiple sets of information using statistical methods.

[0460] "Visualization" is a method of displaying data and analysis results in a format that can be intuitively understood, using images, graphs, charts, and other visual aids.

[0461] A "prompt" is text used to give specific instructions or questions to a generative AI model.

[0462] A "generative AI model" is a system that uses artificial intelligence technology to generate information and suggestions in a human-understandable format based on a given prompt.

[0463] This invention is a system for comprehensively understanding the situation in a local community and formulating more effective revitalization strategies. The system mainly consists of a server, observation devices, and user terminals.

[0464] The server analyzes image information collected from observation devices. These observation devices are installed in specific areas and include cameras that capture various scenes within the area. The accumulated image information is then analyzed on the server using image recognition technologies (such as OpenCV and TensorFlow) to measure specific objects or elements, such as the number of people or bicycles.

[0465] Simultaneously, the server converts the audio information obtained from the user into text information using speech recognition software (such as Google Cloud Speech-to-Text). This text information is then subjected to sentiment analysis using natural language processing technology (such as NLTK or spaCy) and classified into three categories: positive, negative, and neutral.

[0466] Furthermore, the server performs correlation analysis using image information, text information, and other collected data. To clarify the relationships between the data, it uses the Python pandas library to calculate correlation coefficients for the numerical data.

[0467] The analysis results are visualized in a dashboard format using visualization libraries such as Plotly and Matplotlib. The terminal provides this information to the user in real time, offering a means to intuitively understand the current situation and trends in the region.

[0468] As a concrete example, consider an analysis of pedestrian traffic in front of train stations in regional cities, comparing weekday and weekend traffic trends. Based on this information, local government officials (the users) can formulate improvement measures for weekend transportation infrastructure.

[0469] To make even better use of this system, users input prompts into the generative AI model. An example prompt might be, "Analyze the flow of people in front of train stations in regional cities and propose infrastructure development plans that take into account the differences in traffic volume between weekdays and weekends." Based on these prompts, the AI generates suggestions grounded in the data.
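One way to ground such a prompt in the analysis results is to append the computed figures to the user's instruction before sending it to the generative AI model. The Python sketch below uses a hypothetical template. The function name and field layout are assumptions, and the call to the model itself is omitted.

```python
def build_prompt(location: str, weekday_mean: float, weekend_mean: float) -> str:
    """Combine the user's instruction with the server's analysis results."""
    instruction = (
        f"Analyze the flow of people in front of {location} and propose "
        "infrastructure development plans that take into account the "
        "differences in traffic volume between weekdays and weekends."
    )
    # Hypothetical context block carrying the analysis results.
    context = (
        f"Average daily pedestrians on weekdays: {weekday_mean:.0f}\n"
        f"Average daily pedestrians on weekends: {weekend_mean:.0f}"
    )
    return f"{instruction}\n\nData:\n{context}"


print(build_prompt("XX Station", 1100, 2800))
```

Supplying the figures explicitly lets the model base its proposal on the system's measurements rather than on general knowledge alone.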

[0470] Therefore, this system is extremely effective in gaining a deeper understanding of regional characteristics and formulating concrete and feasible revitalization policies.

[0471] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0472] Step 1:

[0473] The server receives image information from the observation device. A camera is used as the observation device, and image data captured by the camera within the area is input. The server acquires this image data and identifies objects using image recognition technology (such as OpenCV or TensorFlow). Specifically, it measures the number of people and bicycles in the image and outputs the results as numerical data. This data is used as basic information to understand the traffic volume and pedestrian flow in the area.

[0474] Step 2:

[0475] Users provide the server with audio information they have gathered from local residents and taxi drivers. The server receives this audio data as input and converts it into text using speech recognition software (such as Google Cloud Speech-to-Text). The converted text is then analyzed with natural language processing technologies (such as NLTK and spaCy) to determine the sentiment it contains. This allows the text to be classified as positive, negative, or neutral, providing a concrete understanding of the voices of local residents.

[0476] Step 3:

[0477] The server takes as input the numerical data obtained from the images and the sentiment data obtained from the text, and performs correlation analysis between the multiple datasets. It uses the Python pandas library to calculate correlation coefficients between these datasets. For example, it can output how strongly the utilization rate of commercial facilities around a train station is associated with traffic volume during a specific time period. This correlation analysis provides a concrete understanding of how regional economic activity and traffic trends are related.

[0478] Step 4:

[0479] The server visualizes the analysis results. Using visualization libraries such as Plotly and Matplotlib, it generates output in a dashboard format. This dashboard is provided to the user via the terminal and displays the analysis results as graphs and charts, which the user can use to analyze the current situation in detail and understand trends.

[0480] Step 5:

[0481] The user inputs prompts into the generative AI model. For example, a prompt such as, "Analyze the flow of people in front of train stations in regional cities and propose infrastructure development plans that take into account the differences in traffic volume between weekdays and weekends," might be used. Based on these prompts, the AI generates data-driven proposals and outputs them. This provides concrete insights that can contribute to the formulation of regional revitalization strategies.

[0482] (Application Example 1)

[0483] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0484] Currently, there is a lack of efficient means to collect and analyze local data and provide citizens with the information necessary for regional revitalization in real time. As a result, it takes time to understand the local situation, which can delay the planning and implementation of effective revitalization strategies. Furthermore, there is a challenge in that it is difficult to integrate information from diverse data sources, and useful information is not being appropriately provided to users.

[0485] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0486] In this invention, the server includes means for analyzing image data collected from observation devices to count the number of specific objects in a region, means for converting audio data obtained through interviews into text data and classifying the text data based on emotion, means for performing correlation analysis between multiple datasets to identify relationships between specific factors, and means for providing regional information in real time based on the user's location information. This enables a rapid and comprehensive understanding of the regional situation, allowing for timely information provision to users and proposals for regional revitalization strategies.

[0487] An "observation device" is a device installed at a specific location within a region to collect image data.

[0488] "Image data analysis" refers to a method of processing image data obtained from observation equipment to count the number of specific objects or people.

[0489] "Audio data conversion" is the process of automatically converting audio data obtained through interviews into text data.

[0490] "Sentiment analysis" is a technique that analyzes text data and classifies its content based on emotions such as positive, negative, and neutral.

[0491] "Correlation analysis" is a technique that identifies relationships between multiple datasets and reveals correlations between specific factors.

[0492] "Real-time visualization" is a technology that instantly displays acquired data visually, allowing users to intuitively understand the situation.

[0493] A "regional revitalization strategy" is a set of specific proposals and plans based on acquired data to revitalize the local economy and living environment.

[0494] "Location-based local information provision" is a function that provides the latest information related to the surrounding area in real time, based on the user's current location.

[0495] The system implementing this invention performs data collection, analysis, visualization, and information provision. The server analyzes image data acquired from observation devices and counts the specific objects within a region, using image recognition software such as OpenCV and TensorFlow. The server also converts voice data obtained from taxi drivers and others into text data by speech recognition and performs sentiment analysis on that text using the Google Cloud Natural Language API, classifying it as positive, negative, or neutral.

[0496] The collected data is stored in MongoDB, and correlation and time-series analyses are performed using Python scripts. This analysis, based on data from observation devices and user location information, is particularly useful for predicting pedestrian flow and congestion levels within a region. Furthermore, these results are visualized in real-time on a smartphone application using React Native.

[0497] Users carrying a terminal can receive location-based information on nearby traffic conditions and events. For example, the peak times for pedestrian traffic in a region on weekends can be predicted, and notifications can be sent to help users choose appropriate modes of transportation. This allows users to avoid congestion in their area and travel efficiently.
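Selecting the local information to send to a user can be sketched as a distance filter around the user's reported position. This Python sketch uses the haversine great-circle formula with a mean Earth radius of 6371 km. The `LocalInfo` record, the 1 km default radius, and the sample coordinates are hypothetical.

```python
import math
from typing import NamedTuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


class LocalInfo(NamedTuple):
    title: str
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(items, user_lat: float, user_lon: float, radius_km: float = 1.0):
    """Items within radius_km of the user, nearest first."""
    with_distance = [
        (haversine_km(user_lat, user_lon, it.lat, it.lon), it) for it in items
    ]
    with_distance.sort(key=lambda pair: pair[0])
    return [it for dist, it in with_distance if dist <= radius_km]


items = [
    LocalInfo("Road closure", 35.005, 139.0),
    LocalInfo("Station festival", 35.0, 139.0),
    LocalInfo("Harbor market", 35.1, 139.0),
]
print([it.title for it in nearby(items, 35.0, 139.0)])
```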

[0498] As a concrete example, a prompt message for a generative AI model might be in the format of, "Please provide a summary of the traffic situation in front of XX Station yesterday. What time of day was the most congested?" Such prompt messages allow users to obtain specific analytical results and predictive information based on past data.

[0499] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0500] Step 1:

[0501] The server acquires image data from observation equipment. This includes video from cameras covering a specific area within the region. The server receives this image data as input and uses OpenCV and TensorFlow to identify and count objects within the image. The output is data indicating the number of specific objects.

[0502] Step 2:

[0503] The server collects the audio data obtained through interviews. Conversations between the user and interviewees, such as taxi drivers, are sent to the server as audio data. This audio data is converted to text by speech recognition, and the converted text data is then used as input for sentiment analysis using the Google Cloud Natural Language API. The results are output categorized as positive, negative, or neutral.

[0504] Step 3:

[0505] The server aggregates historical and current data stored in MongoDB and performs correlation analysis and time series analysis using Python. The input for this analysis is numerical data obtained from image data and sentiment data obtained from text data. The analysis results are used to evaluate the movement of people and the impact of events within a region, and regularities and predictions are output.
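The document does not specify the time-series model. As one simple substitute technique, the Python sketch below predicts the count for a future time slot as the historical mean for the same combination of day type (weekday or weekend) and hour. More capable models would also account for trends and events. The input format is an assumption.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def slot(ts: datetime) -> tuple[bool, int]:
    """(is_weekend, hour) key used to group observations."""
    return ts.weekday() >= 5, ts.hour


def build_profile(history):
    """Mean count per slot from (datetime, count) observations."""
    grouped = defaultdict(list)
    for ts, count in history:
        grouped[slot(ts)].append(count)
    return {key: mean(values) for key, values in grouped.items()}


def forecast(history, when: datetime):
    """Predicted count at `when`, or None if the slot has no history."""
    return build_profile(history).get(slot(when))


history = [
    (datetime(2024, 12, 7, 14), 300),   # Saturday 14:00
    (datetime(2024, 12, 14, 14), 340),  # Saturday 14:00
    (datetime(2024, 12, 9, 14), 100),   # Monday 14:00
]
print(forecast(history, datetime(2024, 12, 21, 14)))  # 320
```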

[0506] Step 4:

[0507] The smartphone, acting as the terminal, receives real-time analysis results from the server. The user's location information is sent to the server as input, and regional information and congestion predictions are displayed on the user's terminal. The visualized results are provided to the user using an application built with React Native.

[0508] Step 5:

[0509] The user inputs a prompt message into the generative AI model within the app. For example, suppose the user sends the text, "Please tell me about the current traffic conditions." The server receives this prompt message, generates an answer based on past data and analysis results, and outputs it to the user.

[0510] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0511] This invention relates to a system that combines data collection from observation devices, text conversion of audio data, correlation analysis between multiple data sets, data visualization, and strategy proposal with an emotion engine that recognizes user emotions.

[0512] This system is server-centric and collects image data from various locations in the region through observation devices. Users provide the system with audio data they have heard from people such as taxi drivers and restaurant owners, and the server is responsible for converting that audio data into text.

[0513] The server uses image recognition technology to identify specific objects in the image data and count them. The resulting counts are then used as foundational data for calculating local pedestrian flow, traffic flow, and related indicators.

[0514] In addition, the server utilizes an emotion engine to analyze the user's emotions from the converted text data. The emotion engine uses natural language processing techniques to analyze keywords and context within the text and recognize emotional states such as positive, negative, and neutral.

[0515] The server also uses this data to perform correlation analysis between regional datasets. This reveals how specific regional factors influence user sentiment and regional revitalization. The results are displayed visually on the device. Users can use the provided visual dashboard to intuitively understand the situation in their region.

[0516] Furthermore, the system uses time-series analysis to predict future regional trends. The information generated in real time serves as an important tool for users when considering future strategies.

[0517] As a concrete example, suppose a user uses a server to analyze the emotions of young people in a specific region and detects that positive emotions increase on weekends. Based on this, the server can suggest increasing the number of entertainment and cultural events in that region. In this way, the present invention provides a practical system that integrates regional data with user emotion information to support effective regional revitalization strategies.
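The weekend example can be made concrete with a small Python sketch. It compares the share of positive classifications on weekends and weekdays and applies a decision rule. The 0.2 margin and the rule itself are hypothetical assumptions, since the document leaves the suggestion logic open.

```python
from datetime import date


def positive_share(labeled, weekend: bool) -> float:
    """Fraction of 'positive' labels among records on weekends or weekdays."""
    labels = [label for day, label in labeled if (day.weekday() >= 5) == weekend]
    return labels.count("positive") / len(labels) if labels else 0.0


def suggest_weekend_events(labeled, min_margin: float = 0.2) -> bool:
    """Hypothetical rule: suggest events when weekend positivity clearly exceeds weekdays."""
    return positive_share(labeled, True) - positive_share(labeled, False) >= min_margin


labeled = [
    (date(2024, 12, 2), "neutral"),   # Monday
    (date(2024, 12, 3), "positive"),  # Tuesday
    (date(2024, 12, 4), "negative"),  # Wednesday
    (date(2024, 12, 5), "neutral"),   # Thursday
    (date(2024, 12, 7), "positive"),  # Saturday
    (date(2024, 12, 8), "positive"),  # Sunday
    (date(2024, 12, 8), "neutral"),   # Sunday
]
print(suggest_weekend_events(labeled))  # True
```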

[0518] The following describes the processing flow.

[0519] Step 1:

[0520] The server collects image data from multiple observation devices within a region. These devices are installed at specific locations and automatically capture images according to a pre-set schedule, then transfer them to the server.

[0521] Step 2:

[0522] Users conduct interviews with taxi drivers and local restaurant owners and send the audio recordings to the server. The server then uses speech recognition technology to convert the received audio data into text.

[0523] Step 3:

[0524] The server applies image recognition algorithms to the image data to automatically count the number of specific objects (e.g., people, cars, bicycles). The results are stored as digital records and used as indicators of activity in the area.

[0525] Step 4:

[0526] The server uses the emotion engine to analyze the text data. Natural language processing techniques are used to analyze keywords and context, classifying the user's sentiment as positive, negative, or neutral.

[0527] Step 5:

[0528] The server performs correlation analysis between the collected datasets. This identifies the relationship between sentiment data and local activities and factors. This analysis is important for extracting specific factors that influence users' sentiment in a given region.

[0529] Step 6:

[0530] The server visualizes the analysis results and displays them on the terminal in real time. The user can then use the generated dashboard on the terminal to visually check the local situation and sentiment trends.

[0531] Step 7:

[0532] Based on information obtained from their devices, users consider and implement measures that are appropriate to the characteristics of the region and the emotional state of the users. For example, in a region where positive emotions are prevalent on weekends, they might suggest holding events or gatherings.

[0533] (Example 2)

[0534] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0535] In modern cities and regions, increasing population density and complex transportation systems make it difficult to formulate appropriate regional revitalization strategies. Furthermore, because relevant data is scattered across many sources, it is difficult to integrate and analyze it effectively to develop concrete proposals based on regional characteristics and trends. There is currently a demand for optimal strategic proposals that take into account temporal trends and people's emotions, but achieving this in real time is challenging.

[0536] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0537] In this invention, the server includes means for analyzing video information collected from observation equipment and counting the number of objects within a predetermined area; means for converting audio information into text information and classifying that text information based on emotional states; and means for analyzing the correlation between multiple sets of information and understanding the relationships between specific factors. This enables the integrated analysis of regional data and the formulation of concrete and effective regional revitalization strategies through real-time emotional analysis and information visualization.

[0538] "Observation equipment" refers to devices used to collect images and videos from various locations within a region.

[0539] "Visual information" refers to visual data captured by observation equipment, and this includes photographs and videos.

[0540] "Counting the quantity of objects" is the process of analyzing video information to identify the number of specific objects or people.

[0541] "Audio information" refers to data obtained from sound, including human conversations and ambient sounds.

[0542] "Converting to text information" is the process of converting audio information into text format.

[0543] "Classifying based on emotional state" is the process of analyzing textual information to identify whether its content corresponds to a positive, negative, or neutral emotion.

[0544] An "information set" refers to a diverse collection of data, which includes both visual and textual information.

[0545] "Analyzing correlations" is the process of analyzing the relationships between sets of information and clarifying the relationships between specific factors.

[0546] A "regional revitalization strategy" refers to specific plans and proposals aimed at regional development and improving the lives of residents.

[0547] In the embodiment for carrying out the invention, a server-centered system is used. This system enables detailed data collection, analysis, visualization, and strategic proposals.

[0548] The server first collects video information from multiple locations in the region using observation equipment, including high-resolution cameras and other sensors. The collected video information is processed by analysis software that uses image recognition algorithms (e.g., TensorFlow and OpenCV) to count specific objects. The resulting data is used to analyze urban traffic volume and understand pedestrian flow.

[0549] To further enhance local information, users input audio data acquired from the site into the system. This audio data is converted into text by a server using speech recognition software (e.g., a speech recognition API). This text data is then processed through a sentiment analysis engine and classified based on emotional states. This process utilizes software employing natural language processing techniques (e.g., NLTK or Transformers).

[0550] Furthermore, the server integrates and analyzes these different datasets. Statistical software is used, particularly to analyze correlations and uncover the relationships between specific factors. The results of this analysis are provided through immediate data visualization on the terminal, displayed using dedicated dashboards (e.g., Tableau or Power BI). This makes it easier for users to understand regional trends in real time.

[0551] In future strategic proposals, time-series data analysis models will be used to predict regional changes. This will allow users to obtain foundational information for making strategic decisions in areas such as transportation, urban development, and social activities.

[0552] As a concrete example, when a user uses this system to analyze pedestrian traffic in a commercial district on a holiday, the system can identify congestion on specific streets on weekends based on data trends. As a suggestion, the server can recommend holding pedestrian-priority events in this area. An example of a prompt generated by the AI ​​model is, "Based on past data, identify the main causes of negative sentiment in the area and suggest ways to improve it." In this way, the present invention provides a practical method for revitalizing local communities through complex data analysis.

[0553] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0554] Step 1:

[0555] The server receives video information collected by observation equipment at various locations in the region. Using image recognition technology, the server identifies specific objects and counts them. Specifically, the server uses TensorFlow to detect objects such as people and cars, and writes the results to a database.
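[0555] ends with the server writing the counts to a database. A minimal sketch of that storage step, assuming Python's built-in `sqlite3` module stands in for whatever database the server actually uses; the `object_counts` table layout, the location name, and the helper functions are hypothetical.

```python
import sqlite3
from datetime import datetime


def open_db(path=":memory:"):
    """Open (or create) the count database; ':memory:' keeps it in RAM for testing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS object_counts (
               location    TEXT    NOT NULL,
               observed_at TEXT    NOT NULL,
               label       TEXT    NOT NULL,
               quantity    INTEGER NOT NULL
           )"""
    )
    return conn


def store_counts(conn, location: str, observed_at: datetime, counts: dict):
    """Insert one row per object label for a single observation (ISO 8601 timestamp)."""
    rows = [(location, observed_at.isoformat(), label, n) for label, n in counts.items()]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO object_counts VALUES (?, ?, ?, ?)", rows)


def total_by_label(conn, location: str) -> dict:
    """Sum the stored counts per label for one location."""
    cur = conn.execute(
        "SELECT label, SUM(quantity) FROM object_counts WHERE location = ? GROUP BY label",
        (location,),
    )
    return dict(cur.fetchall())


conn = open_db()
store_counts(conn, "station_plaza", datetime(2024, 12, 14, 9, 0), {"person": 42, "car": 7})
store_counts(conn, "station_plaza", datetime(2024, 12, 14, 10, 0), {"person": 55})
print(total_by_label(conn, "station_plaza"))
```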

[0556] Step 2:

[0557] The user records audio information and uploads it to the system. The server receives this audio information as input and converts it into text using speech recognition software. The server then outputs the resulting text information as the input data for the sentiment analysis in Step 3.

[0558] Step 3:

[0559] The server uses the acquired text information as input to activate its sentiment analysis engine. This engine uses natural language processing techniques to classify emotional states (positive, negative, neutral) from the text information. Specifically, the server extracts emotional keywords, determines the emotional category based on these keywords, and outputs the result.
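[0559] describes a keyword-based approach: extract emotional keywords, then decide the category from them. The sketch below follows that description under simplifying assumptions; the two keyword sets are hypothetical placeholders, and a production engine would more likely rely on a trained model (for example, via NLTK or Transformers as mentioned in [0549]).

```python
import re

# Hypothetical keyword sets for illustration only.
POSITIVE_KEYWORDS = {"good", "great", "lively", "convenient", "clean", "fun"}
NEGATIVE_KEYWORDS = {"bad", "crowded", "dirty", "inconvenient", "noisy", "dangerous"}


def extract_emotion_keywords(text):
    """Return (positive, negative) keyword lists found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = [w for w in words if w in POSITIVE_KEYWORDS]
    negative = [w for w in words if w in NEGATIVE_KEYWORDS]
    return positive, negative


def classify_emotion(text):
    """Return 'positive', 'negative' or 'neutral' by keyword majority."""
    positive, negative = extract_emotion_keywords(text)
    if len(positive) > len(negative):
        return "positive"
    if len(negative) > len(positive):
        return "negative"
    return "neutral"  # no keywords, or a tie


for comment in ["The station square is clean and convenient.",
                "Too crowded and noisy on Saturday nights.",
                "The bus leaves at nine."]:
    print(classify_emotion(comment), "<-", comment)
```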

[0560] Step 4:

[0561] The server integrates all datasets and analyzes their correlations. The input consists of multiple datasets, and the output shows the relationships between specific factors. The server uses statistical methods to analyze this and reveal specific patterns and trends.
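One common statistical method for this step is the Pearson correlation coefficient. The sketch below assumes each dataset has already been aligned into equal-length numeric series (for example, one value per day); `pearson` and `correlation_matrix` are hypothetical helper names, the sample values are illustrative, and the description does not specify which correlation measure the server uses.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(datasets):
    """Pearson coefficient for every pair of named series (pairs in alphabetical order)."""
    names = sorted(datasets)
    return {(a, b): pearson(datasets[a], datasets[b])
            for i, a in enumerate(names) for b in names[i + 1:]}


# Illustrative daily values, not measured data.
daily = {
    "pedestrians": [1200, 1350, 900, 1600, 1500],
    "positive_ratio": [0.40, 0.45, 0.30, 0.55, 0.50],
    "rainfall_mm": [0.0, 2.0, 15.0, 0.0, 1.0],
}
for pair, r in correlation_matrix(daily).items():
    print(pair, round(r, 2))
```

A coefficient close to +1 or -1 indicates a strong linear relationship, but it does not by itself establish cause and effect; interpreting the relationships between factors still requires domain judgment.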

[0562] Step 5:

[0563] The server uses the analysis results to generate a dataset for visualization on the terminal and outputs it to a visualization tool. Specifically, the server supplies the data to Tableau or Power BI, which display the results as graphs and charts.
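Tableau and Power BI can both import CSV files, so one simple way to hand the analysis results to them is a tidy, one-row-per-measurement CSV. This is a sketch under that assumption; the column names, sample values, and helper functions are hypothetical, and a real deployment might instead write to a database or use each tool's own connectors.

```python
import csv
import io


def to_long_rows(location, daily_metrics):
    """Flatten {date: {metric: value}} into one row per (date, metric)."""
    rows = []
    for day in sorted(daily_metrics):
        for metric, value in sorted(daily_metrics[day].items()):
            rows.append({"date": day, "location": location, "metric": metric, "value": value})
    return rows


def to_dashboard_csv(rows):
    """Serialize rows as CSV text that a BI tool can import."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["date", "location", "metric", "value"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


metrics = {"2024-12-14": {"pedestrians": 1520, "positive_ratio": 0.42}}
print(to_dashboard_csv(to_long_rows("station_plaza", metrics)))
```

When writing to a file rather than a string, open it with `newline=""`, as the `csv` module documentation recommends.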

[0564] Step 6:

[0565] The server performs time series analysis to predict future regional trends. This uses historical data as input and generates predictions of future trends as output. Specifically, the server applies a time series model to create data showing, for example, the expected increase or decrease in traffic volume in the following month.
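The description does not name a specific time-series model for this step. As a stand-in, the sketch below fits an ordinary least-squares linear trend to monthly traffic totals and extrapolates one month ahead; this is an assumed technique chosen for brevity, the figures are illustrative, and seasonal or autoregressive models would usually suit real traffic data better.

```python
def fit_linear_trend(values):
    """Least-squares fit of values[t] ~ a + b * t for t = 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(values, steps=1):
    """Extrapolate the fitted trend `steps` periods beyond the last observation."""
    intercept, slope = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps)]


monthly_traffic = [1000, 1100, 1200, 1300]  # illustrative monthly totals
next_month = forecast(monthly_traffic)[0]
change = next_month - monthly_traffic[-1]
print(f"forecast {next_month:.0f} ({'increase' if change > 0 else 'decrease or no change'})")
```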

[0566] Step 7:

[0567] To propose regional development plans to the user, the server uses the generated information to construct specific strategies. In this process, the generative AI model outputs strategic proposals based on prompts. An example of a prompt is: "Predict the expected emotional trends of people at the next event and propose the optimal marketing strategy."
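For this step, the analysis outputs must be turned into a prompt for the generative AI model. A minimal sketch, assuming the sentiment counts from Step 3 and a one-line forecast summary from Step 6 are available; `build_strategy_prompt`, the event name, and the prompt layout are hypothetical, and the call to the model itself is omitted because the description does not specify an API.

```python
def build_strategy_prompt(event_name, sentiment_counts, forecast_note):
    """Assemble a prompt for the generative AI model from analysis results.

    sentiment_counts: e.g. {"positive": 30, "neutral": 10, "negative": 10}
    forecast_note:    one-line summary of the time-series forecast.
    """
    total = sum(sentiment_counts.values())
    lines = [f"Event: {event_name}", "Sentiment of collected comments:"]
    for label in ("positive", "neutral", "negative"):
        n = sentiment_counts.get(label, 0)
        share = n / total if total else 0.0
        lines.append(f"- {label}: {n} ({share:.0%})")
    lines.append(f"Forecast: {forecast_note}")
    lines.append("Predict the expected emotional trends of people at the next event "
                 "and propose the optimal marketing strategy.")
    return "\n".join(lines)


print(build_strategy_prompt(
    "Station Square Festival",  # hypothetical event name
    {"positive": 30, "neutral": 10, "negative": 10},
    "Visitor numbers are expected to rise next month.",
))
```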

[0568] (Application Example 2)

[0569] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0570] In modern urban environments, it is essential to understand residents' sentiments and local trends in real time and to develop effective strategies based on that understanding. However, conventional methods lacked systems capable of instantly analyzing large volumes of data and presenting the results visually, which made rapid decision-making difficult. In particular, regional revitalization proposals were often not presented intuitively, making it hard for users to grasp their importance and act on them.

[0571] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0572] In this invention, the server includes means for analyzing image information collected from observation equipment to count the number of specific objects within a region; means for converting audio information obtained through interviews into text information and classifying that text information based on emotion; and means for providing, through a computer system, an intuitive interface that helps users understand proposed policies and take new actions. This enables real-time understanding of the local situation and quick, clear decision-making.

[0573] "Observation equipment" refers to devices that collect image information from various locations in a region.

[0574] "Analysis" is the process of examining collected information and extracting specific patterns or meaningful data.

[0575] "Object" refers to an object or event that is specifically identified as the subject of observation.

[0576] "Audio information" refers to data collected from the voices of one or more people.

[0577] "Textual information" refers to data that represents audio information as text.

[0578] "Emotion-based classification" is a process that uses natural language processing techniques to analyze text data and categorize it into emotional states such as positive, negative, and neutral.

[0579] An "interface" refers to the display screen or input method that a user uses to interact with a computer system.

[0580] A "computer system" is a system consisting of a series of devices and software that process and analyze data.

[0581] "Immediate" means that there is virtually no delay between information collection and display.

[0582] "Visualization" refers to displaying data as graphs or diagrams to make information easier to understand intuitively.

[0583] A "regional revitalization policy" is a plan or strategy that outlines how to make a specific region more vibrant.

[0584] "Users" refers to the people who operate this system and receive data and proposals from it.

[0585] The server is configured as follows to implement this invention. First, multiple cameras and sensors are installed in the area as observation equipment to collect image data in real time. This data is analyzed using the open-source image processing library OpenCV to detect specific objects and count their numbers. The results of this image analysis are used to understand the flow of people and traffic volume within the area.

[0586] Simultaneously, the audio information collected from the user is converted into text using a cloud-based speech recognition API. The converted text is then subjected to sentiment analysis using a natural language processing library (such as NLTK). This sentiment analysis determines whether the user's statements are positive, negative, or neutral.

[0587] The server performs correlation analysis across these datasets to identify relationships between specific factors. The results are visualized through the computer system in an intuitively understandable form and displayed on the user's terminal in real time. Based on this information, users can consider regional revitalization strategies, and the intuitive interface helps them formulate new action plans easily.

[0588] As a concrete example, this system could be used during a music event in a certain city to detect positive emotional tendencies among a large number of residents, and this information could be used to propose strategies for further improving the event the following year.

[0589] An example of a prompt for a generative AI model is: "Analyze the impact of local music events on residents' emotions and generate specific strategic proposals."

[0590] In this way, the system enables efficient analysis of complex data and provides users with effective information for regional revitalization.

[0591] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0592] Step 1:

[0593] The server collects regional image data in real time from observation equipment. The input is raw image data from cameras and sensors, and the output is image data converted into an analyzable format. To achieve this, image resolution and format conversion are performed in the initial stages.

[0594] Step 2:

[0595] The server uses OpenCV to analyze the collected image data, detect specific objects, and count their numbers. The input is the transformed image data obtained in step 1, and the output is data about the type and number of objects. Specifically, it applies an object detection algorithm and updates the database with the recognized objects.

[0596] Step 3:

[0597] The server converts the voice information provided by the user into text using a speech recognition API. The input is voice data, and the output is the converted string data. Specifically, the speech recognition API analyzes the frequency components of the voice and generates the corresponding string.
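[0597] mentions that the speech recognition API analyzes the frequency components of the voice. The sketch below illustrates only that first stage, computing a magnitude spectrum for one audio frame with a naive discrete Fourier transform; it is not speech recognition. Real recognizers typically use an FFT over many short, windowed frames followed by further feature extraction, and the helper names and synthetic tone here are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 for one frame of samples (naive O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def dominant_frequency(frame, sample_rate):
    """Frequency in Hz of the strongest bin, ignoring the DC bin 0."""
    spectrum = magnitude_spectrum(frame)
    k = max(range(1, len(spectrum)), key=spectrum.__getitem__)
    return k * sample_rate / len(frame)


# Synthetic 500 Hz tone: 256 samples at 8 kHz, so 500 Hz falls exactly on bin 16.
rate, n = 8000, 256
tone = [math.sin(2 * math.pi * 500 * t / rate) for t in range(n)]
print(dominant_frequency(tone, rate))  # 500.0
```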

[0598] Step 4:

[0599] The server performs sentiment analysis on text data using natural language processing techniques. The input is the text data generated in step 3, and the output is an emotional state such as positive, negative, or neutral. In this step, keywords within the text are analyzed and an emotional score is calculated.
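[0599] describes analyzing keywords and calculating an emotional score. A minimal sketch, assuming a small weighted lexicon, one-word negation handling, and a symmetric threshold for mapping the score to positive, negative, or neutral; the lexicon, the `NEGATORS` set, and the 0.5 threshold are all hypothetical values for illustration.

```python
import re

# Hypothetical weighted lexicon and negation words, for illustration only.
LEXICON = {"great": 2.0, "good": 1.0, "convenient": 1.0,
           "crowded": -1.0, "noisy": -1.0, "terrible": -2.0}
NEGATORS = {"not", "never", "no"}


def emotion_score(text):
    """Sum keyword weights, flipping the sign of a keyword preceded by a negator."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, word in enumerate(words):
        weight = LEXICON.get(word)
        if weight is None:
            continue
        if i > 0 and words[i - 1] in NEGATORS:
            weight = -weight  # e.g. "not convenient" counts as negative
        score += weight
    return score


def emotional_state(text, threshold=0.5):
    """Map the score to positive / negative / neutral using a symmetric threshold."""
    score = emotion_score(text)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


print(emotional_state("It is not convenient and crowded"))
```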

[0600] Step 5:

[0601] The server investigates the correlation between the image analysis data and the sentiment data derived from text, and identifies the relationships between them. The input is the datasets obtained in Steps 2 and 4, and the output is analysis results showing the relationships between specific factors. Specifically, a statistical model is used to calculate the correlation between the data.

[0602] Step 6:

[0603] The terminal visualizes the analysis results sent from the server and displays them to the user. The input is the analysis data obtained in step 5, and the output is a graphical display on the user interface. In this step, graphs and dashboards are generated based on the data and presented in a way that the user can intuitively understand.

[0604] Step 7:

[0605] Users formulate and implement regional revitalization policies through the provided interface. The input is visualized analysis results, and the output is a new action plan. In the final step, the system supports users in navigating the interface, formulating the new plan, and putting it into practice.

[0606] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and the display 343 to output the result. The microphone 238 acquires audio representing the user's input in response to the result. The control unit 46A transmits the audio data acquired by the microphone 238 to the data processing device 12, where the specific processing unit 290 acquires it.

[0607] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (URL: https://openai.com/blog/chatgpt) and Gemini (URL: https://gemini.google.com/?hl=ja). The data generation model 58 is obtained by performing deep learning on a neural network. It receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input data according to the instructions in the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0608] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0609] [Fourth Embodiment]

[0610] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0611] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0612] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0613] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0614] The microphone 238 picks up the voice of the user 20, including instructions from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0615] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0616] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0617] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0618] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0619] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0620] The storage 32 stores the data generation model 58 and the emotion identification model 59, both of which are used by the specific processing unit 290.

[0621] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0622] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0623] This invention relates to a system that acquires observational and interview data from various data sources, analyzes it to evaluate the local situation, and provides effective revitalization strategies.

[0624] This system is centered on a server that collects data from multiple observation devices. These include cameras installed at specific locations, such as train station plazas and residential areas, which periodically photograph traffic conditions and record them as image data. Users also interview taxi drivers and others, and the recorded opinions are sent to the server as audio data, which the server automatically converts into text information.

[0625] Next, the server analyzes the acquired image data to identify specific objects and elements. For example, it counts the number of people and bicycles from an image of the area in front of a train station to understand the flow of people during peak hours. Image recognition technology is used for this purpose. The server also uses natural language processing technology to analyze the sentiment of text data converted from speech, classifying it as positive, negative, or neutral. This makes it possible to accurately understand and analyze various voices within the community.
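The peak-hour analysis in [0625] can be sketched by summing the per-image counts by hour of day. This assumes each periodic image has already been reduced to a (timestamp, count) pair by the image recognition step; the helper names and the sample counts are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_totals(observations):
    """Sum (timestamp, count) pairs by hour of day."""
    totals = defaultdict(int)
    for timestamp, count in observations:
        totals[timestamp.hour] += count
    return dict(totals)


def peak_hour(observations):
    """Hour of day (0-23) with the largest total count."""
    totals = hourly_totals(observations)
    return max(totals, key=totals.get)


# Illustrative counts of people in front of a station from periodic images.
observations = [
    (datetime(2024, 12, 16, 7, 0), 30),
    (datetime(2024, 12, 16, 7, 30), 45),
    (datetime(2024, 12, 16, 8, 0), 60),
    (datetime(2024, 12, 16, 12, 0), 20),
]
print(peak_hour(observations))  # 7
```

When the observations span several days or are taken at irregular intervals, summing by hour favours hours that were sampled more often; averaging per hour, or normalizing by the number of images, is a reasonable refinement.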

[0626] Furthermore, the server performs correlation analysis between different datasets. For example, it identifies the relationship between data on the use of commercial facilities around train stations and data on the transportation use of local residents. Such analysis results play a crucial role in understanding regional characteristics and formulating effective strategies.

[0627] The analysis results are immediately visualized and provided to the user via their device. The dashboard, generated in real time, lets users intuitively understand the local situation; for example, they can view graphs of weekend traffic fluctuations and pedestrian flow.

[0628] As a concrete example, suppose an analysis of pedestrian traffic in front of a train station in a regional city reveals a significant difference between weekdays and weekends. The user (e.g., a local government official) can then formulate a policy to strengthen weekend transportation infrastructure. This makes it possible to develop investment plans and public policies that accurately reflect the local situation.
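The weekday/weekend comparison in [0628] reduces to grouping daily counts by day of the week. A minimal sketch, assuming daily pedestrian totals keyed by date are already available; the function name and the sample figures are illustrative, not data from the description.

```python
from datetime import date
from statistics import mean


def weekday_weekend_means(daily_counts):
    """Return (weekday mean, weekend mean) of {date: count}; Saturday and Sunday are weekend."""
    weekday = [n for d, n in daily_counts.items() if d.weekday() < 5]   # Mon=0 .. Fri=4
    weekend = [n for d, n in daily_counts.items() if d.weekday() >= 5]  # Sat=5, Sun=6
    if not weekday or not weekend:
        raise ValueError("need at least one weekday and one weekend day")
    return mean(weekday), mean(weekend)


# Illustrative figures: 14-15 Dec 2024 are a Saturday and Sunday, 16-17 Dec a Monday and Tuesday.
counts = {
    date(2024, 12, 14): 3000,
    date(2024, 12, 15): 2800,
    date(2024, 12, 16): 1500,
    date(2024, 12, 17): 1700,
}
wd, we = weekday_weekend_means(counts)
print(f"weekday {wd}, weekend {we}, ratio {we / wd:.2f}")
```

Public holidays that fall on weekdays are treated as weekdays here; separating them would require a holiday calendar, which is outside this sketch.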

[0629] Thus, the system of the present invention can grasp the characteristics of a region in detail through multifaceted analysis of collected data and propose appropriate revitalization strategies.

[0630] The following describes the processing flow.

[0631] Step 1:

[0632] The server collects image data in real time from the observation equipment. The image data is acquired by cameras installed in specific areas such as in front of train stations and residential areas, and then transferred to the server.

[0633] Step 2:

[0634] The server receives audio data from users who have conducted interviews. This audio data is collected from people such as taxi drivers and restaurant owners, and the server performs speech recognition to convert it into text data.

[0635] Step 3:

[0636] The server applies image recognition technology to the collected image data to detect and count specific objects (e.g., people, cars, bicycles, etc.). This allows for the quantification of pedestrian flow and traffic volume in specific areas.

[0637] Step 4:

[0638] The server analyzes the text data converted by speech recognition using natural language processing (NLP) and classifies the sentiment. The text is classified as positive, negative, or neutral, and this data is used to capture the sentiments of local residents.

[0639] Step 5:

[0640] The server performs correlation analysis between different datasets. It analyzes interactions within a region, such as showing the relationship between local transportation usage data and commercial facility usage data.

[0641] Step 6:

[0642] The server generates the analysis results as a visual dashboard and sends it to the terminal. The terminal provides a user-friendly interface, allowing the results to be viewed in real time.

[0643] Step 7:

[0644] Users can review the dashboard provided on their device and develop strategies based on local conditions and trends. For example, they can consider ways to improve transportation infrastructure during specific time periods.

[0645] (Example 1)

[0646] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0647] In modern local communities, it is crucial to understand the situation from multiple perspectives and formulate appropriate measures in order to revitalize the region. However, traditional methods have presented challenges in efficiently and comprehensively analyzing the vast amount of information obtained from multiple data sources and providing that information to users intuitively and immediately.

[0648] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0649] In this invention, the server includes means for analyzing image information collected from observation devices to measure the number of specific objects within a region, means for converting acquired audio information into text information and classifying that text information based on emotion, and means for performing correlation analysis between multiple information sets to identify relationships between specific elements. This makes it possible to analyze the situation in a region from multiple perspectives, visualize it immediately, and propose revitalization strategies.

[0650] An "observation device" is a device installed in a specific location to monitor the situation within that area, and is used to collect image information and other data.

[0651] "Image information" refers to visual information collected by observation devices that shows objects and conditions within a region.

[0652] "Audio information" refers to data collected from users and local residents in the form of spoken language.

[0653] "Textual information" refers to data in text format that is obtained by converting audio information using speech recognition technology.

[0654] "Classification based on emotion" is the process of analyzing textual information and determining whether its content is positive, negative, or neutral.

[0655] "Correlation analysis" is the process of analyzing the relationships and patterns between elements in multiple sets of information using statistical methods.

[0656] "Visualization" is a method of displaying data and analysis results in a format that can be intuitively understood, using images, graphs, charts, and other visual aids.

[0657] A "prompt" is text used to give specific instructions or questions to a generative AI model.

[0658] A "generative AI model" is a system that uses artificial intelligence technology to generate information and suggestions in a human-understandable format based on a given prompt.

[0659] This invention is a system for comprehensively understanding the situation in a local community and formulating more effective revitalization strategies. The system mainly consists of a server, observation devices, and user terminals.

[0660] The server analyzes image information collected from observation devices. These observation devices are installed in specific areas and include cameras that capture various scenes within the area. The accumulated image information is then analyzed on the server using image recognition technologies (such as OpenCV and TensorFlow) to measure specific objects or elements, such as the number of people or bicycles.

[0661] Simultaneously, the server converts the audio information obtained from the user into text information. The audio data is converted into text format using speech recognition software (such as Google Cloud Speech-to-Text). This text information is then subjected to sentiment analysis using natural language processing technology (such as NLTK or SpaCy) and classified into three categories: positive, negative, and neutral.

[0662] Furthermore, the server performs correlation analysis using image information, text information, and other collected data. To clarify the relationships between the data, it uses the Python Pandas library to calculate the correlation coefficient of the numerical data.

[0663] The analysis results are visualized in a dashboard format using visualization libraries such as Plotly and Matplotlib. The terminal provides this information to the user in real time, offering a means to intuitively understand the current situation and trends in the region.

[0664] As a concrete example, consider an analysis of pedestrian traffic in front of train stations in regional cities, comparing weekday and weekend traffic trends. Based on this information, local government officials (the users) can formulate improvement measures for weekend transportation infrastructure.

[0665] To make even better use of this system, users input prompts into the generative AI model. An example prompt is: "Analyze the flow of people in front of train stations in regional cities and propose infrastructure development plans that take into account the differences in traffic volume between weekdays and weekends." Based on such prompts, the AI generates optimal suggestions grounded in the data.

[0666] Therefore, this system is extremely effective in gaining a deeper understanding of regional characteristics and formulating concrete and feasible revitalization policies.

[0667] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0668] Step 1:

[0669] The server receives image information from the observation device. A camera is used as the observation device, and image data captured by the camera within the area is input. The server acquires this image data and identifies objects using image recognition technology (such as OpenCV or TensorFlow). Specifically, it measures the number of people and bicycles in the image and outputs the results as numerical data. This data is used as basic information to understand the traffic volume and pedestrian flow in the area.

[0670] Step 2:

[0671] Users provide audio information they have gathered from local residents and taxi drivers to a server. The server receives this audio data as input and converts it into text using speech recognition software (such as Google Cloud Speech-to-Text). The converted text is then subjected to natural language processing technologies (such as NLTK and SpaCy) to analyze the sentiment contained within it. This allows the text to be classified as positive, negative, or neutral, providing a means to gain a concrete understanding of the voices of local residents.

[0672] Step 3:

[0673] The server takes the numerical data obtained from images and the data obtained from text as input and performs correlation analysis across the datasets, using the Python pandas library to calculate correlation coefficients between variables. For example, it can output how strongly the utilization rate of commercial facilities around a train station is associated with traffic volume during a specific time period. This correlation analysis provides a concrete understanding of how regional economic activity and traffic trends are related.

[0674] Step 4:

[0675] The server visualizes the analysis results. Using visualization libraries such as Plotly and Matplotlib, it generates output in dashboard format. This dashboard is provided to the user via the terminal and displays the analysis results as graphs and charts, which the user can use to examine the current situation in detail and understand trends.

[0676] Step 5:

[0677] The user inputs prompts into the generative AI model. For example, a prompt such as "Analyze the flow of people in front of train stations in regional cities and propose infrastructure development plans that take into account the differences in traffic volume between weekdays and weekends" might be used. Based on such prompts, the AI generates and outputs data-driven proposals. This provides concrete insights that can contribute to the formulation of regional revitalization strategies.

[0678] (Application Example 1)

[0679] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0680] Currently, there is a lack of efficient means to collect and analyze local data and provide citizens with the information necessary for regional revitalization in real time. As a result, it takes time to understand the local situation, which can delay the planning and implementation of effective revitalization strategies. Furthermore, there is a challenge in that it is difficult to integrate information from diverse data sources, and useful information is not being appropriately provided to users.

[0681] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0682] In this invention, the server includes means for analyzing image data collected from observation devices to count the number of specific objects in a region, means for converting audio data obtained through interviews into text data and classifying the text data based on emotion, means for performing correlation analysis between multiple datasets to identify relationships between specific factors, and means for providing regional information in real time based on the user's location information. This enables a rapid and comprehensive understanding of the regional situation, allowing for timely information provision to users and proposals for regional revitalization strategies.

[0683] An "observation device" is a device installed at a specific location within a region to collect image data.

[0684] "Image data analysis" refers to a method of processing image data obtained from observation equipment to count the number of specific objects or people.

[0685] "Audio data conversion" is the process of automatically converting audio data obtained through interviews into text data.

[0686] "Sentiment analysis" is a technique that analyzes text data and classifies its content based on emotions such as positive, negative, and neutral.

[0687] "Correlation analysis" is a technique that identifies relationships between multiple datasets and reveals correlations between specific factors.

[0688] "Real-time visualization" is a technology that instantly displays acquired data visually, allowing users to intuitively understand the situation.

[0689] A "regional revitalization strategy" is a set of specific proposals and plans based on acquired data to revitalize the local economy and living environment.

[0690] "Location-based local information provision" is a function that provides the latest information related to the surrounding area in real time, based on the user's current location.

[0691] The system implementing this invention performs data collection, analysis, visualization, and information provision. The server analyzes image data acquired from observation devices and performs calculations to identify the number of specific objects within a region. This uses image recognition software such as OpenCV and TensorFlow. The server also converts voice data obtained from taxi drivers and others into text data using the Google Cloud Natural Language API and performs sentiment analysis. This analysis classifies the data as positive, negative, or neutral.

[0692] The collected data is stored in MongoDB, and correlation and time-series analyses are performed using Python scripts. This analysis, based on data from observation devices and user location information, is particularly useful for predicting pedestrian flow and congestion levels within a region. Furthermore, these results are visualized in real-time on a smartphone application using React Native.

[0693] Users with a device can receive location-based information on nearby traffic conditions and events. For example, peak times for pedestrian traffic in a region on weekends can be predicted, and notifications can be sent to help users choose appropriate modes of transportation. This allows users to avoid congestion in their area and travel efficiently.
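Weekend peak times such as those mentioned above could, in the simplest case, be estimated by averaging historical hourly counts by hour of day over weekend dates only and taking the maximum. The sketch below assumes the counts arrive from the observation devices as (timestamp, count) pairs; the `weekend_peak_hour` function and the sample values are illustrative, and a deployed system might use a proper forecasting model instead.

```python
from collections import defaultdict
from datetime import datetime


def weekend_peak_hour(records):
    """Return the hour of day with the highest mean count on weekends.

    records: iterable of (timestamp, count) pairs, e.g. hourly pedestrian counts.
    Returns None when no weekend records are present.
    """
    totals = defaultdict(int)
    samples = defaultdict(int)
    for ts, count in records:
        if ts.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            totals[ts.hour] += count
            samples[ts.hour] += 1
    if not samples:
        return None
    means = {hour: totals[hour] / samples[hour] for hour in samples}
    return max(means, key=means.get)


# Illustrative counts only.
history = [
    (datetime(2024, 12, 7, 10), 120),  # Saturday
    (datetime(2024, 12, 7, 14), 310),
    (datetime(2024, 12, 8, 14), 290),  # Sunday
    (datetime(2024, 12, 8, 18), 200),
    (datetime(2024, 12, 9, 8), 500),   # Monday, excluded
]
print(weekend_peak_hour(history))  # 14
```

The predicted peak hour could then be used to schedule notifications shortly before it.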

[0694] As a concrete example, a prompt message for a generative AI model might be in the format of, "Please provide a summary of the traffic situation in front of XX Station yesterday. What time of day was the most congested?" Such prompt messages allow users to obtain specific analytical results and predictive information based on past data.

[0695] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0696] Step 1:

[0697] The server acquires image data from observation equipment. This includes video from cameras covering a specific area within the region. The server receives this image data as input and uses OpenCV and TensorFlow to identify and count objects within the image. The output is data indicating the number of specific objects.
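The counting part of Step 1 can be sketched as follows. This is a minimal sketch that assumes an object detector (for example, a model run through OpenCV or TensorFlow) has already produced labeled detections for each frame; the `Detection` type, the `count_objects` helper, and the 0.5 confidence threshold are hypothetical and not specified in this disclosure.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One object reported by a detector for a single camera frame (hypothetical type)."""
    label: str          # class name, e.g. "person" or "car"
    confidence: float   # detector score in [0, 1]


def count_objects(detections, targets, min_confidence=0.5):
    """Count detections of the target classes whose score meets the threshold."""
    counts = Counter(
        d.label for d in detections
        if d.label in targets and d.confidence >= min_confidence
    )
    # Report every target class, including those with zero detections.
    return {label: counts.get(label, 0) for label in targets}


frame = [
    Detection("person", 0.91),
    Detection("person", 0.42),  # below the threshold, ignored
    Detection("car", 0.88),
    Detection("dog", 0.97),     # not a target class, ignored
]
print(count_objects(frame, ("person", "car", "bicycle")))
# {'person': 1, 'car': 1, 'bicycle': 0}
```

Filtering by confidence before counting keeps low-certainty detections from inflating the counts used in later analysis.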

[0698] Step 2:

[0699] The server collects audio data obtained through interviews. Conversations between the user and others, such as a taxi driver, are sent to the server as audio data. This audio data is converted to text using the Google Cloud Natural Language API. The converted text data is then used as input for sentiment analysis, and the results are output categorized as positive, negative, or neutral.

[0700] Step 3:

[0701] The server aggregates historical and current data stored in MongoDB and performs correlation analysis and time series analysis using Python. The input for this analysis is numerical data obtained from image data and sentiment data obtained from text data. The analysis results are used to evaluate the movement of people and the impact of events within a region, and regularities and predictions are output.

[0702] Step 4:

[0703] The smartphone, acting as the terminal, receives real-time analysis results from the server. The user's location information is sent to the server as input, and regional information and congestion predictions are displayed on the user's terminal. The visualized results are provided to the user using an application built with React Native.
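On the server side, selecting the regional information to send back for a given user location could be sketched as a radius filter using the haversine great-circle distance. The item structure, the 1 km default radius, and the `nearby_items` helper are assumptions for illustration; the disclosure does not specify how nearby information is selected.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_items(user_lat, user_lon, items, radius_km=1.0):
    """Return the items within radius_km of the user, nearest first."""
    scored = [(haversine_km(user_lat, user_lon, it["lat"], it["lon"]), it) for it in items]
    scored.sort(key=lambda pair: pair[0])  # sort by distance only; dicts are not comparable
    return [it for dist, it in scored if dist <= radius_km]


# Hypothetical items; a real server would load events and congestion alerts from its database.
items = [
    {"name": "Event A", "lat": 35.6812, "lon": 139.7671},
    {"name": "Congestion alert B", "lat": 35.6896, "lon": 139.7006},
]
print([it["name"] for it in nearby_items(35.6800, 139.7650, items)])  # ['Event A']
```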

[0704] Step 5:

[0705] The user inputs a prompt message into the generative AI model within the app. For example, suppose the user sends the text "Please tell me about the current traffic conditions." The server receives this prompt message, generates an answer based on past data and analysis results, and outputs it to the user.
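One plausible way for the server to ground its answer in past data and analysis results is to place those results in the prompt alongside the user's question before passing it to the generative AI model. The `build_prompt` helper, its wording, and the metric names below are hypothetical, and the call to the model itself is omitted.

```python
def build_prompt(question, analysis):
    """Combine a user's question with server-side analysis results (hypothetical helper)."""
    lines = [
        "You are an assistant that provides regional information.",
        "Answer using only the analysis results below.",
        "",
        "Analysis results:",
    ]
    # Sorted keys keep the prompt deterministic across requests.
    for key, value in sorted(analysis.items()):
        lines.append(f"- {key}: {value}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


prompt = build_prompt(
    "Please tell me about the current traffic conditions.",
    {"pedestrians_last_hour": 420, "congestion_level": "high", "peak_hour_forecast": "18:00"},
)
print(prompt)
```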

[0706] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0707] This invention relates to a system that combines data collection from observation devices, text conversion of audio data, correlation analysis between multiple data sets, data visualization, and strategy proposal with an emotion engine that recognizes user emotions.

[0708] This system is server-centric and collects image data from various locations in the region through observation devices. Users provide the system with audio data recorded while listening to people such as taxi drivers and restaurant owners, and the server converts that audio data into text.

[0709] The server uses image recognition technology to identify specific objects in the image data and count them. The resulting counts serve as foundational data for calculating local pedestrian flow, traffic flow, and related metrics.

[0710] In addition, the server utilizes an emotion engine to analyze the user's emotions from the converted text data. The emotion engine uses natural language processing techniques to analyze keywords and context within the text and recognize emotional states such as positive, negative, and neutral.

[0711] The server also uses this data to perform correlation analysis between regional datasets. This reveals how specific regional factors influence user sentiment and regional revitalization. The results are displayed visually on the device. Users can use the provided visual dashboard to intuitively understand the situation in their region.

[0712] Furthermore, the system uses time-series analysis to predict future regional trends. The information generated in real time serves as an important tool for users when considering future strategies.

[0713] As a concrete example, suppose a user uses a server to analyze the emotions of young people in a specific region and detects that positive emotions increase on weekends. Based on this, the server can suggest increasing the number of entertainment and cultural events in that region. In this way, the present invention provides a practical system that integrates regional data with user emotion information to support effective regional revitalization strategies.
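The weekend comparison in this example can be sketched as a comparison of the share of positive sentiment labels on weekend versus weekday dates. The `suggest_events` rule and its 0.1 margin are assumptions made for illustration, not the method of this disclosure.

```python
from datetime import date


def positive_share(labels):
    """Fraction of sentiment labels equal to 'positive' (0.0 when there are none)."""
    return sum(1 for s in labels if s == "positive") / len(labels) if labels else 0.0


def suggest_events(observations, margin=0.1):
    """observations: (date, sentiment_label) pairs for one region and group.

    Returns a suggestion when the weekend positive share exceeds the weekday
    share by at least `margin`, otherwise None.
    """
    weekend = [s for d, s in observations if d.weekday() >= 5]  # Saturday, Sunday
    weekday = [s for d, s in observations if d.weekday() < 5]
    if positive_share(weekend) - positive_share(weekday) >= margin:
        return "Consider adding entertainment and cultural events on weekends."
    return None


# Illustrative labels only.
observations = [
    (date(2024, 12, 7), "positive"),   # Saturday
    (date(2024, 12, 8), "positive"),   # Sunday
    (date(2024, 12, 8), "neutral"),
    (date(2024, 12, 9), "negative"),   # Monday
    (date(2024, 12, 10), "positive"),
    (date(2024, 12, 11), "neutral"),
]
print(suggest_events(observations))
```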

[0714] The following describes the processing flow.

[0715] Step 1:

[0716] The server collects image data from multiple observation devices within a region. These devices are installed at specific locations and automatically capture images according to a pre-set schedule, then transfer them to the server.

[0717] Step 2:

[0718] Users conduct interviews with taxi drivers and local restaurant owners and send the audio recordings to the server. The server then uses speech recognition technology to convert the received audio data into text.

[0719] Step 3:

[0720] The server applies image recognition algorithms to the image data to automatically count the number of specific objects (e.g., people, cars, bicycles). The results are stored as digital records and used as indicators of activity in the area.

[0721] Step 4:

[0722] The server uses a sentiment engine to analyze text data. Natural language processing techniques are used to analyze keywords and context, classifying the user's sentiment as positive, negative, or neutral.

[0723] Step 5:

[0724] The server performs correlation analysis between the collected datasets. This identifies the relationship between sentiment data and local activities and factors. This analysis is important for extracting specific factors that influence users' sentiment in a given region.

[0725] Step 6:

[0726] The server visualizes the analysis results and displays them on the terminal in real time. The user's terminal can then use the generated dashboard to visually check the local situation and sentiment trends.

[0727] Step 7:

[0728] Based on information obtained from their devices, users consider and implement measures that are appropriate to the characteristics of the region and the emotional state of the users. For example, in a region where positive emotions are prevalent on weekends, they might suggest holding events or gatherings.

[0729] (Example 2)

[0730] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0731] In modern cities and regions, increasing population density and complex transportation systems make it difficult to formulate appropriate regional revitalization strategies. Furthermore, because relevant data are scattered across many sources, it is difficult to integrate and analyze them effectively and to develop concrete proposals based on regional characteristics and trends. There is currently demand for optimal strategy proposals that take into account temporal trends and people's emotions, but achieving this in real time is challenging.

[0732] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0733] In this invention, the server includes means for analyzing video information collected from observation equipment and counting the number of objects within a predetermined area; means for converting audio information into text information and classifying that text information based on emotional states; and means for analyzing the correlation between multiple sets of information and understanding the relationships between specific factors. This enables the integrated analysis of regional data and the formulation of concrete and effective regional revitalization strategies through real-time emotional analysis and information visualization.

[0734] "Observation equipment" refers to devices used to collect images and videos from various locations within a region.

[0735] "Video information" refers to visual data captured by observation equipment, including photographs and videos.

[0736] "Counting the quantity of objects" is the process of analyzing video information to identify the number of specific objects or people.

[0737] "Audio information" refers to data obtained from sound, including human conversations and ambient sounds.

[0738] "Converting to text information" is the process of converting audio information into text format.

[0739] "Classifying based on emotional state" is the process of analyzing textual information to identify whether its content corresponds to a positive, negative, or neutral emotion.

[0740] An "information set" refers to a diverse collection of data, which includes both visual and textual information.

[0741] "Analyzing correlations" is the process of analyzing the relationships between sets of information and clarifying the relationships between specific factors.

[0742] A "regional revitalization strategy" refers to specific plans and proposals aimed at regional development and improving the lives of residents.

[0743] In the embodiment for carrying out the invention, a server-centered system is used. This system enables detailed data collection, analysis, visualization, and strategic proposals.

[0744] The server first collects video information from multiple locations in the region using observation equipment, such as high-resolution cameras and other sensors. The collected video information is processed by analysis software implementing image recognition algorithms (e.g., TensorFlow and OpenCV) to count specific objects. The resulting data are used to analyze urban traffic volume and understand pedestrian flow.

[0745] To further enhance local information, users input audio data acquired from the site into the system. This audio data is converted into text by a server using speech recognition software (e.g., a speech recognition API). This text data is then processed through a sentiment analysis engine and classified based on emotional states. This process utilizes software employing natural language processing techniques (e.g., NLTK or Transformers).

[0746] Furthermore, the server integrates and analyzes these different datasets. In particular, statistical software is used to analyze correlations and uncover the relationships between specific factors. The results are visualized immediately on the terminal using dedicated dashboards (e.g., Tableau or Power BI), making it easier for users to understand regional trends in real time.

[0747] To propose future strategies, time-series analysis models are used to predict regional changes. This gives users foundational information for making strategic decisions in areas such as transportation, urban development, and social activities.

[0748] As a concrete example, when a user uses this system to analyze pedestrian traffic in a commercial district on holidays, the system can identify, from data trends, streets that become congested on weekends. The server can then recommend holding pedestrian-priority events in those areas. An example of a prompt for the generative AI model is "Based on past data, identify the main causes of negative sentiment in the area and suggest ways to improve it." In this way, the present invention provides a practical method for revitalizing local communities through complex data analysis.

[0749] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0750] Step 1:

[0751] The server receives video information collected from observation equipment. This video information is collected from various locations in the region. The server receives this information as input, uses image recognition technology to identify specific objects, and counts their quantities. Specifically, the server uses TensorFlow to detect objects such as people and cars, and outputs the results to a database.

[0752] Step 2:

[0753] The user acquires audio information and uploads it to the system. The server receives this audio information as input and converts it into text using speech recognition software. The server then outputs the resulting text information as input data for sentiment analysis.

[0754] Step 3:

[0755] The server uses the acquired text information as input to activate its sentiment analysis engine. This engine uses natural language processing techniques to classify emotional states (positive, negative, neutral) from the text information. Specifically, the server extracts emotional keywords, determines the emotional category based on these keywords, and outputs the result.
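The keyword-based classification described in Step 3 can be illustrated with a small lexicon scorer: emotional keywords are extracted, each positive keyword adds 1 and each negative keyword subtracts 1, and the sign of the total determines the category. This is a deliberately simple stand-in; the word lists and threshold are hypothetical, and a production engine would more likely use a trained model (for example via NLTK or Transformers, as mentioned in [0745]).

```python
import re

# Hypothetical toy lexicons for illustration only.
POSITIVE = {"good", "great", "lively", "fun", "convenient", "happy"}
NEGATIVE = {"bad", "crowded", "dirty", "boring", "inconvenient", "empty"}


def classify_sentiment(text, threshold=1):
    """Return (label, score, keywords) based on emotional keywords found in text."""
    tokens = re.findall(r"[a-z']+", text.lower())  # whole-word tokens
    keywords = [t for t in tokens if t in POSITIVE or t in NEGATIVE]
    score = sum(1 if t in POSITIVE else -1 for t in keywords)
    if score >= threshold:
        label = "positive"
    elif score <= -threshold:
        label = "negative"
    else:
        label = "neutral"
    return label, score, keywords


for text in [
    "The station area is lively and fun, but crowded.",
    "The streets were dirty and empty.",
    "Business was slow today.",
]:
    print(classify_sentiment(text))
```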

[0756] Step 4:

[0757] The server integrates all datasets and analyzes their correlations. The input consists of multiple datasets, and the output shows the relationships between specific factors. The server uses statistical methods to analyze this and reveal specific patterns and trends.

[0758] Step 5:

[0759] The server uses the analysis results to generate a dataset for visualization on the terminal and outputs it to a visualization tool. Specifically, the server provides the data to Tableau or Power BI, which display the results visually as graphs and charts.
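The disclosure does not specify how the visualization dataset is handed to Tableau or Power BI. Since both tools can import CSV files, one low-friction option is sketched below using Python's standard `csv` module; the column names and sample rows are hypothetical.

```python
import csv
import io


def to_dashboard_csv(rows, fieldnames):
    """Serialise analysis rows (dicts) to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    {"date": "2024-12-07", "area": "Station front", "pedestrians": 1250, "positive_share": 0.62},
    {"date": "2024-12-08", "area": "Station front", "pedestrians": 1180, "positive_share": 0.58},
]
csv_text = to_dashboard_csv(rows, ["date", "area", "pedestrians", "positive_share"])
print(csv_text)
# In practice the text would be written to a location the BI tool reads, e.g.
# pathlib.Path("dashboard.csv").write_text(csv_text, encoding="utf-8")
```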

[0760] Step 6:

[0761] The server performs time series analysis to predict future regional trends. This uses historical data as input and generates predictions of future trends as output. Specifically, the server applies a time series model to create data showing, for example, the expected increase or decrease in traffic volume in the following month.
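The disclosure does not name the time series model. As a deliberately simple stand-in, the sketch below fits a straight-line trend by ordinary least squares and extrapolates it one period ahead; seasonal methods such as exponential smoothing or ARIMA would usually be more appropriate for traffic data. The function name and sample series are hypothetical.

```python
def linear_trend_forecast(series, steps_ahead=1):
    """Fit y = a + b*t (t = 0, 1, ...) by ordinary least squares and extrapolate."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    # The last observation is at t = n - 1, so the forecast is at t = n - 1 + steps_ahead.
    return intercept + slope * (n - 1 + steps_ahead)


monthly_traffic = [100, 110, 120, 130]  # illustrative monthly counts, oldest first
forecast = linear_trend_forecast(monthly_traffic)
change = forecast - monthly_traffic[-1]
print(forecast, "increase" if change > 0 else "decrease or no change")  # 140.0 increase
```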

[0762] Step 7:

[0763] To propose regional development plans to the user, the server uses the generated information to construct specific strategies. In this process, the generative AI model outputs strategy proposals based on prompts. An example of a prompt is "Predict the expected emotional trends of people at the next event and propose the optimal marketing strategy."

[0764] (Application Example 2)

[0765] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0766] In modern urban environments, it is essential to understand residents' sentiments and local trends in real time and develop effective strategies based on that understanding. However, traditional methods lacked systems capable of instantly analyzing vast amounts of data and presenting the results visually, making rapid decision-making difficult. In particular, proposals for regional revitalization were often not intuitive, making it difficult for users to grasp their importance and act on them.

[0767] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0768] In this invention, the server includes means for analyzing image information collected from observation equipment to count the number of specific objects within a region, means for converting audio information obtained through interviews into text information and classifying that text information based on emotion, and means for providing, via a computer system, an intuitive interface that lets users understand proposed policies and take new actions. This enables real-time understanding of the local situation and quick, clear decision-making.

[0769] "Observation equipment" refers to devices that collect image information from various locations in a region.

[0770] "Analysis" is the process of examining collected information and extracting specific patterns or meaningful data.

[0771] "Object" refers to a thing or event that is specifically identified as the subject of observation.

[0772] "Audio information" refers to data collected from the voices of one or more people.

[0773] "Textual information" refers to data that represents audio information as text.

[0774] "Emotion-based classification" is a process that uses natural language processing techniques to analyze text data and categorize it into emotional states such as positive, negative, and neutral.

[0775] An "interface" refers to the display screen or input method that a user uses to interact with a computer system.

[0776] A "computer system" is a system consisting of a series of devices and software that process and analyze data.

[0777] "Immediate" means that there is virtually no delay between information collection and display.

[0778] "Visualization" refers to displaying data as graphs or diagrams to make information easier to understand intuitively.

[0779] A "regional revitalization policy" is a plan or strategy that outlines how to make a specific region more vibrant.

[0780] "Users" refers to those who operate this system and receive data and suggestion information.

[0781] The server is configured as follows to implement this invention. First, multiple cameras and sensors are installed in the area as observation equipment to collect image data in real time. This data is analyzed using the open-source image processing library OpenCV to detect specific objects and count their numbers. The results of this image analysis are used to understand the flow of people and traffic volume within the area.

[0782] Simultaneously, the audio information collected from the user is converted into text using a cloud-based speech recognition API. The converted text is then subjected to sentiment analysis using a natural language processing library (such as NLTK). This sentiment analysis determines whether the user's statements are positive, negative, or neutral.

[0783] The server performs correlation analysis between these multiple datasets to identify specific causal relationships. The results of this analysis are visualized on the computer system in a way that is intuitively understandable to users and displayed on the user's terminal in real time. Based on this information, users can consider regional revitalization strategies. Users are provided with an intuitive interface and assistance in easily formulating new action plans.

[0784] As a concrete example, this system could be used during a music event in a certain city to detect positive emotional tendencies among a large number of residents, and this information could be used to propose strategies for further improving the event the following year.

[0785] An example of a prompt using a generative AI model is: "Analyze the impact of local music events on residents' emotions and generate specific strategic proposals."

[0786] In this way, the system enables efficient analysis of complex data and provides users with effective information for regional revitalization.

[0787] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0788] Step 1:

[0789] The server collects regional image data in real time from observation equipment. The input is raw image data from cameras and sensors, and the output is image data converted into an analyzable format. To achieve this, image resolution and format conversion are performed in the initial stages.

[0790] Step 2:

[0791] The server uses OpenCV to analyze the collected image data, detect specific objects, and count their numbers. The input is the transformed image data obtained in step 1, and the output is data about the type and number of objects. Specifically, it applies an object detection algorithm and updates the database with the recognized objects.
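Updating the database with recognized objects could look like the sketch below, which uses Python's built-in `sqlite3` module as a stand-in for whatever database the server actually uses (Application Example 1 mentions MongoDB). The table schema and helper names are hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a detection store; ':memory:' keeps the demo self-contained."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS detections ("
        "camera_id TEXT, captured_at TEXT, label TEXT, n_objects INTEGER)"
    )
    return conn


def record_counts(conn, camera_id, captured_at, counts):
    """Insert one frame's per-label counts, e.g. {'person': 12, 'car': 3}."""
    conn.executemany(
        "INSERT INTO detections VALUES (?, ?, ?, ?)",
        [(camera_id, captured_at, label, n) for label, n in counts.items()],
    )
    conn.commit()


def totals_by_label(conn):
    """Total object count per label across all cameras and frames."""
    rows = conn.execute(
        "SELECT label, SUM(n_objects) FROM detections GROUP BY label ORDER BY label"
    )
    return dict(rows.fetchall())


conn = open_store()
record_counts(conn, "cam-01", "2024-12-07T14:00", {"person": 12, "car": 3})
record_counts(conn, "cam-02", "2024-12-07T14:00", {"person": 5})
print(totals_by_label(conn))  # {'car': 3, 'person': 17}
```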

[0792] Step 3:

[0793] The server converts the voice information provided by the user into text using a speech recognition API. The input is voice data, and the output is the converted string data. Specifically, the speech recognition API analyzes the frequency components of the voice and generates the corresponding string.

[0794] Step 4:

[0795] The server performs sentiment analysis on text data using natural language processing techniques. The input is the text data generated in step 3, and the output is an emotional state such as positive, negative, or neutral. In this step, keywords within the text are analyzed and an emotional score is calculated.

[0796] Step 5:

[0797] The server investigates the correlation between image analysis data and sentiment data from text, and identifies the relationships. The input is the dataset obtained in steps 2 and 4, and the output is the analysis results showing causal relationships. Specifically, a statistical model is used to calculate the correlation between the data.
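The correlation calculation in Step 5 could, in its simplest form, be the Pearson correlation coefficient between a per-day image-analysis metric and a per-day sentiment metric. This sketch assumes both series are already aligned by date, and the sample values are illustrative. A correlation coefficient by itself measures association rather than causation, so identifying causal relationships would require further analysis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Illustrative daily values for one area, aligned by date.
pedestrians = [800, 950, 1200, 1500, 1400]        # from image analysis (Step 2)
positive_ratio = [0.40, 0.45, 0.55, 0.65, 0.60]   # from sentiment analysis (Step 4)
print(round(pearson(pedestrians, positive_ratio), 3))
```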

[0798] Step 6:

[0799] The terminal visualizes the analysis results sent from the server and displays them to the user. The input is the analysis data obtained in step 5, and the output is a graphical display on the user interface. In this step, graphs and dashboards are generated based on the data and presented in a way that the user can intuitively understand.

[0800] Step 7:

[0801] Users formulate and implement regional revitalization policies through the provided interface. The input is visualized analysis results, and the output is a new action plan. In the final step, the system supports users in navigating the interface, formulating the new plan, and putting it into practice.

[0802] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0803] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input data according to the instructions in the prompts and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0804] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0805] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, the emotion identification model 59 may determine the user's emotion according to an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0806] Figure 9 shows an emotion map 400 on which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. More primitive emotions are located closer to the center of the concentric circles. Farther out are emotions representing states and actions arising from mental states. Here, "emotion" is a concept that includes feelings and mental states. On the left side of the concentric circles are emotions generally generated by reactions occurring in the brain. On the right side are emotions generally induced by situational judgment. Above and below are emotions generally both generated by reactions in the brain and induced by situational judgment. In addition, "pleasure" is located on the upper side of the concentric circles and "displeasure" on the lower side. Thus, in the emotion map 400, emotions are mapped based on the structure by which they arise, and emotions that are likely to occur simultaneously are mapped close together.

[0807] These emotions are distributed around the 3 o'clock position of the emotion map 400 and usually fluctuate between security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0808] The inside of the emotion map 400 represents inner thoughts, and the outside represents actions. Therefore, the farther out on the emotion map 400 an emotion lies, the more visible it becomes (that is, the more it is expressed in actions).

[0809] Here, human emotions are based on various balances, such as posture and blood sugar level. When these balances move away from the ideal, the result is displeasure; when they approach the ideal, the result is pleasure. Similarly, in robots, cars, motorcycles, and so on, emotions can be created based on various balances, such as posture and battery level: when these balances move away from the ideal, the result is displeasure, and when they approach the ideal, the result is pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "reaction," where sensation is dominant. The right half contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0810] The emotion map defines two emotions that promote learning. One lies around the middle between the negative emotions "repentance" and "reflection" on the situation side; that is, when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other lies around the positive emotion "desire" on the reaction side; that is, when the robot has positive feelings such as "I want more" or "I want to know more."
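The balance-based account of pleasure and displeasure can be illustrated numerically: measure how far each balance is from its ideal, normalize, and treat a decrease in that deviation as pleasure and an increase as displeasure. The quantities, ideal values, scales, and function names below are hypothetical and only illustrate the idea described in this paragraph.

```python
def balance_deviation(state, ideal, scale):
    """Mean absolute deviation from the ideal, each quantity normalised by its scale."""
    return sum(abs(state[k] - ideal[k]) / scale[k] for k in ideal) / len(ideal)


def pleasure_signal(prev_state, new_state, ideal, scale):
    """> 0 when balances move toward the ideal (pleasure), < 0 when they move away (displeasure)."""
    return balance_deviation(prev_state, ideal, scale) - balance_deviation(new_state, ideal, scale)


# Hypothetical robot balances: battery level (%) and body tilt (degrees).
IDEAL = {"battery_pct": 100.0, "tilt_deg": 0.0}
SCALE = {"battery_pct": 100.0, "tilt_deg": 90.0}

before = {"battery_pct": 40.0, "tilt_deg": 10.0}
after_charging = {"battery_pct": 60.0, "tilt_deg": 10.0}
print(pleasure_signal(before, after_charging, IDEAL, SCALE))  # approx. 0.1, i.e. pleasure
```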

[0811] The emotion identification model 59 inputs the user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained on multiple training data sets, each combining a user input with emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example in which multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
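The property that neighboring emotions on the map take similar values can be illustrated without reproducing the neural network training. The sketch below instead applies a Gaussian-weighted average over map positions to a set of raw emotion values and then takes the largest value as the user's emotion. The coordinates are hypothetical and are not read from Figure 9 or Figure 10, and post-hoc smoothing is a swapped-in technique, not the training procedure described here.

```python
import math

# Hypothetical (x, y) positions on an emotion map.
POSITIONS = {
    "reassured": (1.0, 0.2),
    "calm": (1.0, 0.0),
    "confident": (0.9, 0.3),
    "anxious": (-0.9, -0.4),
}


def smooth_emotion_values(raw, positions=POSITIONS, sigma=0.5):
    """Gaussian-weighted average over map neighbours, so nearby emotions get similar values."""
    smoothed = {}
    for name, (x0, y0) in positions.items():
        weights = {
            other: math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
            for other, (x, y) in positions.items()
        }
        total = sum(weights.values())
        smoothed[name] = sum(w * raw.get(other, 0.0) for other, w in weights.items()) / total
    return smoothed


def dominant_emotion(values):
    """Take the emotion with the highest value as the user's emotion."""
    return max(values, key=values.get)


raw = {"calm": 0.9, "reassured": 0.2, "confident": 0.1, "anxious": 0.0}
smoothed = smooth_emotion_values(raw)
print(dominant_emotion(smoothed), {k: round(v, 3) for k, v in smoothed.items()})
```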

[0812] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0813] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0814] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.

[0815] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0816] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0817] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0818] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0819] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0820] More specifically, the hardware structure of these various processors may be an electrical circuit in which circuit elements such as semiconductor devices are combined. The specific processing described above is also merely an example. Therefore, unnecessary steps may of course be deleted, new steps added, or the processing order rearranged without departing from the gist of this disclosure.

[0821] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of those aspects. For example, the above descriptions of structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, unnecessary parts may of course be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above without departing from the essence of the technical aspects of this disclosure. Furthermore, to avoid confusion and facilitate understanding, explanations of common technical knowledge that is not needed to implement the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0822] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0823] The following is further disclosed regarding the embodiments described above.

[0824] (Claim 1)

[0825] A means of analyzing image data collected from observation equipment to count the number of specific objects within a region,

[0826] A means of converting audio data obtained from listening into text data and classifying that text data based on emotion,

[0827] A means of performing correlation analysis across multiple datasets to identify relationships between specific factors,

[0828] A means of visualizing data and displaying results in real time,

[0829] A means of proposing regional revitalization strategies to users,

[0830] A system including the above means.
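The first means of this claim counts specific objects within a region from image data. The text does not name a detector, so the sketch below assumes some object detector has already produced labeled bounding boxes with confidence scores; the `Detection` type, the polygonal region, and the `min_score` threshold are hypothetical. An object is counted when its box center falls inside the region.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    score: float
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_in_region(detections, target_label, region, min_score=0.5):
    """Count confident detections of `target_label` whose box center is in `region`."""
    count = 0
    for d in detections:
        if d.label != target_label or d.score < min_score:
            continue
        cx = (d.box[0] + d.box[2]) / 2
        cy = (d.box[1] + d.box[3]) / 2
        if point_in_polygon(cx, cy, region):
            count += 1
    return count
```

Using the box center rather than full containment is a design choice: it avoids double-counting objects that straddle the region boundary.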

[0831] (Claim 2)

[0832] The system according to claim 1, further comprising means for predicting future regional trends by performing time-series analysis based on collected data.

[0833] (Claim 3)

[0834] The system according to claim 1, further comprising data preprocessing means for removing noise from data and extracting necessary information.
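The preprocessing means of claim 3 is described only as removing noise and extracting necessary information. A minimal sketch, assuming the input is a series of sensor readings in which missing values appear as `None` and noise takes the form of isolated spikes, is to drop the missing entries and apply a centered running median:

```python
from statistics import median

def preprocess(series, window=3):
    """Drop missing readings (None), then apply a centered running median.

    The window shrinks at the ends of the series instead of padding.
    """
    clean = [v for v in series if v is not None]
    half = window // 2
    out = []
    for i in range(len(clean)):
        lo = max(0, i - half)
        hi = min(len(clean), i + half + 1)
        out.append(median(clean[lo:hi]))
    return out
```

A median is preferred over a mean here because a single spike (such as a miscounted frame) does not drag the neighboring values upward.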

[0835] "Example 1"

[0836] (Claim 1)

[0837] A means of analyzing image information collected from observation equipment to measure the number of specific objects within a region,

[0838] A means of converting acquired audio information into text information and classifying that text information based on emotion,

[0839] A means of performing correlation analysis across multiple sets of information to identify relationships between specific elements,

[0840] Means for visualizing information and displaying results in real time,

[0841] A means of proposing regional revitalization policies to users,

[0842] A means of generating prompt sentences and utilizing a generative AI model,

[0843] A system including the above means.
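The prompt-generation means of Example 1 is not detailed further. The sketch below shows one hypothetical way to turn the outputs of the other means (object counts, emotion shares, correlations) into a prompt sentence. The field names, the wording of the prompt, and the `generate` callable that stands in for whatever generative AI model is used are all assumptions; no specific model API is implied.

```python
from typing import Callable

def build_prompt(region, object_counts, emotion_summary, correlations):
    """Assemble analysis results into a prompt for a generative AI model."""
    lines = [
        f"You are assisting with revitalization planning for {region}.",
        "Observed object counts:",
    ]
    for name, count in sorted(object_counts.items()):
        lines.append(f"- {name}: {count}")
    lines.append("Share of comments by emotion:")
    for emotion, share in sorted(emotion_summary.items()):
        lines.append(f"- {emotion}: {share:.0%}")
    lines.append("Notable correlations:")
    for (a, b), r in correlations.items():
        lines.append(f"- {a} vs {b}: r = {r:+.2f}")
    lines.append("Propose three concrete revitalization measures with reasons.")
    return "\n".join(lines)

def propose_policies(generate: Callable[[str], str], **analysis) -> str:
    """Send the assembled prompt to a caller-supplied model function."""
    return generate(build_prompt(**analysis))
```

Passing the model as a callable keeps the prompt construction testable without network access and independent of any particular model provider.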

[0844] (Claim 2)

[0845] The system according to claim 1, further comprising means for performing time-series analysis based on collected information to predict future regional trends.

[0846] (Claim 3)

[0847] The system according to claim 1, further comprising information preprocessing means for removing noise from information and extracting necessary information.

[0848] "Application Example 1"

[0849] (Claim 1)

[0850] A means of analyzing image data collected from observation equipment to count the number of specific objects within a region,

[0851] A means of converting audio data obtained from listening into text data and classifying that text data based on emotion,

[0852] A means of performing correlation analysis across multiple datasets to identify relationships between specific factors,

[0853] A means of visualizing data and displaying results in real time,

[0854] A means of providing local information in real time based on the user's location information,

[0855] A means of proposing regional revitalization strategies to users,

[0856] A system including the above means.
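Application Example 1 adds a means of providing local information based on the user's location. A minimal sketch, assuming local information is stored as items with latitude and longitude, filters items by great-circle (haversine) distance from the user and returns them nearest first. The item structure, the 1 km default radius, and the sample names are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearby_info(user_lat, user_lon, items, radius_km=1.0):
    """Return (distance_km, item) pairs within `radius_km`, nearest first."""
    hits = []
    for item in items:
        d = haversine_km(user_lat, user_lon, item["lat"], item["lon"])
        if d <= radius_km:
            hits.append((d, item))
    hits.sort(key=lambda pair: pair[0])  # sort by distance only; dicts are not comparable
    return hits
```

For real-time use, this function would simply be re-run whenever a new location fix arrives; a spatial index would be needed only for large item sets.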

[0857] (Claim 2)

[0858] The system according to claim 1, further comprising means for performing time-series analysis based on collected data and historical data to predict future regional trends and congestion levels.
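This claim combines collected data with historical data to predict congestion. One simple interpretation, sketched here as an assumption, builds a historical profile of average counts per (weekday, hour) slot, scales it by a ratio describing how busy today is compared with usual, and maps the result to a congestion level. The weekday convention (0 = Monday, as in `datetime.weekday()`), the `recent_ratio` parameter, and the thresholds are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def build_profile(history):
    """history: iterable of (weekday, hour, count). Returns the mean count per slot."""
    buckets = defaultdict(list)
    for weekday, hour, count in history:
        buckets[(weekday, hour)].append(count)
    return {slot: mean(counts) for slot, counts in buckets.items()}

def congestion_level(count, thresholds=(50, 150)):
    """Map an expected count to a coarse congestion level."""
    low, high = thresholds
    if count < low:
        return "low"
    if count < high:
        return "moderate"
    return "high"

def predict_congestion(profile, weekday, hour, recent_ratio=1.0):
    """Scale the historical average for the slot by how busy today is versus usual."""
    expected = profile.get((weekday, hour), 0) * recent_ratio
    return expected, congestion_level(expected)
```

For instance, with two past Saturday 10:00 observations of 100 and 140, a day running 1.5 times busier than usual yields an expected count of 180, classified as "high".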

[0859] (Claim 3)

[0860] The system according to claim 1, comprising data preprocessing means for removing noise from data and extracting necessary information, and providing customized notifications to the user.
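The customized notifications mentioned in this claim are not specified further. A minimal sketch, assuming each user registers categories of interest and a maximum tolerable congestion level, filters candidate local events accordingly. The `UserPrefs` type, the event fields, and the message format are hypothetical.

```python
from dataclasses import dataclass, field

LEVELS = {"low": 0, "moderate": 1, "high": 2}

@dataclass
class UserPrefs:
    interests: set = field(default_factory=set)
    max_congestion: str = "moderate"

def customize(events, prefs):
    """Keep events matching the user's interests and congestion tolerance."""
    out = []
    for ev in events:
        if ev["category"] not in prefs.interests:
            continue
        if LEVELS[ev["congestion"]] > LEVELS[prefs.max_congestion]:
            continue
        out.append(f"{ev['name']} ({ev['category']}, congestion: {ev['congestion']})")
    return out
```

Keeping the filtering rules in a per-user preferences object means the same event feed can be reused for every user, with only the final selection differing.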

[0861] "Example 2 (combined with an emotion engine)"

[0862] (Claim 1)

[0863] A means for analyzing video information collected from observation equipment and counting the number of objects within a predetermined area,

[0864] A means of converting audio information into text information and classifying that text information based on emotional state,

[0865] A means of analyzing the correlation between multiple sets of information and understanding the relationships between specific factors,

[0866] Means for immediately visualizing and displaying results,

[0867] A means of proposing regional development plans to users,

[0868] A means of analyzing user emotions using text data and identifying emotional states,

[0869] A system including the above means.
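Example 2 identifies emotional states from text data. Elsewhere the disclosure uses a trained neural network for this; the sketch below swaps in a much simpler keyword-lexicon classifier purely to illustrate the input and output of the step. The lexicon entries and emotion labels are hypothetical, and text with no matching keyword is labeled "neutral".

```python
import re
from collections import Counter

# Tiny illustrative lexicon (hypothetical); the disclosure itself uses a trained model.
LEXICON = {
    "happy": "joy", "glad": "joy", "fun": "joy",
    "angry": "anger", "annoying": "anger",
    "sad": "sadness", "lonely": "sadness",
    "calm": "calm", "relaxing": "calm",
}

def classify_emotion(text):
    """Return the emotion whose keywords occur most often in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = Counter(LEXICON[t] for t in tokens if t in LEXICON)
    if not hits:
        return "neutral"
    return hits.most_common(1)[0][0]
```

Applied to each transcribed comment, the resulting labels can be tallied per region or per day to feed the correlation and visualization means.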

[0870] (Claim 2)

[0871] The system according to claim 1, further comprising means for performing time series data analysis and predicting future regional trends.

[0872] (Claim 3)

[0873] The system according to claim 1, further comprising data preprocessing means for removing unnecessary elements from data and extracting necessary information.

[0874] "Application Example 2 (combined with an emotion engine)"

[0875] (Claim 1)

[0876] A means for analyzing image information collected from observation equipment to count the number of specific objects within a region,

[0877] A means of converting auditory information obtained from listening into text information, and classifying that text information based on emotion,

[0878] A means of performing correlation analysis across multiple sets of information to identify relationships between specific factors,

[0879] Means for immediately visualizing information and displaying results,

[0880] A means of proposing a strategy for revitalizing the region to users,

[0881] A means of using a computer system that provides an intuitive interface for users to understand proposed policies and take new actions,

[0882] A system including the above means.

[0883] (Claim 2)

[0884] The system according to claim 1, further comprising means for predicting future regional trends by performing time-series analysis based on collected information.

[0885] (Claim 3)

[0886] The system according to claim 1, further comprising information preprocessing means for removing information noise and extracting necessary information.

[Explanation of Symbols]

[0887]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system comprising: a means of analyzing image data collected from observation equipment to count the number of specific objects within a region; a means of converting audio data obtained from listening into text data and classifying that text data based on emotion; a means of performing correlation analysis across multiple datasets to identify relationships between specific factors; a means of visualizing data and displaying results in real time; a means of providing local information in real time based on the user's location information; and a means of proposing regional revitalization strategies to users.

2. The system according to claim 1, further comprising means for performing time-series analysis based on collected data and historical data to predict future regional trends and congestion levels.

3. The system according to claim 1, comprising data preprocessing means for removing noise from data and extracting necessary information, and providing customized notifications to the user.