Image recognition-based traditional village scene English display method and system

By acquiring village scene data through image recognition technology, generating personalized English display text, and adjusting display parameters, the problems of information errors and insufficient interactivity in traditional village scene displays are solved, and a user-friendly personalized display experience is achieved.

CN122197825A · Pending · Publication Date: 2026-06-12 · ANHUI OCCUPATIONAL COLLEGE OF CITY MANAGEMENT


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: ANHUI OCCUPATIONAL COLLEGE OF CITY MANAGEMENT
Filing Date: 2026-02-10
Publication Date: 2026-06-12


IPC: G06F40/151; G06T11/60; G06V10/77; G06V20/10; G06V10/764; G06F40/30; G06V20/70; G06F40/253

AI Tagging

Application Domain

Semantic analysis; Character and pattern recognition


AI Technical Summary

⚠Technical Problem

Traditional methods of showcasing village scenes rely on manual writing, which results in information errors, a lack of interactivity, an inability to be personalized, and an inability to respond to environmental changes in real time, leading to a poor user experience.

⚗Method used

The method acquires village scene data through image recognition, extracts target element features, generates English display text, adjusts display parameters based on environmental data, supports user selection and voice command interaction, and generates personalized display paths.

🎯Benefits of technology

The displayed information is accurate and comprehensive, user engagement and satisfaction improve, interaction is convenient and the display experience is coherent, and the display adapts to changing environments.


Abstract

The present application relates to the technical field of scene display, in particular to a traditional village scene English display method and system based on image recognition, comprising: obtaining image data of a traditional village scene, and performing feature extraction on the image data to identify target elements in the scene, the target elements including at least one of traditional buildings, folk items and natural landscapes; generating corresponding English display text according to attribute information of the target elements, the attribute information including the name, historical era and cultural implications of the target elements; and judging whether the English display text meets preset display conditions, the preset display conditions including text accuracy, length range and language difficulty. The present application extracts feature information of target elements and generates corresponding English display text by obtaining image data of a traditional village, ensuring that the displayed information is accurate and rich, and effectively conveying traditional culture and historical background.


Description

Technical Field

[0001] This invention relates to the field of scene display technology, specifically to a method and system for displaying traditional village scenes in English based on image recognition.

Background Technology

[0002] Currently, traditional methods often rely on manually written presentation texts, which are prone to errors or omissions and cannot fully and accurately convey traditional culture and historical background. Moreover, they are often one-way information transmissions, where users can only passively receive information, lacking interactivity and unable to be adjusted according to user needs, resulting in low user engagement.

[0003] Furthermore, traditional methods lack flexibility in generating presentation text, often using fixed and standardized content that is difficult to adapt to the needs of different audiences, lacks personalized experience, and requires a large amount of human resources for content creation and review, resulting in slow content updates and an inability to reflect the latest cultural trends and research findings in a timely manner. In addition, traditional presentation methods cannot adjust the presentation content or format in real time according to environmental changes, resulting in a poor user experience in different environments.

Summary of the Invention

[0004] To achieve the above objectives, the present invention provides the following technical solution: a method for displaying traditional village scenes in English based on image recognition, comprising:

[0005] Acquire image data of traditional village scenes, extract features from the image data, and identify target elements in the scene. Target elements include at least one of traditional buildings, folk items, and natural landscapes.

[0006] Based on the attribute information of the target element, generate the corresponding English display text. The attribute information includes the name, historical period, and cultural connotation of the target element.

[0007] Determine whether the English text being displayed meets the preset display conditions, which include text accuracy, length range, and language difficulty.

[0008] If the English display text meets the preset display conditions, the English display text will be output to the display device.

[0009] If the English display text does not meet the preset display conditions, alternative attribute information is extracted from adjacent elements associated with the target element according to the defect type of the English display text, and at least one alternative English display text that meets the display conditions is generated; the user can then select an alternative and / or adjust its content through voice commands.

[0010] Acquire real-time environmental data of the target element, including light intensity and background noise; adjust the display parameters of the English display text in the target English display path based on the real-time environmental data, including font size and speech rate; output the adjusted English display text to the display device.

[0011] Preferably, image data of traditional village scenes is acquired, and feature extraction is performed on the image data to identify target elements in the scene, including:

[0012] Obtain the geographical location information of traditional villages, and determine the target village area to be displayed based on the geographical location information;

[0013] Based on the scope of the target village area, control the image acquisition equipment to move to the preset shooting point in the target village area;

[0014] Multi-angle image data of the target village area is acquired through image acquisition equipment, and the multi-angle image data is associated with and stored with geographical location information;

[0015] The feature vectors of image data are obtained, and the feature vectors are processed by a pre-trained object detection model to identify candidate regions in the image.

[0016] The candidate regions are compared with the traditional village element templates stored in the database to determine the target element category corresponding to the candidate regions.

[0017] Based on the target element category, extract the contour information and texture features of the target elements in the image.

[0018] Preferably, based on the attribute information of the target element, the corresponding English display text is generated, including:

[0019] Obtain the historical and cultural attributes of the target element, including the construction date, related folk tales, and traditional crafts.

[0020] Historical and cultural attributes are converted into structured data, and then the structured data is converted into English display text through a natural language generation model;

[0021] Perform a grammar check on the English display text to ensure that it reads as idiomatic English;

[0022] Extract the architectural style, historical period, and cultural connotation of scene elements in each display area as scene element feature information;

[0023] Retrieve several candidate text fragments that match the feature information of scene elements from a pre-set English display corpus;

[0024] Based on the semantic coherence of the candidate corpus fragments and the preset text length limit, the candidate corpus fragments are spliced and grammatically corrected.

[0025] The spliced and corrected text will be used as the corresponding English display text for the English display unit.

[0026] Preferably, determining whether the English displayed text meets the preset display conditions includes:

[0027] Obtain semantic information from the English display text and compare the semantic information with the attribute information of the target element;

[0028] If the matching degree between semantic information and attribute information is lower than the preset threshold, the English display text is determined not to meet the accuracy condition.

[0029] If the matching degree is higher than or equal to the preset threshold, the length of the English display text is further checked to see if it is within the preset character range;

[0030] If the length of the displayed English text exceeds the preset character range, then the displayed English text is determined not to meet the length range condition.

[0031] If the language difficulty coefficient of the English display text is higher than the preset level, then the English display text is determined not to meet the language difficulty condition.
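The three checks in [0027] to [0031] can be sketched as a single function. This is a minimal illustration, not the patent's implementation: the threshold values, the `DisplayConditions` and `check_display_text` names, and the assumption that the matching degree and difficulty coefficient are already computed as numbers between 0 and 1 are all hypothetical. A text outside the character range in either direction is treated as failing the length condition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DisplayConditions:
    # All values are illustrative placeholders, not taken from the patent.
    min_match: float = 0.8        # preset threshold for the matching degree
    min_chars: int = 50           # lower bound of the preset character range
    max_chars: int = 400          # upper bound of the preset character range
    max_difficulty: float = 0.6   # preset language difficulty level


def check_display_text(text: str, match_degree: float, difficulty: float,
                       cond: DisplayConditions = DisplayConditions()) -> Optional[str]:
    """Return None if every condition is met, else the name of the first failed one."""
    if match_degree < cond.min_match:
        return "accuracy"
    if not (cond.min_chars <= len(text) <= cond.max_chars):
        return "length"
    if difficulty > cond.max_difficulty:
        return "difficulty"
    return None
```

The returned string can serve as the defect type that drives the generation of alternative texts.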

[0032] Preferably, based on the defect type of the English display text, alternative attribute information is extracted from the adjacent elements associated with the target element to generate at least one alternative English display text that meets the display conditions, including:

[0033] If the defect type of the English text is incomplete content, then obtain the attribute information of the adjacent related elements of the target element. The related elements include the auxiliary components of the same building and the surrounding folk items.

[0034] Based on the attribute information of the associated elements, fill in the missing content of the English display text and generate alternative English display text;

[0035] Obtain the number of associated elements for each candidate English display text and evaluate the relevance of the candidate English display text to the target element;

[0036] Assign a priority to each candidate English display text based on its number of associated elements and its relevance;

[0037] Sort all candidate English display texts from highest to lowest priority and number them in that order.
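The prioritization and numbering of candidate texts might look like the following sketch. The source does not say how the number of associated elements and the relevance are combined, so the weighted sum, the weights, and the `Candidate` and `rank_candidates` names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    associated_count: int   # number of associated elements used to fill the text
    relevance: float        # relevance to the target element, assumed in [0, 1]


def rank_candidates(cands, w_count=0.5, w_rel=0.5):
    """Return [(sort_number, candidate), ...] ordered from highest to lowest priority."""
    # Normalise the count so that both terms lie in [0, 1] (hypothetical scoring rule).
    max_count = max((c.associated_count for c in cands), default=0) or 1

    def priority(c):
        return w_count * c.associated_count / max_count + w_rel * c.relevance

    ordered = sorted(cands, key=priority, reverse=True)
    return list(enumerate(ordered, start=1))
```

The sort numbers start at 1 so that a user can refer to a candidate by number in a voice command.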

[0038] Preferably, the content of the alternative English display text is available for user selection and / or user-guided adjustment via voice commands, including:

[0039] Using the sort numbers, the user selects one of the alternative English display texts as the final display text by voice command and / or preset gesture.

[0040] Alternatively, the user selects one of the alternative English display texts by voice command and / or preset gesture and then modifies its content through adjustment commands; the modified text is used as the final display text;

[0041] Obtain the latitude and longitude coordinates and category attribute of the target element corresponding to the final displayed text;

[0042] Using a preset clustering radius and minimum number of elements as conditions, a spatial clustering algorithm is used to perform clustering analysis on the latitude and longitude coordinates of the target elements to obtain several clusters.
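A clustering radius combined with a minimum number of elements matches the parameters of density-based clustering, so the sketch below assumes DBSCAN with a haversine distance. The source names only "a spatial clustering algorithm"; the algorithm choice, the function names, and the sample coordinates in the usage note are illustrative assumptions.

```python
import math


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def dbscan(points, radius_m, min_elements):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Includes the point itself, as in the usual DBSCAN definition.
        return [j for j, p in enumerate(points)
                if haversine_m(points[i], p) <= radius_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_elements:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster          # noise reached from a core point becomes a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_elements:    # j is a core point, so keep expanding
                queue.extend(nbrs)
    return labels
```

For example, `dbscan([(30.0, 118.0), (30.0001, 118.0), (31.0, 119.0)], radius_m=50, min_elements=2)` groups the first two points and marks the third as noise. The pairwise neighbour search is quadratic, which is acceptable for the element counts of a single village but would need a spatial index for larger datasets.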

[0043] Preferably, the method further includes:

[0044] Calculate the scene element density and category richness within each cluster;

[0045] Based on scene element density, category richness, and preset display unit area threshold, adjacent clusters that meet the density requirements are merged into several English display units.

[0046] Construct a semantic association network among English display units, where nodes represent English display units and edges represent the semantic similarity between English display texts;

[0047] The English display unit that is located in the central area and covers the most scene elements is used as the display starting point.

[0048] Based on the spatial location information of the English display units and the semantic relationships between their English display texts, a path planning algorithm generates an English display tour path that begins at the display starting point, that is, the English display unit with the largest coverage.
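The semantic association network of [0046] can be illustrated with a small weighted graph. The source does not specify how semantic similarity is measured; the sketch below substitutes Jaccard overlap of word sets as a simple stand-in, and the `min_similarity` cut-off and all function names are assumptions.

```python
import re
from itertools import combinations


def tokens(text):
    """Lower-cased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Word-set overlap in [0, 1]; a crude proxy for semantic similarity."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def build_semantic_network(unit_texts, min_similarity=0.1):
    """unit_texts: {unit_id: display text}. Returns {unit_id: {neighbour_id: weight}}."""
    graph = {u: {} for u in unit_texts}
    for u, v in combinations(unit_texts, 2):
        w = jaccard(unit_texts[u], unit_texts[v])
        if w >= min_similarity:      # keep only edges with meaningful similarity
            graph[u][v] = w
            graph[v][u] = w
    return graph
```

In practice an embedding-based similarity would capture meaning better than word overlap; the graph structure stays the same either way.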

[0049] Preferably, generating the English display tour path from the display starting point, based on the spatial location information of the English display units and the semantic relationships between their English display texts, includes:

[0050] Using the current English display unit as the current node, query all candidate nodes connected to the current node in the semantic association network;

[0051] Calculate the ratio of the semantic similarity weight between each candidate node and the current node to the spatial distance;

[0052] Select the candidate node with the largest ratio as the next English display unit, and add the distance from the current node to that next node to the cumulative tour distance;

[0053] If the cumulative tour distance does not exceed the preset tour distance constraint, repeat the above steps with the next node as the current node until all English display units are covered;

[0054] Connect the center points of several selected English display units in sequence to generate an English display tour path.
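The greedy selection in [0050] to [0054] can be sketched as follows. The sketch assumes that unit centers are planar (x, y) coordinates in metres, that the semantic association network is a nested dictionary of similarity weights, and that already visited units are skipped; the source states none of these details, and the names are hypothetical.

```python
import math


def plan_tour(start, centers, graph, max_distance):
    """Greedy tour over display units.

    centers: {unit_id: (x, y)}; graph: {unit_id: {neighbour_id: similarity}}.
    Returns (ordered list of unit ids, cumulative tour distance).
    """
    path, travelled, current = [start], 0.0, start
    while len(path) < len(centers):
        best, best_ratio, best_dist = None, -1.0, 0.0
        for cand, sim in graph.get(current, {}).items():
            if cand in path:
                continue                     # assumption: each unit is visited once
            dist = math.dist(centers[current], centers[cand])
            ratio = sim / dist if dist > 0 else float("inf")
            if ratio > best_ratio:
                best, best_ratio, best_dist = cand, ratio, dist
        # Stop when no connected unit remains or the distance constraint would be exceeded.
        if best is None or travelled + best_dist > max_distance:
            break
        path.append(best)
        travelled += best_dist
        current = best
    return path, travelled
```

Connecting the center points of the returned units in order gives the tour path. Because the choice is greedy, the result is not guaranteed to be the shortest or most coherent path overall.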

[0055] Preferably, the display parameters of the English text in the target English display path are adjusted based on real-time environmental data, including:

[0056] Obtain the current light intensity and the background noise level, in decibels, at the display device;

[0057] If the light intensity is lower than the preset illuminance threshold, the font size of the English text will be increased.

[0058] If the background noise level is higher than the preset noise threshold, the playback speed and volume of the English text will be increased.

[0059] Synchronize the adjusted font display size and voice playback parameters to all display nodes in the target English display path;

[0060] Output the adjusted English presentation text to the presentation device.
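The threshold rules in [0056] to [0059] can be expressed compactly. The thresholds, scaling factors, and the `DisplayParams`, `adjust_params`, and `sync_to_path` names below are placeholders chosen for illustration; the source specifies only the direction of each adjustment.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DisplayParams:
    font_pt: float = 16.0      # font size in points
    speech_rate: float = 1.0   # playback speed multiplier
    volume: float = 0.5        # playback volume in [0, 1]


def adjust_params(p, lux, noise_db, lux_threshold=100.0, noise_threshold=65.0):
    """Apply the low-light and high-noise rules; all factors are assumed values."""
    if lux < lux_threshold:
        p = replace(p, font_pt=p.font_pt * 1.25)
    if noise_db > noise_threshold:
        p = replace(p, speech_rate=p.speech_rate * 1.1,
                    volume=min(1.0, p.volume + 0.2))
    return p


def sync_to_path(path_nodes, params):
    """Give every display node on the tour path the same adjusted parameters."""
    return {node: params for node in path_nodes}
```

Using an immutable parameter object means one adjusted instance can be shared safely by all nodes on the path.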

[0061] A system for displaying traditional village scenes in English based on image recognition, which applies the aforementioned method, includes:

[0062] The feature extraction module is used to acquire image data of traditional village scenes, extract features from the image data, and identify target elements in the scene. The target elements include at least one of traditional buildings, folk items, and natural landscapes.

[0063] The text generation module is used to generate corresponding English display text based on the attribute information of the target element. The attribute information includes the name, historical period, and cultural connotation of the target element.

[0064] The display judgment module is used to determine whether the English display text meets the preset display conditions, which include text accuracy, length range, and language difficulty.

[0065] The English display module is used to output the English display text to the display device when the English display text meets the preset display conditions.

[0066] The first adjustment module is used when the English display text does not meet the preset display conditions. Based on the defect type of the English display text, it extracts alternative attribute information from adjacent elements associated with the target element and generates at least one alternative English display text that meets the display conditions, so that the user can select an alternative and / or adjust its content through voice commands.

[0067] The second adjustment module is used to acquire real-time environmental data of the target element, including light intensity and background noise; adjust the display parameters of the English display text in the target English display path according to the real-time environmental data, including font size and speech rate; and output the adjusted English display text to the display device.

[0068] Compared with the prior art, the beneficial effects of the present invention are:

[0069] (1) This invention obtains image data of traditional villages, extracts feature information of target elements and generates corresponding English display text, ensuring that the displayed information is accurate and rich, effectively conveying traditional culture and historical background. Moreover, it dynamically adjusts the display parameters of the display text based on user needs and real-time environmental data, so that users can have a good visual and auditory experience in different environments, thereby enhancing user participation and satisfaction.

[0070] (2) This invention transforms structured data into fluent English display text through a natural language generation model and performs grammatical verification, reducing human intervention and improving the efficiency and quality of content generation. Moreover, it provides a more intuitive and convenient interaction method by selecting or adjusting the display text through voice commands or gestures, enhancing the user's initiative and sense of participation. Furthermore, by generating an efficient display tour path, it ensures that users can appreciate each display unit in the optimal order during the visit, improving the continuity and fluency of the display activities.

Attached Figure Description

[0071] Figure 1 is a schematic flowchart of the overall method in one embodiment of the present invention;

[0072] Figure 2 is a schematic diagram of the overall system architecture in one embodiment of the present invention.

[0073] In the diagram: 1. Feature extraction module; 2. Text generation module; 3. Display judgment module; 4. English display module; 5. First adjustment module; 6. Second adjustment module.

Detailed Implementation

[0074] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0075] Example 1: referring to Figure 1, this invention provides a technical solution: a method for displaying traditional village scenes in English based on image recognition, comprising:

[0076] S1. Acquire image data of traditional village scenes, extract features from the image data, and identify target elements in the scene. Target elements include at least one of traditional buildings, folk items, and natural landscapes.

[0077] S2. Generate the corresponding English display text based on the attribute information of the target element. The attribute information includes the name, historical period, and cultural connotation of the target element.

[0078] S3. Determine whether the English text displayed meets the preset display conditions, which include text accuracy, length range, and language difficulty.

[0079] S4. If the English display text meets the preset display conditions, the English display text will be output to the display device.

[0080] S5. If the English display text does not meet the preset display conditions, alternative attribute information is extracted from adjacent elements associated with the target element according to the defect type of the English display text, and at least one alternative English display text that meets the display conditions is generated; the user can then select an alternative and / or adjust its content through voice commands.

[0081] S6. Obtain real-time environmental data of the target element, including light intensity and background noise; adjust the display parameters of the English display text in the target English display path according to the real-time environmental data, including font size and speech rate; output the adjusted English display text to the display device.

[0082] It should be noted that image data is obtained from traditional villages; these images may include elements such as architecture, folk artifacts, and natural landscapes; image processing techniques are used to extract features from the images in order to identify specific target elements;

[0083] Based on the identified attribute information of the target elements (such as name, historical period, and cultural connotation), corresponding English display text is generated; this step ensures that the display content has cultural and historical context.

[0084] The generated English presentation text is evaluated to check whether its accuracy, length, and language difficulty meet preset standards; this ensures the quality and appropriateness of the presentation content.

[0085] If the display text meets the criteria, it is output to the display device for the user to view; if it does not, alternative information is retrieved from adjacent elements related to the target element according to the defect type, and at least one alternative English display text that meets the requirements is generated; the user can then select an alternative or adjust its content through voice commands.

[0086] Real-time monitoring of environmental data, such as light intensity and background noise, allows for appropriate adjustments during display. Based on the collected environmental data, the display parameters of the text, such as font size and speech rate, are adjusted to better suit the current environment and improve the user experience.
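The control flow of steps S2 to S6 can be summarised in a short skeleton. Every stage is passed in as a callable because the source does not fix any particular model or device interface; the function and parameter names are hypothetical, and step S1 (image recognition) is assumed to have already produced `element`.

```python
def display_element(element, generate, find_defect, alternatives, choose, adjust, output):
    """Run one recognised target element through steps S2 to S6."""
    text = generate(element)                      # S2: generate English display text
    defect = find_defect(text, element)           # S3: None means all conditions are met
    if defect is not None:
        options = alternatives(element, defect)   # S5: alternatives from adjacent elements
        text = choose(options)                    # S5: user selection or voice adjustment
    output(adjust(text))                          # S6 adjustment, then output (S4)
```

Keeping the stages as injected callables makes it easy to swap a rule-based generator for a learned one, or a screen for an audio guide, without touching the flow.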

[0087] In an optional embodiment, image data of a traditional village scene is acquired, and feature extraction is performed on the image data to identify target elements in the scene, including:

[0088] Obtain the geographical location information of traditional villages, and determine the target village area to be displayed based on the geographical location information;

[0089] Based on the scope of the target village area, control the image acquisition equipment to move to the preset shooting point in the target village area;

[0090] Multi-angle image data of the target village area is acquired through image acquisition equipment, and the multi-angle image data is associated with and stored with geographical location information;

[0091] The feature vectors of image data are obtained, and the feature vectors are processed by a pre-trained object detection model to identify candidate regions in the image.

[0092] The candidate regions are compared with the traditional village element templates stored in the database to determine the target element category corresponding to the candidate regions;

[0093] Based on the target element category, extract the contour information and texture features of the target elements in the image.

[0094] It should be noted that obtaining the geographical location information of traditional villages is crucial; this usually includes the village's coordinates (longitude and latitude).

[0095] Based on the acquired geographic location information, the system determines the target village area to be displayed; this means that the system will identify the specific range of the village in order to collect images in a concentrated manner.

[0096] Once the target village area is identified, the system will control the image acquisition equipment (such as a camera or drone) to move to the preset shooting points; these shooting points are selected based on the best viewing angle and shooting conditions to ensure that high-quality image data is obtained.

[0097] In the target village area, the image acquisition equipment will acquire image data from multiple angles; this multi-angle acquisition helps to comprehensively showcase the characteristics and details of the village.

[0098] The acquired multi-angle image data is associated with and stored in conjunction with previously collected geographic location information; this association allows each image to be traced back to its specific location; feature vectors are extracted from the image data; a feature vector is a digital representation that can capture key features of the image, such as color, shape, and structure.

[0099] By using a pre-trained object detection model, the system processes the extracted feature vectors; this step aims to identify candidate regions in the image, i.e. regions that may contain target elements.

[0100] The identified candidate regions are compared with pre-stored traditional village element templates in the database; the purpose of this process is to determine the target element category corresponding to the candidate regions, such as traditional buildings, folk items, or natural landscapes.

[0101] Based on the identified target element category, the system will further extract the contour information and texture features of the target elements in the image; this information is very important for subsequent display and analysis, and can help create more accurate content display.
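Comparing candidate regions against stored templates, as in [0100], could be done by nearest-template matching on feature vectors. The sketch assumes each candidate region and each template is already represented as a numeric feature vector and uses cosine similarity with a hypothetical acceptance threshold; the source does not name the comparison measure.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def classify_region(feature, templates, min_score=0.7):
    """templates: {category: [template vectors]}.

    Returns (category, score) for the best match, or (None, score)
    when no template reaches min_score (an assumed threshold).
    """
    best_cat, best = None, 0.0
    for category, vectors in templates.items():
        for vec in vectors:
            score = cosine(feature, vec)
            if score > best:
                best_cat, best = category, score
    return (best_cat, best) if best >= min_score else (None, best)
```

Regions that match no template well are returned with a `None` category, so they can be dropped or sent for manual review rather than mislabelled.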

[0102] In an optional embodiment, generating corresponding English display text based on the attribute information of the target element includes:

[0103] Obtain the historical and cultural attributes of the target element, including the construction date, related folk tales, and traditional crafts.

[0104] Historical and cultural attributes are converted into structured data, and then the structured data is converted into English display text through a natural language generation model;

[0105] Perform a grammar check on the English display text to ensure that it reads as idiomatic English;

[0106] Extract the architectural style, historical period, and cultural connotation of scene elements in each display area as scene element feature information;

[0107] Retrieve several candidate text fragments that match the feature information of scene elements from a pre-set English display corpus;

[0108] Based on the semantic coherence of the candidate corpus fragments and the preset text length limit, the candidate corpus fragments are spliced and grammatically corrected.

[0109] The spliced and corrected text will be used as the corresponding English display text for the English display unit.

[0110] It should be noted that historical and cultural attributes related to the target element should be collected; these attributes include the construction period (e.g., when the building was built), related folk tales (traditional stories or legends related to the element), and traditional crafts (e.g., the traditional techniques used to make the element).

[0111] The acquired historical and cultural attributes are transformed into structured data; this means organizing this information in a processable format for later use.

[0112] Using a Natural Language Generation (NLG) model, structured data is converted into English display text; the NLG model can generate fluent, grammatically correct text based on the input data;

[0113] The generated English presentation text undergoes grammar verification to ensure it conforms to English expression habits; this step is crucial as it helps improve the text's readability and professionalism.

[0114] Architectural style, historical period, and cultural connotations are extracted from scene elements in each exhibition area; this information will be used to enrich the text content, making it more in-depth and diverse.

[0115] In a pre-set English presentation corpus, several candidate corpus fragments that match the scene element feature information are retrieved; these fragments can be sentences or paragraphs related to a specific theme or element.

[0116] Based on the semantic coherence of the candidate corpus fragments and the pre-set text length limit, the candidate fragments are spliced and grammatically corrected; this process ensures that the final generated text is not only rich in information, but also logically coherent and meets the length requirements.

[0117] The spliced and corrected text will be used as the final display text for the English display unit; this text can be used on the display device to convey important information about the target element to the user.
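The splicing of candidate corpus fragments under a length limit, described in [0115] to [0117], can be sketched as a greedy selection. Here the semantic coherence criterion is approximated by a precomputed match score plus restoring the fragments' original order, and grammatical correction is left out; both simplifications, and the function name, are assumptions.

```python
def splice_fragments(fragments, max_chars):
    """fragments: list of (text, match_score) in corpus order.

    Keeps the best-matching fragments that fit within max_chars
    (counting one joining space between fragments), then joins
    them in their original order.
    """
    ranked = sorted(enumerate(fragments), key=lambda item: item[1][1], reverse=True)
    chosen, used = [], 0
    for index, (text, _score) in ranked:
        extra = len(text) + (1 if chosen else 0)   # joining space after the first fragment
        if used + extra <= max_chars:
            chosen.append((index, text))
            used += extra
    return " ".join(text for _index, text in sorted(chosen))
```

A usage example with made-up fragments: calling `splice_fragments([("The hall dates to the Ming dynasty.", 0.9), ("It is famous.", 0.2), ("Its beams carry carved wooden panels.", 0.8)], 75)` keeps the first and third fragments and drops the low-scoring one that would exceed the limit.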

[0118] In an optional embodiment, determining whether the English displayed text meets preset display conditions includes:

[0119] Obtain semantic information from the English display text and compare the semantic information with the attribute information of the target element;

[0120] If the matching degree between semantic information and attribute information is lower than the preset threshold, the English display text is determined not to meet the accuracy condition.

[0121] If the matching degree is higher than or equal to the preset threshold, the length of the English display text is further checked to see if it is within the preset character range;

[0122] If the length of the displayed English text exceeds the preset character range, then the displayed English text is determined not to meet the length range condition.

[0123] If the language difficulty coefficient of the English display text is higher than the preset level, then the English display text is determined not to meet the language difficulty condition.

[0124] It should be noted that semantic information is extracted from the generated English presentation text; this includes the main ideas, themes, and key information conveyed by the text.

[0125] The extracted semantic information is compared with the attribute information of the target element; the attribute information of the target element includes its historical and cultural background, construction period, and other characteristics; the purpose of this comparison is to ensure that the displayed text accurately reflects the true information of the target element.

[0126] If the matching degree between semantic information and attribute information is lower than a preset threshold (i.e., the consistency between the two is very low), the system will determine that the English display text does not meet the accuracy condition; this indicates that the text has failed to effectively convey the core information of the target element.

[0127] If the semantic matching degree is higher than or equal to the preset threshold, the system will further check whether the length of the English display text is within the preset character range; this process is to ensure that the text is not too long or too short in order to meet the display requirements.

[0128] If the length of the English display text falls outside the preset character range, the system will determine that the text does not meet the length range condition; this matters for the user's reading experience, because overly long text may lead to information redundancy, while overly short text may fail to express the information fully.

[0129] The system assesses the language difficulty level of the English text being presented. This level can be calculated based on factors such as vocabulary complexity and sentence structure. If the level is higher than the preset level (i.e., the text is too complex or difficult to understand), the system will determine that the text does not meet the language difficulty criteria.
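The three checks described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `DisplayLimits` and `check_display_conditions` and all threshold values are hypothetical, and the matching degree and difficulty coefficient are taken as precomputed inputs because the text does not specify how they are calculated. The sketch also assumes that an accuracy failure ends the check early, while length and difficulty are both evaluated once accuracy passes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayLimits:
    # All values below are hypothetical placeholders for the "preset" limits.
    match_threshold: float = 0.6
    min_chars: int = 80
    max_chars: int = 400
    max_difficulty: float = 12.0


def check_display_conditions(match_degree, text, difficulty, limits=DisplayLimits()):
    """Return the list of failed conditions; an empty list means the text passes."""
    # Accuracy is checked first; length is only checked once accuracy passes.
    if match_degree < limits.match_threshold:
        return ["accuracy"]
    defects = []
    if not (limits.min_chars <= len(text) <= limits.max_chars):
        defects.append("length")
    if difficulty > limits.max_difficulty:
        defects.append("difficulty")
    return defects
```

Returning the failed conditions rather than a single boolean keeps the defect type available for the later step that generates alternative texts.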

[0130] In an optional embodiment, based on the defect type of the English display text, alternative attribute information is extracted from adjacent elements associated with the target element to generate at least one alternative English display text that meets the display conditions, including:

[0131] If the defect type of the English text is incomplete content, then obtain the attribute information of the adjacent related elements of the target element. The related elements include the auxiliary components of the same building and the surrounding folk items.

[0132] Based on the attribute information of the associated elements, fill in the missing content of the English display text and generate alternative English display text;

[0133] Obtain the number of associated elements for each candidate English display text and evaluate the relevance of the candidate English display text to the target element;

[0134] Prioritize each candidate English display text based on the number and relevance of associated elements;

[0135] Based on priority, all candidate English display texts are sorted and numbered from highest to lowest priority.

[0136] It should be noted that the defect type of the current English display text must be determined first; for example, if the text is judged to be incomplete, further steps are needed to supplement the information.

[0137] When the defect type is incomplete content, the system will obtain the attribute information of the adjacent related elements of the target element; these related elements may include the auxiliary components of the same building (such as doors, windows, carvings, etc.) and the surrounding folk items (such as local handicrafts, traditional decorations, etc.).

[0138] Using the extracted attribute information of related elements, the system supplements the original English display text to generate at least one new alternative English display text; this process ensures that the new text can cover important information that was not mentioned before, thus making the content more complete.

[0139] The number of associated elements for each generated candidate English display text will be calculated, and the relevance of these candidate texts to the target element will be evaluated; this evaluation is to ensure that the newly generated text maintains a certain connection and consistency with the target element.

[0140] The system will evaluate the priority of each candidate English display text based on the number of related elements and their relevance to the target element; generally, texts containing more related elements and highly relevant to the target element will be considered as higher priority selections.

[0141] All generated alternative English display texts are sorted and numbered according to priority; texts with higher priority are listed first, while texts with lower priority are listed later; in this way, the system can provide the most suitable alternative texts for users or displays.
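The prioritization and numbering of candidate texts could look like the sketch below. The text only says that priority depends on the number of associated elements and on relevance; the weighted sum, the weights, and the names `Candidate` and `rank_candidates` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    associated_count: int  # number of associated elements used in the text
    relevance: float       # relevance to the target element, assumed in [0, 1]


def rank_candidates(candidates, count_weight=0.4, relevance_weight=0.6):
    """Return (sort_number, candidate) pairs, highest priority first."""
    if not candidates:
        return []
    # Normalise the element count so both terms lie on a comparable scale.
    max_count = max(c.associated_count for c in candidates) or 1

    def score(c):
        return (count_weight * c.associated_count / max_count
                + relevance_weight * c.relevance)

    ordered = sorted(candidates, key=score, reverse=True)
    return list(enumerate(ordered, start=1))
```

The sort numbers start at 1 so that they can be spoken directly in a voice command such as "select number two".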

[0142] In an optional embodiment, allowing the user to select and / or adjust the content of alternative English display text via voice commands includes:

[0143] Based on the sort number, the user selects one of the alternative English display texts as the final display text by voice command and / or preset gesture;

[0144] Alternatively, the user selects one of the alternative English display texts by voice command and / or preset gesture, and then modifies the content of the selected text through adjustment commands, with the modified text being used as the final display text;

[0145] Obtain the latitude and longitude coordinates and category attribute of the target element corresponding to the final displayed text;

[0146] Using a preset clustering radius and minimum number of elements as conditions, a spatial clustering algorithm is used to perform clustering analysis on the latitude and longitude coordinates of the target elements to obtain several clusters.

[0147] It should be noted that, based on the previously generated sequence number of the alternative English texts, users can select one of the alternative English texts as the final text to be displayed using voice commands or preset gestures; this process allows users to participate in the content selection process in a more flexible and convenient way.

[0148] After selecting an alternative English text for display, users can adjust and modify the content using voice commands or a preset gesture guidance system; this means users can personalize the text to better suit their needs or preferences; the modified text will be considered the final displayed text.

[0149] Once the final display text is determined, extract the latitude and longitude coordinates and category attributes of the corresponding target element (e.g., the element is a building, monument, natural landscape, etc.); this information provides the basic data for subsequent spatial analysis.

[0150] Based on the latitude and longitude coordinates of the target elements, a spatial clustering algorithm is applied for analysis using preset clustering radius and minimum element count conditions. The clustering radius defines the range within which the proximity relationships between elements are considered, while the minimum element count ensures that at least a certain number of elements exist in each cluster.

[0151] Through cluster analysis, the system groups the latitude and longitude coordinates of the target elements into several clusters. Each cluster represents a group of geographically close elements. These clusters can help users understand the distribution of related elements in a certain area and may be used for subsequent display or analysis.
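The text does not name a specific spatial clustering algorithm. A clustering radius plus a minimum element count are the two parameters of density-based clustering such as DBSCAN, so the sketch below uses a plain DBSCAN-style procedure as one possible reading. The function names and the use of great-circle (haversine) distance in metres are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def cluster_elements(coords, radius_m, min_elements):
    """Return one cluster label per coordinate; -1 marks unclustered elements."""
    labels = [None] * len(coords)

    def neighbours(i):
        # Includes the point itself, as in the usual DBSCAN definition.
        return [j for j, p in enumerate(coords)
                if haversine_m(coords[i], p) <= radius_m]

    cluster_id = -1
    for i in range(len(coords)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_elements:
            labels[i] = -1  # provisional noise; may later join a cluster edge
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border element reached from a core one
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nearby = neighbours(j)
            if len(nearby) >= min_elements:
                queue.extend(nearby)  # j is a core element, keep expanding
    return labels
```

This brute-force version compares every pair of elements, which is adequate for the few hundred elements of a single village; a spatial index would be the usual choice for larger data sets.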

[0152] In an optional embodiment, the method further includes:

[0153] Calculate the scene element density and category richness within each cluster;

[0154] Based on scene element density, category richness, and preset display unit area threshold, adjacent clusters that meet the density requirements are merged into several English display units.

[0155] Construct a semantic association network among English display units, where nodes represent English display units and edges represent the semantic similarity between English display texts;

[0156] The English display unit that is located in the central area and covers the most scene elements is used as the current display starting point;

[0157] Based on the spatial location information of several English display units and the semantic relationships of the English display texts, a path planning algorithm is used to generate an English display tour path that starts from the current display starting point, i.e., the English display unit with the largest coverage.

[0158] It should be noted that after performing cluster analysis, the system will calculate the scene element density (i.e., the number of elements contained in a unit area) and category richness (i.e., the number of elements of different categories) within each cluster; this step helps to assess the richness of cultural and historical information in each cluster.

[0159] Based on the calculated scene element density, category richness, and preset display unit area threshold, the system will merge adjacent clusters that meet the density requirements into several English display units; this process ensures the continuity of content and the aggregation of information in the display units, thereby improving the display effect.

[0160] A semantic association network will be constructed, where each node represents an English display unit and each edge represents the semantic similarity between these display units; this network structure can help understand the relationships between different display units and provide a basis for subsequent tour path planning.

[0161] Among all the generated English display units, the system will select the display unit located in the central area and covering the most scene elements as the starting point for the current display; this selection method ensures that users start their tour from the area with the most information, improving the effectiveness of the display.

[0162] Based on the spatial location information and semantic relationships of the English display units, the system will use a path planning algorithm to generate a reasonable English display tour path starting from the current display unit (with the largest coverage). This path planning takes into account the distance between display units, semantic relationships, and the user's tour experience.
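The density and richness measures, and the choice of starting unit, can be illustrated with the sketch below. Several points are assumptions: unit positions are planar (x, y) coordinates in metres, each unit's area is supplied directly, and "central area" is interpreted as lying within a hypothetical `central_radius_m` of the centroid of all unit centers. The names `DisplayUnit` and `pick_start_unit` are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DisplayUnit:
    name: str
    center: Tuple[float, float]  # planar (x, y) in metres, an assumption
    categories: List[str]        # category of every scene element in the unit
    area_m2: float

    @property
    def density(self) -> float:
        # Scene element density: elements per unit area.
        return len(self.categories) / self.area_m2

    @property
    def richness(self) -> int:
        # Category richness: number of distinct categories.
        return len(set(self.categories))


def pick_start_unit(units, central_radius_m):
    """Pick the central unit that covers the most scene elements."""
    cx = sum(u.center[0] for u in units) / len(units)
    cy = sum(u.center[1] for u in units) / len(units)
    central = [u for u in units
               if math.dist(u.center, (cx, cy)) <= central_radius_m]
    # Fall back to all units if none lies inside the central radius.
    return max(central or units, key=lambda u: len(u.categories))
```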

[0163] In an optional embodiment, using a path planning algorithm to generate an English display tour path that starts from the current display starting point (the English display unit with the largest coverage), based on the spatial location information of several English display units and the semantic relationships of the English display texts, includes:

[0164] Using the current English display unit as the current node, query all candidate nodes connected to the current node in the semantic association network;

[0165] Calculate the ratio of the semantic similarity weight between each candidate node and the current node to the spatial distance;

[0166] Select the candidate node with the largest ratio as the next English display unit, and add the distance from the current node to the next node to the cumulative tour distance;

[0167] If the cumulative tour distance does not exceed the preset tour distance constraint, repeat the above steps until all English display units are covered;

[0168] Connect the center points of several selected English display units in sequence to generate an English display tour path.

[0169] It should be noted that, starting from the current English display unit (current node), the system will query all candidate nodes connected to the current node in the semantic association network; these candidate nodes represent other display units associated with the current display unit, and they may provide a wealth of relevant information;

[0170] For each candidate node, the system calculates its semantic similarity weight with the current node (representing their similarity in topic or content) and its spatial distance from the current node (i.e., physical distance); it then calculates the ratio of the semantic similarity weight to the spatial distance; this ratio is used to rank the candidates for the next display unit: a higher ratio means that the candidate node is more semantically relevant to the current node and also relatively closer in distance;

[0171] The candidate node with the largest ratio will be selected as the next English display unit, and the cumulative tour distance will be updated. This selection ensures that users can obtain the most relevant information during the tour, while reducing the distance traveled and improving efficiency.

[0172] After updating the cumulative tour distance, the system will check whether this cumulative distance exceeds the preset tour distance constraint. If it does not exceed the limit, the system will repeat the above steps, that is, continue to start from the new current node, query new candidate nodes and calculate the ratio until the goal of covering all English display units is achieved or the preset tour distance is exceeded.

[0173] Once the desired English-language display units are identified, the system will connect the center points of these units in sequence to generate the final English-language tour path. This path will take into full account semantic relevance and spatial distance to ensure that users have a reasonable and informative tour experience.
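The greedy selection described above can be sketched as follows. The function name `plan_tour`, the planar coordinates, and the dictionary layout of the semantic association network are assumptions. Two choices go beyond the text: already visited units are skipped, and the sketch stops before a hop that would break the distance limit, whereas the text performs the check after the cumulative distance has been updated.

```python
import math


def plan_tour(start, positions, similarity, max_distance):
    """Greedy tour over display units.

    positions:  unit name -> planar (x, y) center in metres (an assumption)
    similarity: frozenset({a, b}) -> semantic similarity weight; a missing
                key means the two units are not connected in the network
    Returns (ordered unit names, cumulative tour distance).
    """
    path, travelled, current = [start], 0.0, start
    while len(path) < len(positions):
        best, best_ratio, best_dist = None, -1.0, 0.0
        for cand, pos in positions.items():
            weight = similarity.get(frozenset((current, cand)))
            if cand in path or weight is None:
                continue  # already visited, or no edge to the current node
            dist = math.dist(positions[current], pos)
            # Ratio of semantic similarity weight to spatial distance.
            ratio = math.inf if dist == 0 else weight / dist
            if ratio > best_ratio:
                best, best_ratio, best_dist = cand, ratio, dist
        if best is None or travelled + best_dist > max_distance:
            break  # no reachable candidate, or the distance limit is reached
        path.append(best)
        travelled += best_dist
        current = best
    return path, travelled
```

Connecting the centers of the returned units in order gives the tour path. Because the choice is greedy, the result is not guaranteed to be the shortest or most coherent tour overall.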

[0174] In an optional embodiment, adjusting the display parameters of the English text displayed in the target English display path based on real-time environmental data includes:

[0175] Obtain the current light intensity data and background noise level in decibels of the display device;

[0176] If the light intensity is lower than the preset illuminance threshold, the font size of the English text will be increased.

[0177] If the background noise level is higher than the preset noise threshold, the playback speed and volume of the English text will be increased.

[0178] Synchronize the adjusted font display size and voice playback parameters to all display nodes in the target English display path;

[0179] Output the adjusted English display text to the display device.

[0180] It should be noted that the current light intensity data and background noise decibel value of the display device are obtained; this environmental data is dynamic and can reflect the actual conditions of the user, thus providing a basis for subsequent adjustments.

[0181] If the detected light intensity is lower than the preset illuminance threshold, the system will automatically increase the font size of the English text; this adjustment is to ensure that the text can still be clearly identified and read even in low-light environments, thus enhancing the user experience.

[0182] If the background noise level exceeds the preset noise threshold, the system will increase the playback speed and volume of the English text; this measure aims to ensure that users can clearly hear the audio content of the displayed text even in noisy environments, thus avoiding information loss.

[0183] Once the font display size and voice playback parameters have been adjusted, these changes will be synchronized to all display nodes in the target English display path; this synchronization ensures that users can always enjoy a consistent display experience regardless of which display node they are in.

[0184] The adjusted English display text is output to the display device to adapt to the new display parameters; this means that users can see or hear the updated content immediately.
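The threshold-based adjustment and its synchronization across display nodes might be sketched as below. All thresholds and step factors are hypothetical placeholders for the "preset" values, and the names `DisplayParams`, `adjust_params`, and `sync_to_nodes` are illustrative. The text does not say by how much the font size, playback speed, or volume increase.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DisplayParams:
    font_size_pt: float
    speech_rate: float  # 1.0 = normal playback speed
    volume: float       # 0.0 to 1.0


def adjust_params(params, lux, noise_db,
                  lux_threshold=50.0, noise_threshold_db=65.0,
                  font_step=1.25, rate_step=1.1, volume_step=1.2):
    """Return new parameters adapted to the measured light and noise levels."""
    if lux < lux_threshold:
        # Low light: enlarge the font.
        params = replace(params, font_size_pt=params.font_size_pt * font_step)
    if noise_db > noise_threshold_db:
        # Noisy surroundings: raise playback speed and volume, as in the text.
        params = replace(params,
                         speech_rate=params.speech_rate * rate_step,
                         volume=min(1.0, params.volume * volume_step))
    return params


def sync_to_nodes(node_ids, params):
    """Give every display node on the tour path the same adjusted parameters."""
    return {node_id: params for node_id in node_ids}
```

Keeping `DisplayParams` immutable means each adjustment produces a new value, so the same object can be shared safely by all display nodes.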

[0185] Example 2: Referring to Figure 2, the present invention provides a traditional village scene English display system based on image recognition, applicable to the aforementioned traditional village scene English display method based on image recognition, comprising:

[0186] Feature extraction module 1 is used to acquire image data of traditional village scenes, extract features from the image data, and identify target elements in the scene. The target elements include at least one of traditional buildings, folk items, and natural landscapes.

[0187] Text generation module 2 is used to generate corresponding English display text based on the attribute information of the target element. The attribute information includes the name, historical period, and cultural connotation of the target element.

[0188] The display judgment module 3 is used to determine whether the English display text meets the preset display conditions, which include text accuracy, length range and language difficulty.

[0189] English display module 4 is used to output the English display text to the display device when the English display text meets the preset display conditions;

[0190] The first adjustment module 5 is used, in response to the English display text not meeting the preset display conditions, to extract alternative attribute information from the adjacent elements associated with the target element according to the defect type of the English display text, and to generate at least one alternative English display text that meets the display conditions, for the user to select and / or adjust through voice commands.

[0191] The second adjustment module 6 is used to acquire real-time environmental data of the target element, including light intensity and background noise; adjust the display parameters of the English display text in the target English display path according to the real-time environmental data, including font size and speech rate; and output the adjusted English display text to the display device.

[0192] The embodiments of the present invention have been described in detail above with reference to the accompanying drawings. However, the present invention is not limited thereto. Various changes can be made within the scope of knowledge possessed by those skilled in the art without departing from the spirit of the present invention.

Claims

1. A method for displaying traditional village scenes in English based on image recognition, characterized in that it comprises: acquiring image data of traditional village scenes, extracting features from the image data, and identifying target elements in the scene, the target elements including at least one of traditional buildings, folk items, and natural landscapes; generating corresponding English display text based on attribute information of the target element, the attribute information including the name, historical period, and cultural connotation of the target element; determining whether the English display text meets preset display conditions, the display conditions including text accuracy, length range, and language difficulty; if the English display text meets the preset display conditions, outputting the English display text to the display device; in response to the English display text not meeting the preset display conditions, extracting alternative attribute information from adjacent elements associated with the target element according to the defect type of the English display text, and generating at least one alternative English display text that meets the display conditions, for the user to select and / or adjust through voice commands; acquiring real-time environmental data of the target element, including light intensity and background noise; adjusting the display parameters of the English display text in the target English display path based on the real-time environmental data, the display parameters including font size and speech rate; and outputting the adjusted English display text to the display device.

2. The method for displaying traditional village scenes in English based on image recognition according to claim 1, characterized in that, Acquire image data of traditional village scenes, extract features from the image data, and identify target elements in the scene, including: Obtain the geographical location information of traditional villages, and determine the target village area to be displayed based on the geographical location information; Based on the scope of the target village area, control the image acquisition equipment to move to the preset shooting point in the target village area; Multi-angle image data of the target village area is acquired through image acquisition equipment, and the multi-angle image data is associated with and stored with geographical location information; The feature vectors of image data are obtained, and the feature vectors are processed by a pre-trained object detection model to identify candidate regions in the image. The candidate regions are compared with the traditional village element templates stored in the database to determine the target element category corresponding to the candidate regions. Based on the target element category, extract the contour information and texture features of the target elements in the image.

3. The method for displaying traditional village scenes in English based on image recognition according to claim 2, characterized in that, Based on the attribute information of the target element, generate the corresponding English display text, including: Obtain the historical and cultural attributes of the target element, including the construction date, related folk tales, and traditional crafts. Historical and cultural attributes are converted into structured data, and then the structured data is converted into English display text through a natural language generation model; Perform grammar checks on the English presentation text to ensure it conforms to English expression habits; Extract the architectural style, historical period, and cultural connotation of scene elements in each display area as scene element feature information; Retrieve several candidate text fragments that match the feature information of scene elements from a pre-set English display corpus; Based on the semantic coherence of the candidate corpus fragments and the preset text length limit, the candidate corpus fragments are spliced and grammatically corrected. The spliced and corrected text will be used as the corresponding English display text for the English display unit.

4. The method for displaying traditional village scenes in English based on image recognition according to claim 3, characterized in that, Determine whether the English text being displayed meets the preset display conditions, including: Obtain semantic information from the English display text and compare the semantic information with the attribute information of the target element; If the matching degree between semantic information and attribute information is lower than the preset threshold, the English display text is determined not to meet the accuracy condition. If the matching degree is higher than or equal to the preset threshold, the length of the English display text is further checked to see if it is within the preset character range; If the length of the displayed English text exceeds the preset character range, then the displayed English text is determined not to meet the length range condition. If the language difficulty coefficient of the English display text is higher than the preset level, then the English display text is determined not to meet the language difficulty condition.

5. The method for displaying traditional village scenes in English based on image recognition according to claim 4, characterized in that, Based on the defect type of the English display text, alternative attribute information is extracted from the adjacent elements associated with the target element to generate at least one alternative English display text that meets the display conditions, including: If the defect type of the English text is incomplete content, then obtain the attribute information of the adjacent related elements of the target element. The related elements include the auxiliary components of the same building and the surrounding folk items. Based on the attribute information of the associated elements, fill in the missing content of the English display text and generate alternative English display text; Obtain the number of associated elements for each candidate English display text and evaluate the relevance of the candidate English display text to the target element; Prioritize each candidate English presentation text based on the number and relevance of associated elements; Based on priority, all candidate English presentation texts are sorted and numbered from highest to lowest priority.

6. The method for displaying traditional village scenes in English based on image recognition according to claim 5, characterized in that, The content of the alternative English text to be displayed is available for user selection and / or user-guided adjustment via voice commands, including: Based on the sort number, the user selects one of the alternative English texts as the final text to be displayed by voice command and / or preset gestures. Alternatively, the user can select one of the alternative English texts to be displayed via voice commands and / or preset gestures, and then modify the content of the selected alternative English texts by adjusting the commands, with the modified alternative English texts being used as the final displayed texts; Obtain the latitude and longitude coordinates and category attribute of the target element corresponding to the final displayed text; Using a preset clustering radius and minimum number of elements as conditions, a spatial clustering algorithm is used to perform clustering analysis on the latitude and longitude coordinates of the target elements to obtain several clusters.

7. The method for displaying traditional village scenes in English based on image recognition according to claim 6, characterized in that, The method further includes: Calculate the scene element density and category richness within each cluster; Based on scene element density, category richness, and preset display unit area threshold, adjacent clusters that meet the density requirements are merged into several English display units. Construct a semantic association network among English display units, where nodes represent English display units and edges represent the semantic similarity between English display texts; The English display unit, which is located in the central area and covers the most scene elements, will be used as the starting point for the current display. Based on the spatial location information of several English display units and the semantic relationship of the English display text, a path planning algorithm is used to generate an English display tour path, starting from the display starting point with the largest coverage of the current English display unit.

8. The method for displaying traditional village scenes in English based on image recognition according to claim 7, characterized in that using a path planning algorithm to generate an English display tour path that starts from the current display starting point (the English display unit with the largest coverage), based on the spatial location information of several English display units and the semantic relationships of the English display texts, includes: using the current English display unit as the current node, querying all candidate nodes connected to the current node in the semantic association network; calculating the ratio of the semantic similarity weight between each candidate node and the current node to the spatial distance; selecting the candidate node with the largest ratio as the next English display unit, and adding the distance from the current node to the next node to the cumulative tour distance; if the cumulative tour distance does not exceed the preset tour distance constraint, repeating the above steps until all English display units are covered; and connecting the center points of the selected English display units in sequence to generate the English display tour path.

9. The method for displaying traditional village scenes in English based on image recognition according to claim 8, characterized in that adjusting the display parameters of the English display text in the target English display path based on real-time environmental data includes: obtaining the current light intensity data and background noise level in decibels of the display device; if the light intensity is lower than the preset illuminance threshold, increasing the font size of the English display text; if the background noise level is higher than the preset noise threshold, increasing the playback speed and volume of the English display text; synchronizing the adjusted font display size and voice playback parameters to all display nodes in the target English display path; and outputting the adjusted English display text to the display device.

10. A system for displaying English in a traditional village scene based on image recognition, applicable to the method for displaying English in a traditional village scene based on image recognition as described in any one of claims 1-9, characterized in that it comprises: a feature extraction module, used to acquire image data of traditional village scenes, extract features from the image data, and identify target elements in the scene, the target elements including at least one of traditional buildings, folk items, and natural landscapes; a text generation module, used to generate corresponding English display text based on the attribute information of the target element, the attribute information including the name, historical period, and cultural connotation of the target element; a display judgment module, used to determine whether the English display text meets the preset display conditions, which include text accuracy, length range, and language difficulty; an English display module, used to output the English display text to the display device when the English display text meets the preset display conditions; a first adjustment module, used, in response to the English display text not meeting the preset display conditions, to extract alternative attribute information from the adjacent elements associated with the target element according to the defect type of the English display text, and to generate at least one alternative English display text that meets the display conditions, for the user to select and / or adjust through voice commands; and a second adjustment module, used to acquire real-time environmental data of the target element, including light intensity and background noise, to adjust the display parameters of the English display text in the target English display path based on the real-time environmental data, the display parameters including font size and speech rate, and to output the adjusted English display text to the display device.