Point of interest text recognition method and device based on neural network model

By stitching a tile together with its adjacent tiles and using neural network models to identify point-of-interest polygons and their text, the method addresses the inaccuracy and incompleteness of point-of-interest text recognition in tile maps, enabling efficient and accurate extraction of point-of-interest text.

CN116206322B · Active · Publication Date: 2026-06-23 · GUANGZHOU LIZHI NETWORK TECH CO LTD (GUANGDONG)

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: GUANGZHOU LIZHI NETWORK TECH CO LTD (GUANGDONG)
Filing Date: 2021-11-29
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

In existing technologies, tile-based map services suffer from noise interference and missing POI data due to hierarchical display when identifying Point of Interest (POI) text, resulting in inaccurate and incomplete text information.

Method used

The tile containing the clicked point of interest is stitched together with its adjacent tiles. A convolutional neural network identifies point-of-interest polygons on the stitched tile, the coordinates of the clicked point of interest are used to locate and merge the polygons that belong to it, and a recurrent neural network extracts the text information.

Benefits of technology

It achieves accurate and comprehensive recognition of the text information of points of interest, reduces noise interference, and avoids the data loss caused by hierarchical tile division.



Abstract

The application relates to a point of interest text recognition method and device based on a neural network model. The method comprises the following steps: obtaining a tile corresponding to the coordinates of a clicked point of interest and at least one adjacent tile adjacent to the tile at a current level of a map according to the coordinates of the clicked point of interest; splicing the tile and the at least one adjacent tile to obtain a spliced tile; identifying a point of interest of the spliced tile through a convolutional neural network model to obtain a plurality of point of interest polygons of the spliced tile; determining at least one point of interest polygon in the plurality of point of interest polygons as a polygon of the clicked point of interest according to the coordinates of the clicked point of interest; and identifying the spliced tile through a recurrent neural network model to obtain text of the polygon of the clicked point of interest, wherein the text is the text information of the clicked point of interest. The scheme provided by the application can accurately and comprehensively obtain the text information of the point of interest.

Description

Technical Field

[0001] This application relates to the field of map recognition technology, and in particular to a method and apparatus for recognizing points of interest based on a neural network model.

Background Technology

[0002] More and more map services are using tile technology. Most maps provide tiles for users to search. Since a tile is a kind of image, when users view it, they can click on the POI (Point of Interest) on the tile. The engine will then pop up relevant information about the POI and the route (e.g., walking route) from the user's current location to the POI.

[0003] Related technologies use traditional OCR (Optical Character Recognition) methods to obtain the data of clicked POIs in tiles. Since POI data on tiles is usually distributed over complex road networks, ordinary image binarization produces a lot of interference noise, leading to inaccurate text information. In addition, the tiles displayed in map search are usually organized hierarchically, with different tile sizes corresponding to different levels, in order to respond quickly to user requests and save as much network traffic as possible. However, hierarchical tiling can cause a POI to be split across tiles, resulting in missing POI data and incomplete text information.

[0004] In summary, the POI data recognition technology cannot accurately and comprehensively obtain the textual information of the POI.

Summary of the Invention

[0005] To address or partially address the problems existing in related technologies, this application provides a method and apparatus for interest point text recognition based on a neural network model, which can accurately and comprehensively obtain text information of interest points.

[0006] The first aspect of this application provides a method for interest-point text recognition based on a neural network model, the method comprising:

[0007] Based on the coordinates of the clicked point of interest, obtain the tile corresponding to the coordinates at the current level of the map, as well as at least one adjacent tile adjacent to the clicked tile;

[0008] The tile is spliced together with at least one adjacent tile to obtain a spliced tile;

[0009] The interest points of the spliced tile are identified by a convolutional neural network model to obtain multiple interest point polygons of the spliced tile.

[0010] Based on the coordinates of the clicked point of interest, at least one of the multiple point of interest polygons is determined as the polygon of the clicked point of interest.

[0011] The spliced tile is identified using a recurrent neural network model to obtain the text of the polygon of the clicked point of interest, which is the text information of the clicked point of interest.

[0012] Preferably, obtaining the tile corresponding to the coordinates of the clicked point of interest at the current level of the map, and at least one adjacent tile adjacent to the clicked tile, includes: obtaining the tile corresponding to the coordinates of the clicked point of interest at the current level of the map, and multiple adjacent tiles adjacent to the clicked tile, based on the coordinates of the clicked point of interest.

[0013] The step of splicing the tile with the at least one adjacent tile to obtain a spliced tile includes: splicing the tile with the plurality of adjacent tiles to obtain the spliced tile.

[0014] Preferably, the step of identifying the interest points of the spliced tile using a convolutional neural network model to obtain multiple interest point polygons of the spliced tile includes:

[0015] The spliced tile is pre-processed to obtain a tile to be identified with a set size;

[0016] The interest points of the tile to be identified are identified by a convolutional neural network model, and multiple interest point polygons of the tile to be identified are obtained.

[0017] Preferably, determining at least one polygon of the plurality of polygons of interest points as the polygon of the clicked interest point based on the coordinates of the clicked interest point includes:

[0018] Based on the coordinates of the clicked point of interest, the distance between each of the multiple point of interest polygons and the coordinates is obtained;

[0019] The polygons of interest points whose distance from the coordinates is less than a first set distance threshold are identified as candidate polygons of the clicked interest point;

[0020] The candidate polygon with the smallest distance from the coordinates is determined as the predetermined polygon of the clicked point of interest;

[0021] Calculate the distance between each of the other candidate interest point polygons and the predetermined polygon;

[0022] The candidate interest point polygons whose distance from the predetermined polygon is less than a second set distance threshold are merged with the predetermined polygon, and the merged polygon is determined as the polygon of the clicked interest point.

[0023] Preferably, the step of identifying the spliced tile using a recurrent neural network model to obtain the text of the polygon of the clicked point of interest, wherein the text is the text information of the clicked point of interest, includes:

[0024] Based on the polygon of the clicked point of interest, a region image is obtained from the spliced tile;

[0025] The region image is identified using a recurrent neural network model to obtain the text of the region image, which is the text information of the clicked point of interest.

[0026] A second aspect of this application provides an interest point text recognition device based on a neural network model, the device comprising:

[0027] The tile acquisition module is used to obtain the tile corresponding to the coordinates of the clicked point of interest at the current level of the map, as well as at least one adjacent tile adjacent to the clicked tile, based on the coordinates of the clicked point of interest.

[0028] A tile splicing module is used to splice the tile obtained by the tile acquisition module with the at least one adjacent tile to obtain a spliced tile;

[0029] The polygon acquisition module is used to identify the interest points of the spliced tile obtained by the tile splicing module through a convolutional neural network model, and obtain multiple interest point polygons of the spliced tile.

[0030] The polygon determination module is used to determine at least one of the multiple interest point polygons obtained by the polygon acquisition module as the polygon of the clicked interest point based on the coordinates of the clicked interest point.

[0031] The text acquisition module is used to identify the spliced tile obtained by the tile splicing module through a recurrent neural network model, and obtain the text of the polygon of the clicked point of interest determined by the polygon determination module. The text is the text information of the clicked point of interest.

[0032] Preferably, the device further includes:

[0033] The tile preprocessing module is used to preprocess the spliced tile obtained by the tile splicing module to obtain a tile to be identified with a set size;

[0034] The polygon acquisition module is further configured to identify the interest points of the tile to be identified obtained by the tile preprocessing module through a convolutional neural network model, thereby obtaining multiple interest point polygons of the tile to be identified.

[0035] Preferably, the device further includes:

[0036] The first calculation module is used to obtain the distance between each of the multiple interest point polygons obtained by the polygon acquisition module and the coordinates of the clicked interest point.

[0037] The polygon determination module is further configured to determine the polygon of the point of interest whose distance from the coordinates is less than a first set distance threshold as the candidate polygon of the clicked point of interest, and to determine the candidate polygon with the smallest distance from the coordinates as the predetermined polygon of the clicked point of interest.

[0038] The second calculation module is used to calculate the distance between each of the other candidate interest point polygons determined by the polygon determination module and the predetermined polygon.

[0039] The polygon determination module is further configured to merge candidate interest point polygons whose distance from the predetermined polygon is less than a second set distance threshold with the predetermined polygon, and determine the merged polygon as the polygon of the clicked interest point.

[0040] Preferably, the device further includes:

[0041] The image acquisition module is used to obtain a region image from the spliced tile obtained by the tile splicing module, based on the polygon of the clicked point of interest determined by the polygon determination module;

[0042] The text acquisition module is further configured to identify the region image obtained by the image acquisition module through a recurrent neural network model, and obtain the text of the region image, wherein the text is the text information of the clicked point of interest.

[0043] A third aspect of this application provides an electronic device, comprising:

[0044] Processor; and

[0045] A memory that stores executable code, which, when executed by the processor, causes the processor to perform the method described above.

[0046] A fourth aspect of this application provides a computer-readable storage medium having executable code stored thereon, which, when executed by a processor of an electronic device, causes the processor to perform the method described above.

[0047] The technical solution provided in this application may include the following beneficial effects:

[0048] The technical solution of this application involves splicing the tile containing the clicked point of interest with at least one adjacent tile to obtain a spliced tile. The text information of the clicked point of interest is then obtained on the spliced tile, enabling accurate and comprehensive acquisition of the text information of the point of interest.

[0049] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit this application.

Attached Figure Description

[0050] The above and other objects, features and advantages of this application will become more apparent from the more detailed description of exemplary embodiments thereof in conjunction with the accompanying drawings, wherein the same reference numerals generally represent the same components in the exemplary embodiments thereof.

[0051] Figure 1 is a schematic flowchart of the interest point text recognition method based on a neural network model according to an embodiment of this application.

[0052] Figure 2 is another schematic flowchart of the interest point text recognition method based on a neural network model according to an embodiment of this application.

[0053] Figure 3 is a schematic diagram of a tile and its adjacent tiles in the interest point text recognition method based on a neural network model according to an embodiment of this application.

[0054] Figure 4 is a schematic structural diagram of an interest point text recognition device based on a neural network model according to an embodiment of this application.

[0055] Figure 5 is another schematic structural diagram of the interest point text recognition device based on a neural network model according to an embodiment of this application.

[0056] Figure 6 is a schematic structural diagram of an electronic device according to an embodiment of this application.

Detailed Implementation

[0057] Embodiments of this application will now be described in more detail with reference to the accompanying drawings. While embodiments of this application are shown in the drawings, it should be understood that this application may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided to make this application more thorough and complete, and to fully convey the scope of this application to those skilled in the art.

[0058] The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. The singular forms “a,” “an,” and “the” used in this application and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and / or” as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.

[0059] It should be understood that although the terms "first," "second," "third," etc., may be used in this application to describe various information, this information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this application, "multiple" means two or more, unless otherwise explicitly specified.

[0060] This application provides a point of interest text recognition method based on a neural network model, which can accurately and comprehensively obtain the text information of points of interest.

[0061] The technical solutions of the embodiments of this application are described in detail below with reference to the accompanying drawings.

[0062] Figure 1 is a flowchart of the point-of-interest text recognition method based on a neural network model according to an embodiment of this application.

[0063] Referring to Figure 1, an interest point text recognition method based on a neural network model includes:

[0064] In step S101, based on the coordinates of the clicked point of interest, the tile corresponding to the coordinates at the current level of the map and at least one adjacent tile are obtained.

[0065] In one embodiment, based on the coordinates of the point of interest clicked by the user, the tile corresponding to the coordinates of the clicked point of interest in the current layer of the tile map that the user is browsing, as well as at least one adjacent tile adjacent to that tile, can be obtained. For example, based on the distance between tiles in the current layer of the tile map, the nearest adjacent tile to the tile containing the clicked point of interest can be obtained.

[0066] In step S102, the tile is spliced with at least one adjacent tile to obtain a spliced tile.

[0067] In one embodiment, the tile containing the clicked point of interest can be spliced with at least one adjacent tile to obtain a spliced tile composed of multiple tiles.

[0068] In step S103, the interest points of the spliced tile are identified by a convolutional neural network model to obtain multiple interest point polygons of the spliced tile.

[0069] In one embodiment, by identifying the points of interest (POIs) of the spliced tile using a trained convolutional neural network model, the position and shape of the text boxes of all POIs on the spliced tile can be obtained. The position and shape of each text box are then converted into a POI polygon, thus obtaining all POI polygons of the spliced tile. Each POI polygon encodes both position and shape.

[0070] In step S104, based on the coordinates of the clicked point of interest, at least one of the multiple point of interest polygons is determined as the polygon of the clicked point of interest.

[0071] In one embodiment, at least one of the multiple interest point polygons can be determined as the polygon of the clicked interest point based on the coordinates of the clicked interest point. The polygon of the clicked interest point can be the interest point polygon with the smallest coordinate distance to the clicked interest point.

[0072] In step S105, the spliced tile is identified by a recurrent neural network model to obtain the text of the polygon of the clicked point of interest. The text is the text information of the clicked point of interest.

[0073] In one embodiment, a recurrent neural network model can be used to identify the spliced tile, and the text of the polygon of the clicked point of interest can be obtained based on the position and shape of the polygon. This text is the text information of the clicked point of interest.

[0074] The interest point text recognition method based on a neural network model shown in this application splices the tile containing the clicked interest point with at least one adjacent tile to obtain a spliced tile. The text information of the clicked interest point is obtained on the spliced tile, which can accurately and comprehensively obtain the text information of the interest point.
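
The data flow of steps S101 to S105 can be sketched as a single function that chains five pluggable stages. This is only an illustrative skeleton: the function name `recognise_clicked_poi` and all five callables are hypothetical placeholders, and the convolutional and recurrent models themselves are not implemented here.

```python
from typing import Callable

Point = tuple[float, float]
Polygon = list[Point]
Image = list[list[int]]  # grayscale image as rows of pixel values


def recognise_clicked_poi(
    click: Point,
    fetch_tiles: Callable[[Point], list[list[Image]]],
    stitch: Callable[[list[list[Image]]], Image],
    detect_polygons: Callable[[Image], list[Polygon]],
    pick_polygon: Callable[[Point, list[Polygon]], Polygon],
    read_text: Callable[[Image, Polygon], str],
) -> str:
    """Chain the five steps; every stage is supplied by the caller."""
    tiles = fetch_tiles(click)              # S101: clicked tile plus adjacent tiles
    stitched = stitch(tiles)                # S102: spliced tile
    polygons = detect_polygons(stitched)    # S103: stand-in for the CNN detector
    target = pick_polygon(click, polygons)  # S104: polygon of the clicked POI
    return read_text(stitched, target)      # S105: stand-in for the RNN recogniser
```

Passing the stages in as callables keeps the sketch independent of any particular map service or model framework.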

[0075] Figure 2 is another flowchart of the point-of-interest text recognition method based on a neural network model according to an embodiment of this application. Compared with Figure 1, Figure 2 describes the scheme of this application in more detail.

[0076] Referring to Figure 2, an interest point text recognition method based on a neural network model includes:

[0077] In step S201, based on the coordinates of the clicked point of interest, the tile corresponding to the coordinates at the current level of the map, as well as multiple adjacent tiles adjacent to the clicked tile, are obtained.

[0078] In one embodiment, a user browsing a tile map can click on a POI (Point of Interest). The longitude and latitude (lng, lat) of the clicked POI can be obtained from the click. Based on this longitude and latitude, the tile that contains the clicked POI at the current level of the tile map can be obtained, together with multiple adjacent tiles adjacent to that tile.
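
The application does not state which tiling scheme maps (lng, lat) to a tile. As a minimal sketch, assuming the widely used Web Mercator XYZ ("slippy map") scheme, the lookup could be written as follows. The function name `lnglat_to_tile` and the `zoom` parameter (standing in for the current level) are hypothetical.

```python
import math


def lnglat_to_tile(lng: float, lat: float, zoom: int) -> tuple[int, int]:
    """Map longitude/latitude to tile indices (assumed Web Mercator XYZ scheme)."""
    n = 2 ** zoom  # number of tiles along each axis at this level
    x = int((lng + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lng = 180 or extreme latitudes still yield a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

In this assumed scheme y grows southward. A service whose y axis grows northward would need the row index flipped.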

[0079] In one specific embodiment, as shown in Figure 3, based on the coordinates (x, y) of tile 300 corresponding to the latitude and longitude of the clicked POI, eight adjacent tiles are obtained. These eight adjacent tiles include the first tile 301 with coordinates (x-1, y+1), the second tile 302 with coordinates (x, y+1), the third tile 303 with coordinates (x+1, y+1), the fourth tile 304 with coordinates (x-1, y), the fifth tile 305 with coordinates (x+1, y), the sixth tile 306 with coordinates (x-1, y-1), the seventh tile 307 with coordinates (x, y-1), and the eighth tile 308 with coordinates (x+1, y-1).

[0080] It should be understood that the coordinates of tile 300 and of the 8 adjacent tiles are given for illustration only.
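
The eight neighbours listed for Figure 3 can be enumerated directly from (x, y). The sketch below returns them in the order of tiles 301 to 308. The function name is hypothetical, and handling of tiles at the edge of the map is deliberately left out.

```python
def neighbour_tiles(x: int, y: int) -> list[tuple[int, int]]:
    """Return the eight tiles around (x, y), ordered as tiles 301..308."""
    return [
        (x - 1, y + 1), (x, y + 1), (x + 1, y + 1),  # tiles 301, 302, 303
        (x - 1, y),                 (x + 1, y),      # tiles 304, 305
        (x - 1, y - 1), (x, y - 1), (x + 1, y - 1),  # tiles 306, 307, 308
    ]
```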

[0081] In step S202, the tile is spliced with multiple adjacent tiles to obtain a spliced tile.

[0082] In one specific embodiment, the tile corresponding to the latitude and longitude of the clicked POI can be spliced together with 8 adjacent tiles to obtain a spliced tile with 9 tiles.
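
Splicing nine tiles into one image amounts to concatenating pixel rows across a 3×3 grid. The following is a minimal sketch that assumes every tile has the same size and is held as a list of pixel rows. The names `Tile` and `stitch_grid` are hypothetical, and the caller is assumed to have arranged the grid top row first, left to right.

```python
Tile = list[list[int]]  # one tile as rows of grayscale pixel values


def stitch_grid(grid: list[list[Tile]]) -> Tile:
    """Join a 2-D grid of equally sized tiles into a single image."""
    stitched: Tile = []
    for tile_row in grid:
        tile_height = len(tile_row[0])
        for r in range(tile_height):
            # Concatenate pixel row r of every tile in this grid row, left to right.
            line: list[int] = []
            for tile in tile_row:
                line.extend(tile[r])
            stitched.append(line)
    return stitched
```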

[0083] In step S203, the spliced tile is pre-processed to obtain a tile of a set size to be identified.

[0084] In one embodiment, the spliced tile can be subjected to preset processing such as scaling and zero padding to obtain a tile to be identified with a size of 512×512 pixels.

[0085] Let the height of the spliced tile be h and its width be w, and let the target size be 512×512 pixels. The height and width are scaled by the same ratio, which is the smaller of the two ratios h_ratio (512 to the height h) and w_ratio (512 to the width w), so that neither side exceeds 512 pixels:

[0086] h_ratio=512 / h, w_ratio=512 / w;

[0087] ratio=min(h_ratio,w_ratio).

[0088] The height and width of the spliced tile after scaling by the ratio are as follows:

[0089] The scaled height New_h = h * ratio, and the scaled width New_w = w * ratio.

[0090] After the spliced tile is scaled proportionally in height and width, one side meets the requirement of 512 pixels, while the other side is less than 512 pixels. The side that is less than 512 pixels can be padded by filling the missing part with 0, and finally a tile to be identified with a size of 512×512 pixels is obtained.

[0091] Understandably, in order to reduce or even avoid the recognition error of the convolutional neural network model, the size of the tile to be recognized is consistent with the size of the training sample image.
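
The scaling and zero-padding step can be sketched in pure Python as below. The sketch uses the smaller of the two ratios, which is the choice under which one side reaches the target size and the other does not exceed it, as paragraph [0090] describes. Nearest-neighbour resampling, padding on the right and bottom, and the name `scale_and_pad` are all assumptions, since the application does not specify them.

```python
Image = list[list[int]]  # grayscale image as rows of pixel values


def scale_and_pad(image: Image, size: int = 512) -> tuple[Image, float]:
    """Scale an image to fit inside size x size, then zero-pad to exactly size x size."""
    h, w = len(image), len(image[0])
    ratio = min(size / h, size / w)  # smaller ratio, so neither side exceeds `size`
    new_h = min(size, max(1, round(h * ratio)))
    new_w = min(size, max(1, round(w * ratio)))
    # Nearest-neighbour resampling: each output pixel copies the source pixel
    # at the corresponding position.
    resized = [
        [image[min(h - 1, int(r / ratio))][min(w - 1, int(c / ratio))]
         for c in range(new_w)]
        for r in range(new_h)
    ]
    # Zero-fill the short side (right and bottom here, which is an assumption).
    padded = [row + [0] * (size - new_w) for row in resized]
    padded += [[0] * size for _ in range(size - new_h)]
    return padded, ratio
```

Returning the ratio alongside the image makes it possible to map detected polygon coordinates back onto the spliced tile later.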

[0092] In step S204, the interest points of the tile to be identified are identified using a convolutional neural network model to obtain multiple interest point polygons of the tile to be identified.

[0093] In one embodiment, a tile to be identified with a set size is input into a convolutional neural network model. The convolutional neural network model detects the position and shape of the text box of the text information of each POI in the tile to be identified, and obtains the position and shape of the text box of the text information of each POI output by the convolutional neural network model. The position and shape of the text box of the text information of each POI are transformed to obtain multiple point of interest polygons in the tile to be identified. The point of interest polygons can be represented by multiple coordinates to represent their size, shape and position.

[0094] In one specific embodiment, a trained Inception neural network model (a convolutional neural network model) can be used to identify the tiles to be recognized, obtaining the text box positions and shapes of the text information of each POI output by the Inception neural network model. At least 2000 map tiles' POIs, POI text boxes, and POI text information can be correctly labeled and masked to obtain training samples for the Inception neural network model, which can then be trained to obtain a trained Inception neural network model.

[0095] In step S205, the distance between each of the multiple interest point polygons and the coordinates of the clicked interest point is obtained based on the coordinates of the clicked interest point.

[0096] In one embodiment, the distance between each of the multiple interest point polygons and the coordinates of the user-clicked interest point can be calculated based on the coordinates of the point of interest clicked by the user and the coordinates of each interest point polygon in the multiple interest point polygons.

[0097] In step S206, the polygons of interest points whose distance from the coordinates of the clicked interest point is less than a first set distance threshold are determined as candidate polygons of the clicked interest point.

[0098] In one embodiment, the distance between each point of interest polygon and the coordinates of the clicked point of interest can be compared with a first set distance threshold, and the point of interest polygon whose distance from the coordinates of the clicked point of interest is less than the first set distance threshold can be determined as the candidate polygon of the clicked point of interest.

[0099] In one specific embodiment, a polygon of interest that is less than 5 pixels away from the coordinates of the clicked point of interest can be identified as a candidate polygon of the clicked point of interest.

[0100] In step S207, the candidate polygon that has the smallest distance from the coordinates of the clicked point of interest is determined as the predetermined polygon of the clicked point of interest.

[0101] In one embodiment, when there are multiple candidate polygons, the distances between the multiple candidate polygons and the coordinates of the clicked point of interest can be compared, and the candidate polygon with the smallest distance to the coordinates of the clicked point of interest can be determined as the predetermined polygon of the clicked point of interest.
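
Steps S205 to S207 can be sketched as follows. The application does not define how the distance between a click and a polygon is measured. This sketch assumes it is zero when the click lies inside the polygon and otherwise the shortest distance to any polygon edge. All function names are hypothetical, and the default of 5 pixels follows the specific embodiment above.

```python
import math
from typing import Optional

Point = tuple[float, float]
Polygon = list[Point]


def _point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def _contains(polygon: Polygon, p: Point) -> bool:
    """Ray-casting test: is p strictly inside the polygon?"""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_polygon_distance(p: Point, polygon: Polygon) -> float:
    """Assumed metric: 0 inside the polygon, else distance to the nearest edge."""
    if _contains(polygon, p):
        return 0.0
    n = len(polygon)
    return min(_point_segment_distance(p, polygon[i], polygon[(i + 1) % n])
               for i in range(n))


def select_candidates(
    click: Point, polygons: list[Polygon], first_threshold: float = 5.0
) -> tuple[list[Polygon], Optional[Polygon]]:
    """S205-S207: keep polygons closer than the threshold and pick the nearest."""
    scored = [(point_polygon_distance(click, poly), poly) for poly in polygons]
    candidates = [(d, poly) for d, poly in scored if d < first_threshold]
    if not candidates:
        return [], None
    predetermined = min(candidates, key=lambda item: item[0])[1]
    return [poly for _, poly in candidates], predetermined
```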

[0102] In step S208, the distance between each of the other candidate interest point polygons and the predetermined polygon is calculated.

[0103] In one embodiment, the coordinates of a predetermined polygon and the coordinates of each of the other candidate point of interest polygons can be used to calculate the distance between the predetermined polygon and each of the other candidate point of interest polygons.

[0104] In step S209, the candidate interest point polygons whose distance from the predetermined polygon is less than the second set distance threshold are merged with the predetermined polygon, and the merged polygon is determined as the polygon of the clicked interest point.

[0105] In one embodiment, the distance between a predetermined polygon and each of the other candidate point of interest polygons can be compared, and candidate point of interest polygons whose distance from the predetermined polygon is less than a second set distance threshold can be selected. The selected candidate point of interest polygons are then merged with the predetermined polygon, and the merged polygon is determined as the polygon of the clicked point of interest.
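
Steps S208 and S209 can be sketched as below. Two assumptions are made because the application leaves them open: the distance between two polygons is taken as the smallest vertex-to-edge distance (valid for polygons that do not overlap), and merging is taken to mean the axis-aligned bounding box of the merged group. The function names are hypothetical, and the second threshold is left as a required parameter since no value is given.

```python
import math

Point = tuple[float, float]
Polygon = list[Point]


def _point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def polygon_distance(p1: Polygon, p2: Polygon) -> float:
    """Assumed metric: smallest vertex-to-edge distance, checked in both directions."""
    def one_way(src: Polygon, dst: Polygon) -> float:
        n = len(dst)
        return min(_point_segment_distance(v, dst[i], dst[(i + 1) % n])
                   for v in src for i in range(n))
    return min(one_way(p1, p2), one_way(p2, p1))


def merge_with_neighbours(
    predetermined: Polygon, candidates: list[Polygon], second_threshold: float
) -> Polygon:
    """S208-S209: merge nearby candidates into the predetermined polygon."""
    group = [predetermined] + [
        c for c in candidates
        if c is not predetermined
        and polygon_distance(c, predetermined) < second_threshold
    ]
    xs = [x for poly in group for x, _ in poly]
    ys = [y for poly in group for _, y in poly]
    # Assumed merge rule: bounding box of all vertices in the group.
    return [(min(xs), min(ys)), (max(xs), min(ys)),
            (max(xs), max(ys)), (min(xs), max(ys))]
```

A bounding box is a natural fit when a POI label wraps onto two lines, because the crop taken from the merged polygon then covers both lines.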

[0106] In step S210, a region image is obtained from the spliced tile based on the polygon of the clicked point of interest.

[0107] In one embodiment, the corresponding region image is cropped from the spliced tile according to the coordinates (size, shape, position) of the polygon of the clicked point of interest, and the region image is preprocessed, for example by scaling, to obtain a region image of a predetermined size.

[0108] In one specific embodiment, the region image can be preprocessed, such as by scaling, to obtain a region image with a size of 299×299 pixels.

[0109] It is understandable that, in order to reduce or even avoid the recognition error of the recurrent neural network model, the size of the region image obtained after preprocessing such as scaling is consistent with the size of the training sample image.
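
Cropping the region image and scaling it to the recogniser's input size can be sketched as follows. The sketch assumes the crop is the polygon's bounding box and that scaling stretches the crop to a square without preserving aspect ratio, since the application only says "scaling". The function name `crop_region` and nearest-neighbour resampling are likewise assumptions.

```python
Point = tuple[float, float]
Polygon = list[Point]
Image = list[list[int]]  # grayscale image as rows of pixel values


def crop_region(image: Image, polygon: Polygon, out_size: int = 299) -> Image:
    """Crop the polygon's bounding box and resize it to out_size x out_size."""
    h, w = len(image), len(image[0])
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    # Clip the bounding box to the image so slicing never goes out of range.
    left, top = max(0, int(min(xs))), max(0, int(min(ys)))
    right, bottom = min(w, int(max(xs)) + 1), min(h, int(max(ys)) + 1)
    if right <= left or bottom <= top:
        raise ValueError("polygon lies outside the image")
    crop = [row[left:right] for row in image[top:bottom]]
    ch, cw = len(crop), len(crop[0])
    # Nearest-neighbour resize; aspect ratio is not preserved (assumption).
    return [
        [crop[r * ch // out_size][c * cw // out_size] for c in range(out_size)]
        for r in range(out_size)
    ]
```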

[0110] In step S211, the region image is identified by a recurrent neural network model to obtain the text of the region image, which is the text information of the clicked point of interest.

[0111] In one embodiment, a trained recurrent neural network model can be used to identify the region image and obtain the text of the polygon of interest point in the region image output by the recurrent neural network model. This text is the text information of the clicked interest point.

[0112] In one specific embodiment, a trained LSTM (Long Short-Term Memory) neural network model based on the attention mechanism can be used to identify region images and obtain the text of interest point polygons in the region image output by the LSTM neural network model based on the attention mechanism. At least 2000 map tiles, their POIs, text boxes, and text information can be correctly labeled and masked to obtain training samples for the LSTM neural network model based on the attention mechanism. The neural network model can then be trained to obtain a trained LSTM neural network model based on the attention mechanism.

[0113] In one embodiment, when there is only one candidate polygon, a region image can be obtained from the spliced tile based on that candidate polygon. The region image is then identified by a recurrent neural network model to obtain the text of the region image, which is the text information of the clicked point of interest.

[0114] The interest point text recognition method based on a neural network model shown in this application splices the tile containing the clicked interest point with at least one adjacent tile to obtain a spliced tile. The text information of the clicked interest point is obtained on the spliced tile, which can accurately and comprehensively obtain the text information of the interest point.

[0115] Furthermore, the point-of-interest (POI) text recognition method based on a neural network model shown in this application embodiment stitches together the tile containing the clicked POI and multiple adjacent tiles to obtain a stitched tile. A convolutional neural network model is used to obtain POI polygons on the stitched tile. Multiple POI polygons whose distance from the clicked POI polygon is less than a set distance threshold are merged with the clicked POI polygon. A region image is obtained from the stitched tile using the merged polygon. A recurrent neural network model is used to identify the text in the region image, and this text is used as the text information of the clicked POI. This method can reduce noise interference in the text recognition process, avoid the problem of POI text information being segmented due to tile grading, and accurately and comprehensively obtain the text information of the POI.

[0116] Corresponding to the aforementioned application function implementation method embodiments, this application also provides an interest point text recognition device, electronic device, and corresponding embodiments based on a neural network model.

[0117] Figure 4 is a schematic diagram of the structure of an interest point text recognition device based on a neural network model, as shown in an embodiment of this application.

[0118] Referring to Figure 4, a point-of-interest text recognition device based on a neural network model includes a tile acquisition module 401, a tile splicing module 402, a polygon acquisition module 403, a polygon determination module 404, and a text acquisition module 405.

[0119] The tile acquisition module 401 is used to obtain the tile corresponding to the coordinates of the clicked point of interest at the current level of the map, as well as at least one adjacent tile.
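As a minimal sketch of what the tile acquisition step could look like, the code below maps a coordinate to a tile index and lists the surrounding tiles. The application does not state the tiling scheme, so this assumes the widely used XYZ (Web Mercator) scheme with longitude and latitude input; `lonlat_to_tile` and `neighbor_tiles` are hypothetical names.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return (x, y) tile indices under the assumed XYZ / Web Mercator scheme."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that edge coordinates still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def neighbor_tiles(x, y, zoom):
    """Return up to eight distinct tiles around (x, y).

    Columns wrap around the antimeridian; rows outside the map are dropped.
    """
    n = 2 ** zoom
    result = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny = y + dy
            if not 0 <= ny < n:
                continue
            candidate = ((x + dx) % n, ny)
            if candidate != (x, y) and candidate not in result:
                result.append(candidate)
    return result
```

Taking all eight neighbors is one possible choice; the embodiment only requires at least one adjacent tile.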

[0120] The tile splicing module 402 is used to splice the tile obtained by the tile acquisition module 401 with the at least one adjacent tile to obtain a spliced tile.
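The splicing operation can be illustrated with a small sketch. It assumes that each tile is an equally sized image stored as a list of pixel rows and that the tiles are already arranged in a grid in map order; the function name `stitch_tiles` is hypothetical.

```python
def stitch_tiles(grid):
    """Join a 2D grid of tiles into one image.

    `grid` is a list of tile rows; every tile is a list of pixel rows,
    and all tiles are assumed to share the same height and width.
    """
    stitched = []
    for tile_row in grid:
        tile_height = len(tile_row[0])
        for r in range(tile_height):
            row = []
            for tile in tile_row:
                # Append the r-th pixel row of each tile from left to right.
                row.extend(tile[r])
            stitched.append(row)
    return stitched
```

A real implementation would more likely operate on image arrays, but the row-by-row concatenation is the same idea.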

[0121] The polygon acquisition module 403 is used to identify the points of interest of the spliced tile obtained by the tile splicing module 402 through a convolutional neural network model, and obtain multiple point of interest polygons of the spliced tile.

[0122] The polygon determination module 404 is used to determine at least one of the multiple interest point polygons obtained by the polygon acquisition module 403 as the polygon of the clicked interest point based on the coordinates of the clicked interest point.

[0123] The text acquisition module 405 is used to identify the spliced tile obtained by the tile splicing module 402 through a recurrent neural network model, and obtain the text of the polygon of the clicked point of interest determined by the polygon determination module 404. This text is the text information of the clicked point of interest.

[0124] The technical solution shown in this application embodiment splices the tile containing the clicked point of interest with at least one adjacent tile to obtain a spliced tile. The text information of the clicked point of interest is obtained from the spliced tile, so the text information of the point of interest can be obtained accurately and comprehensively.
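The way the five modules of Figure 4 hand data to one another can be sketched as a simple composition. This is an illustrative skeleton only: the class name `PoiTextRecognizer`, its field names, and the callable signatures are assumptions, and the neural network models are represented by placeholder callables.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PoiTextRecognizer:
    acquire_tiles: Callable[..., Any]    # role of tile acquisition module 401
    splice_tiles: Callable[..., Any]     # role of tile splicing module 402
    detect_polygons: Callable[..., Any]  # role of polygon acquisition module 403 (CNN)
    choose_polygon: Callable[..., Any]   # role of polygon determination module 404
    read_text: Callable[..., Any]        # role of text acquisition module 405 (RNN)

    def recognize(self, click, level):
        """Run the modules in the order described for the device."""
        tile, neighbors = self.acquire_tiles(click, level)
        spliced = self.splice_tiles(tile, neighbors)
        polygons = self.detect_polygons(spliced)
        polygon = self.choose_polygon(click, polygons)
        return self.read_text(spliced, polygon)
```

Passing the modules in as callables keeps the skeleton independent of any particular model or map service.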

[0125] Figure 5 is another schematic diagram of the interest point text recognition device based on a neural network model shown in the embodiments of this application.

[0126] Referring to Figure 5, a point-of-interest text recognition device based on a neural network model includes a tile acquisition module 401, a tile splicing module 402, a polygon acquisition module 403, a polygon determination module 404, a text acquisition module 405, a tile preprocessing module 501, a first calculation module 502, a second calculation module 503, and an image acquisition module 504.

[0127] The tile acquisition module 401 is also used to obtain the tile corresponding to the coordinates of the clicked point of interest at the current level of the map, as well as multiple adjacent tiles adjacent to the clicked tile.

[0128] The tile splicing module 402 is also used to splice the tile obtained by the tile acquisition module 401 with the multiple adjacent tiles to obtain a spliced tile.

[0129] The tile preprocessing module 501 preprocesses the spliced tile obtained by the tile splicing module 402 to obtain a tile to be identified with a set size.
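One way to bring a spliced tile to a set size is to resample it. The preprocessing operations are not detailed in this passage, so the sketch below simply assumes nearest-neighbor resizing of an image stored as a list of pixel rows; `resize_nearest` is a hypothetical name.

```python
def resize_nearest(image, out_height, out_width):
    """Resize an image (list of pixel rows) with nearest-neighbor sampling."""
    in_height, in_width = len(image), len(image[0])
    return [
        [
            # Map each output pixel back to the source pixel it falls in.
            image[r * in_height // out_height][c * in_width // out_width]
            for c in range(out_width)
        ]
        for r in range(out_height)
    ]
```

Whatever scaling is applied here also has to be applied to the clicked coordinates before they are compared with the detected polygons.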

[0130] The polygon acquisition module 403 is also used to identify the interest points of the tile to be identified obtained by the tile preprocessing module 501 through a convolutional neural network model, and obtain multiple interest point polygons of the tile to be identified.

[0131] The first calculation module 502 is used to obtain the distance between the coordinates of the clicked point of interest and the coordinates of each of the multiple point of interest polygons obtained by the polygon acquisition module 403, based on the coordinates of the clicked point of interest.

[0132] The polygon determination module 404 is further configured to determine the polygon of the point of interest whose distance from the coordinates of the clicked point of interest is less than a first set distance threshold as the candidate polygon of the clicked point of interest, and to determine the candidate polygon whose distance from the coordinates of the clicked point of interest is the smallest as the predetermined polygon of the clicked point of interest.
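The candidate selection described here can be sketched as follows. The passage does not define how the distance between the clicked coordinates and a polygon is measured, so this sketch assumes the Euclidean distance to the nearest polygon edge, with zero distance when the click falls inside the polygon. All function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_in_polygon(p, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    px, py = p
    inside = False
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        if (ay > py) != (by > py):
            if px < ax + (py - ay) * (bx - ax) / (by - ay):
                inside = not inside
    return inside


def point_polygon_distance(p, polygon):
    """Zero inside the polygon, otherwise distance to the nearest edge."""
    if point_in_polygon(p, polygon):
        return 0.0
    n = len(polygon)
    return min(
        point_segment_distance(p, polygon[i], polygon[(i + 1) % n])
        for i in range(n)
    )


def select_candidates(click, polygons, first_threshold):
    """Return (candidates, predetermined): the polygons closer than the
    first threshold, and the nearest of them (None if there are none)."""
    scored = [(point_polygon_distance(click, poly), poly) for poly in polygons]
    near = [(d, poly) for d, poly in scored if d < first_threshold]
    if not near:
        return [], None
    predetermined = min(near, key=lambda item: item[0])[1]
    return [poly for _, poly in near], predetermined
```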

[0133] The second calculation module 503 is used to calculate the distance between each of the other candidate interest point polygons determined by the polygon determination module 404 and the predetermined polygon.

[0134] The polygon determination module 404 is further configured to merge the candidate interest point polygons that are less than a second set distance threshold from the predetermined polygon with the predetermined polygon, and determine the merged polygon as the polygon of the clicked interest point.
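The merging step can be illustrated in simplified form. The passage does not specify how the distance between two polygons is computed or what shape the merged polygon takes, so this sketch assumes axis-aligned bounding boxes: the distance is the gap between boxes, and the merged polygon is the union box. The function names are hypothetical.

```python
import math


def bounding_box(polygon):
    """Return (min_x, min_y, max_x, max_y) of a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def box_gap(a, b):
    """Euclidean gap between two boxes; 0.0 when they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)


def merge_with_predetermined(predetermined, candidates, second_threshold):
    """Merge every other candidate that lies within the second threshold
    of the predetermined polygon, and return the merged bounding box."""
    base = bounding_box(predetermined)
    merged = base
    for poly in candidates:
        if poly is predetermined:
            continue
        box = bounding_box(poly)
        # Distance is always measured against the predetermined polygon,
        # not against the growing merged region.
        if box_gap(base, box) < second_threshold:
            merged = (
                min(merged[0], box[0]),
                min(merged[1], box[1]),
                max(merged[2], box[2]),
                max(merged[3], box[3]),
            )
    return merged
```

Merging in this way lets a name that is drawn as several separate text boxes be read as a single region.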

[0135] The image acquisition module 504 obtains the region image from the spliced tile obtained by the tile splicing module 402, based on the polygon of the clicked point of interest determined by the polygon determination module 404.
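Extracting the region image can be sketched as a crop of the spliced tile. This sketch assumes that the polygon of the clicked point of interest has been reduced to an axis-aligned box `(min_x, min_y, max_x, max_y)` in pixel coordinates and that the image is a list of pixel rows; `crop_region` is a hypothetical name.

```python
import math


def crop_region(image, box):
    """Return the sub-image covered by box, clipped to the image bounds."""
    height, width = len(image), len(image[0])
    min_x, min_y, max_x, max_y = box
    left = max(0, math.floor(min_x))
    top = max(0, math.floor(min_y))
    right = min(width, math.ceil(max_x))
    bottom = min(height, math.ceil(max_y))
    return [row[left:right] for row in image[top:bottom]]
```

Cropping before text recognition is what keeps unrelated labels elsewhere on the spliced tile out of the recognizer's input.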

[0136] The text acquisition module 405 is also used to identify the region image obtained by the image acquisition module 504 through a recurrent neural network model, and obtain the text of the region image, which is the text information of the clicked point of interest.

[0137] Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operation has been described in detail in the embodiments related to the method, and will not be elaborated further here.

[0138] Figure 6 is a schematic diagram of the structure of an electronic device shown in an embodiment of this application.

[0139] Referring to Figure 6, the electronic device 60 includes a memory 601 and a processor 602.

[0140] The processor 602 can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor can be a microprocessor or any conventional processor.

[0141] Memory 601 may include various types of storage units, such as system memory, read-only memory (ROM), and permanent storage devices. ROM may store static data or instructions required by processor 602 or other modules of the computer. A permanent storage device may be a read-write storage device, and may be a non-volatile storage device that retains stored instructions and data even when the computer is powered off. In some embodiments, a mass storage device (e.g., a magnetic or optical disk, or flash memory) is used as the permanent storage device. In other embodiments, the permanent storage device may be a removable storage device (e.g., a floppy disk or an optical drive). System memory may be a read-write storage device or a volatile read-write storage device, such as dynamic random access memory. System memory may store some or all of the instructions and data required by the processor during operation. Furthermore, memory 601 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (e.g., DRAM, SRAM, SDRAM, flash memory, programmable read-only memory); magnetic disks and/or optical disks may also be used. In some embodiments, memory 601 may include a removable storage device that is readable and/or writable, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-high density optical disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card), a magnetic floppy disk, etc. Computer-readable storage media do not include carrier waves or transient electronic signals transmitted wirelessly or via wired connections.

[0142] The memory 601 stores executable code which, when executed by the processor 602, can cause the processor 602 to perform part or all of the methods described above.

[0143] Furthermore, the method according to this application can also be implemented as a computer program or computer program product, which includes computer program code instructions for performing some or all of the steps in the method described above.

[0144] Alternatively, this application may be implemented as a computer-readable storage medium (or a non-transitory machine-readable storage medium or a machine-readable storage medium) storing executable code (or computer program or computer instruction code) thereon, which, when executed by a processor of an electronic device (or server, etc.), causes the processor to perform part or all of the steps of the methods described above according to this application.

[0145] The various embodiments of this application have been described above. These descriptions are exemplary and not exhaustive, nor are they limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles, practical application, or improvement of the technology in the market, or to enable others skilled in the art to understand the embodiments disclosed herein.

Claims

1. A method for interest-point text recognition based on a neural network model, characterized in that it includes:
based on the coordinates of the clicked point of interest, obtaining the tile corresponding to the coordinates at the current level of the map, as well as at least one adjacent tile adjacent to the clicked tile;
splicing the tile with the at least one adjacent tile to obtain a spliced tile;
identifying the interest points of the spliced tile by a convolutional neural network model to obtain multiple interest point polygons of the spliced tile;
based on the coordinates of the clicked point of interest, determining at least one of the plurality of point of interest polygons as the polygon of the clicked point of interest, including: obtaining the distance between each of the plurality of point of interest polygons and the coordinates based on the coordinates of the clicked point of interest; determining the point of interest polygon whose distance from the coordinates is less than a first set distance threshold as a candidate polygon of the clicked point of interest; determining the candidate polygon with the smallest distance from the coordinates as a predetermined polygon of the clicked point of interest; calculating the distance between each of the other candidate polygons and the predetermined polygon; merging the candidate polygons whose distance from the predetermined polygon is less than a second set distance threshold with the predetermined polygon; and determining the merged polygon as the polygon of the clicked point of interest;
identifying the spliced tile using a recurrent neural network model to obtain the text of the polygon of the clicked point of interest, which is the text information of the clicked point of interest.

2. The method according to claim 1, characterized in that:
the step of obtaining the tile corresponding to the coordinates of the clicked point of interest at the current level of the map, and at least one adjacent tile adjacent to the clicked tile, includes: based on the coordinates of the clicked point of interest, obtaining the tile corresponding to the coordinates at the current level of the map, and multiple adjacent tiles adjacent to the clicked tile;
the step of splicing the tile with the at least one adjacent tile to obtain a spliced tile includes: splicing the tile with the plurality of adjacent tiles to obtain the spliced tile.

3. The method according to claim 1, characterized in that the step of identifying the interest points of the spliced tile by a convolutional neural network model to obtain multiple interest point polygons of the spliced tile includes:
preprocessing the spliced tile to obtain a tile to be identified with a set size;
identifying the interest points of the tile to be identified by a convolutional neural network model to obtain multiple interest point polygons of the tile to be identified.

4. The method according to claim 1, characterized in that the step of identifying the spliced tile using a recurrent neural network model to obtain the text of the polygon of the clicked point of interest, the text being the text information of the clicked point of interest, includes:
based on the polygon of the clicked point of interest, obtaining a region image from the spliced tile;
identifying the region image using a recurrent neural network model to obtain the text of the region image, which is the text information of the clicked point of interest.

5. An interest-point text recognition device based on a neural network model, characterized in that it includes:
a tile acquisition module, used to obtain, based on the coordinates of the clicked point of interest, the tile corresponding to the coordinates at the current level of the map, as well as at least one adjacent tile adjacent to the clicked tile;
a tile splicing module, used to splice the tile obtained by the tile acquisition module with the at least one adjacent tile to obtain a spliced tile;
a polygon acquisition module, used to identify the interest points of the spliced tile obtained by the tile splicing module through a convolutional neural network model, and obtain multiple interest point polygons of the spliced tile;
a first calculation module, used to obtain the distance between each of the multiple interest point polygons obtained by the polygon acquisition module and the coordinates of the clicked interest point;
a polygon determination module, used to determine the point of interest polygon whose distance from the coordinates is less than a first set distance threshold as a candidate polygon of the clicked point of interest, and to determine the candidate polygon with the smallest distance from the coordinates as the predetermined polygon of the clicked point of interest;
a second calculation module, used to calculate the distance between each of the other candidate polygons and the predetermined polygon;
the polygon determination module being further configured to merge candidate polygons whose distance from the predetermined polygon is less than a second set distance threshold with the predetermined polygon, and determine the merged polygon as the polygon of the clicked point of interest;
a text acquisition module, used to identify the spliced tile obtained by the tile splicing module through a recurrent neural network model, and obtain the text of the polygon of the clicked point of interest determined by the polygon determination module, the text being the text information of the clicked point of interest.

6. The apparatus according to claim 5, characterized in that the device further includes:
a tile preprocessing module, used to preprocess the spliced tile obtained by the tile splicing module to obtain a tile to be identified with a set size;
the polygon acquisition module being further configured to identify the interest points of the tile to be identified obtained by the tile preprocessing module through a convolutional neural network model, thereby obtaining multiple interest point polygons of the tile to be identified.

7. The apparatus according to claim 5, characterized in that the device further includes:
an image acquisition module, used to obtain a region image from the spliced tile obtained by the tile splicing module, based on the polygon of the clicked point of interest determined by the polygon determination module;
the text acquisition module being further configured to identify the region image obtained by the image acquisition module through a recurrent neural network model, and obtain the text of the region image, wherein the text is the text information of the clicked point of interest.

8. A computer-readable storage medium, characterized in that it stores executable code that, when executed by a processor of an electronic device, causes the processor to perform the method as described in any one of claims 1-4.