An administrative four-level address extraction method, device, equipment and storage medium

By generating administrative level 4 addresses using trie and word segmentation techniques, and extracting target addresses using a filtering algorithm, the problem of inaccurate extraction of administrative level 4 addresses in existing technologies is solved, achieving higher address resolution accuracy and lower operating costs.

CN116484850B · Active · Publication Date: 2026-06-23 · SHENZHEN LEAPFROG NEW TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHENZHEN LEAPFROG NEW TECH CO LTD
Filing Date
2023-03-29
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing technologies cannot accurately extract administrative level four addresses, which increases the difficulty of logistics address resolution, resulting in high operating costs for enterprises and a poor user experience.

Method used

A trie-based method is used to extract the administrative level 4 address from the requested address, segment the text, obtain the location index and text precision type, generate the administrative level 4 address, and extract the target administrative level 4 address through a filtering algorithm.

Benefits of technology

It can accurately extract the administrative level 4 address when the requested address contains abbreviations, typos, or homophones, thereby improving the accuracy of address resolution, reducing operating costs, and enhancing user experience.

Abstract

The application discloses an administrative four-level address extraction method, device and equipment and a storage medium. The method comprises the following steps: based on a dictionary tree, administrative four-level information is obtained by extracting a request address; a word segmentation is performed on the administrative four-level information to obtain a plurality of administrative four-level segmented words and corresponding position indexes, and a text accuracy type and an administrative code corresponding to each administrative four-level segmented word are obtained; a plurality of administrative four-level addresses are generated according to the administrative code and the administrative four-level segmented word; and the administrative four-level address is screened according to the position index and the text accuracy type, and a target administrative four-level address is extracted. According to the method, when the request address has errors, such as level conflicts, contains multiple administrative addresses, or has abbreviations, wrong characters, homophonic characters or homographic characters, the administrative four-level information can be normally analyzed, and the accuracy of extracting the administrative four-level address is improved.

Description

Technical Field

[0001] This invention relates to the field of logistics and transportation technology, specifically to a method, apparatus, equipment, and storage medium for extracting administrative level four addresses.

Background Technology

[0002] With the rapid development of the logistics industry, there are higher requirements for the accuracy of logistics delivery. Therefore, the resolution of logistics addresses is particularly important, especially the extraction of administrative level 4 addresses. If the administrative level is not extracted accurately, it will greatly increase the operating costs of enterprises and bring a very poor user experience.

[0003] However, some users enter addresses that are not standardized, such as addresses with conflicting province / city information, addresses containing multiple provinces / cities / districts, abbreviated addresses, misspellings, or homophones. Existing administrative level four address extraction schemes cannot extract correct administrative level four address information, which greatly increases the difficulty of address resolution. Therefore, a new administrative level four address extraction method is urgently needed.

Summary of the Invention

[0004] This invention provides a method, apparatus, device, and storage medium for extracting administrative level 4 addresses, in order to solve the problem that existing technologies cannot accurately extract administrative level 4 addresses.

[0005] To address the aforementioned technical problems, in a first aspect, the present invention provides a method for extracting administrative level four addresses, the method comprising:

[0006] Based on a trie, administrative level four extraction is performed on the request address to obtain administrative level four information.

[0007] The administrative level 4 information is segmented into words to obtain several administrative level 4 words and corresponding position indices, and the text precision type and administrative code corresponding to each administrative level 4 word are obtained.

[0008] Based on the administrative code and the administrative level four word segmentation, several administrative level four addresses are generated, and the administrative level four addresses include several administrative level four word segments with hierarchical relationships.

[0009] Based on the location index and the text precision type, the administrative level 4 addresses are filtered to extract the target administrative level 4 addresses.

[0010] Optionally, the step of filtering the administrative level 4 addresses based on the location index and the text precision type to extract the target administrative level 4 addresses includes:

[0011] Based on the location index, invalid administrative level 4 word segments in the administrative level 4 address are identified and deleted to obtain a valid administrative level 4 address. An invalid administrative level 4 word segment comes from a part of the request address that is not itself administrative level 4 information but contains an administrative level 4 name.

[0012] Based on the text precision type and administrative level corresponding to each administrative level four word segment of the valid administrative level four address, calculate the level score of each administrative level, and sum the level scores to obtain the final score of the valid administrative level four address;

[0013] Based on the final score, the target administrative level four address is extracted.

[0014] Optionally, the step of determining invalid fourth-level administrative word segments in the fourth-level administrative address based on the location index includes:

[0015] For each administrative level four word segment of the administrative level four address, the position index interval between the current administrative level four word segment and the superior administrative level four word segment is obtained according to the position index.

[0016] If the position index interval is greater than a preset value, and there are preset company-related characters in the preset interval after the current administrative level four word segmentation, the current administrative level four word segmentation is determined to be an invalid administrative level four word segmentation.

[0017] And / or, if the characters following the current administrative level 4 segment are preset non-administrative level 4 administrative address names, the current administrative level 4 segment is determined to be an invalid administrative level 4 segment.
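The two checks in [0015] to [0017] can be sketched as follows. This is only an illustration under stated assumptions: the gap is taken to be the distance from the end of the superior segment to the start of the current one, and the marker lists, the thresholds `MAX_GAP` and `WINDOW`, and the sample strings (including the company name) are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical presets; the patent only says that such lists are preset.
COMPANY_MARKERS = ("公司", "厂")          # company-related characters
NON_ADMIN_SUFFIXES = ("村", "路", "街")    # names that are not administrative level 4
MAX_GAP = 2    # assumed preset value for the position index interval
WINDOW = 12    # assumed preset interval scanned after the current segment


@dataclass
class Segment:
    text: str
    start: int  # start index in the request address
    end: int    # end index (exclusive)


def is_invalid(address: str, current: Segment, superior: Optional[Segment]) -> bool:
    """Return True if `current` looks like part of a company, road or village name."""
    tail = address[current.end:]
    # Check from [0017]: the segment is directly followed by a non-level-4 name.
    if tail.startswith(NON_ADMIN_SUFFIXES):
        return True
    # Check from [0016]: far from its superior and followed by company characters.
    if superior is not None:
        gap = current.start - superior.end
        if gap > MAX_GAP and any(m in tail[:WINDOW] for m in COMPANY_MARKERS):
            return True
    return False
```

With a hypothetical input such as "河北省保定市竞秀区石家庄村6组", the segment "石家庄" is followed by "村" and is therefore flagged, while "竞秀区" is kept.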

[0018] Optionally, the step of calculating the level score for each administrative level based on the text precision type and administrative level corresponding to each administrative level four word segment of the valid administrative level four address includes:

[0019] Based on the text precision type corresponding to each administrative level four word segment of the valid administrative level four address, and the preset basic weights corresponding to the name precision type, administrative level precision type and misspelling type in the text precision type, the basic weights of each administrative level are determined.

[0020] Based on the administrative level to which each administrative level-four word belongs in the effective administrative level-four address and the basic weight, the basic score of each administrative level is determined;

[0021] For the provincial level in the administrative hierarchy, the level score of the provincial level is calculated based on whether the corresponding fourth-level administrative segmentation is at the beginning of the text and the basic score mentioned above.

[0022] For non-provincial levels within the administrative hierarchy, the level score for non-provincial levels is calculated based on the number of administrative level four segment words of the same level and the aforementioned basic score.

[0023] Optionally, the step of calculating the level score for non-provincial levels based on the number of administrative level four segmented words at the same level and the base score includes:

[0024] For each level other than the provincial level, when the number of administrative level 4 word segments in the current level is 1, the level score of the current level is calculated as score = p * q.

[0025] When the number is greater than 1, the level score for the current level is calculated as score = p + (p * ((Num - M) / Num)).

[0026] Where score is the level score of the current level, p is the base score of the current level, q is the preset coefficient of the current level, Num is the preset constant coefficient, and M is the number of administrative level 4 words in the current level.
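As a minimal sketch, the two formulas in [0024] and [0025] translate directly into a small function. The function name and the sample values are hypothetical; the patent does not give concrete values for p, q or Num.

```python
def non_provincial_level_score(p: float, q: float, num: float, m: int) -> float:
    """Level score for one non-provincial level.

    p   -- base score of the current level
    q   -- preset coefficient of the current level
    num -- preset constant coefficient (Num)
    m   -- number of administrative level 4 word segments at this level (M)
    """
    if m < 1:
        raise ValueError("a level must contain at least one word segment")
    if m == 1:
        return p * q
    # More segments at the same level shrink the bonus term (Num - M) / Num.
    return p + p * ((num - m) / num)
```

For example, with the made-up values p = 100, q = 1.5 and Num = 4, a level with one word segment scores 150, and a level with two word segments also scores 100 + 100 * (4 - 2) / 4 = 150; with three word segments the score drops to 125.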

[0027] Optionally, after the step of obtaining the final score of the valid administrative level four address, the method further includes:

[0028] The final score is negatively weighted when any of the following conditions are met:

[0029] The valid administrative level four address is missing an administrative level;

[0030] The number of non-provincial-level administrative level four words in the valid administrative level four address is greater than a preset threshold;

[0031] The valid administrative level 4 address includes preset easily confused characters.
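One possible reading of the negative weighting in [0028] to [0031] is sketched below. The patent does not state how large the penalty is or whether it is multiplicative, so the single multiplicative `penalty_factor`, the threshold default and the level names are all assumptions made for illustration.

```python
ALL_LEVELS = frozenset({"province", "city", "county", "town"})


def apply_negative_weight(final_score, levels, non_provincial_count, text,
                          confusable_chars=(), max_non_provincial=3,
                          penalty_factor=0.5):
    """Scale the final score down once if any condition from [0029]-[0031] holds.

    levels               -- set of level names present in the valid address
    non_provincial_count -- number of non-provincial word segments
    text                 -- text of the valid address
    confusable_chars     -- preset easily confused characters (hypothetical list)
    """
    missing_level = not ALL_LEVELS.issubset(levels)
    too_many = non_provincial_count > max_non_provincial
    confusable = any(ch in text for ch in confusable_chars)
    if missing_level or too_many or confusable:
        return final_score * penalty_factor
    return final_score
```

Applying the penalty once, rather than once per condition, follows the wording "when any of the following conditions are met"; a stacked penalty would be an equally plausible design.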

[0032] Optionally, the step of generating several administrative level four addresses based on the administrative code and the administrative level four word segmentation includes:

[0033] For each administrative level 4 word segment, the administrative level is obtained based on the administrative code, and it is determined whether there is a hierarchical relationship between the current administrative level 4 word segment and the previous administrative level 4 word segment.

[0034] If it exists, based on the administrative level, update the current administrative level 4 word segment to the administrative level 4 address corresponding to the previous administrative level 4 word segment;

[0035] If it does not exist, generate a new administrative level 4 address.

[0036] Secondly, the present invention provides an administrative level four address extraction device, including an extraction module, a word segmentation module, a relationship module and a filtering module;

[0037] The extraction module is used to perform administrative level four extraction on the request address based on the trie to obtain administrative level four information;

[0038] The word segmentation module is used to segment the administrative level 4 information to obtain several administrative level 4 words and corresponding position indices, and to obtain the text precision type and administrative code corresponding to each administrative level 4 word.

[0039] The relationship module is used to generate several administrative level four addresses based on the administrative code and the administrative level four word segmentation. The administrative level four addresses include several administrative level four word segments with hierarchical relationships.

[0040] The filtering module is used to filter the administrative level 4 addresses based on the location index and the text precision type, and extract the target administrative level 4 addresses.

[0041] Thirdly, the present invention provides an administrative level four address extraction device, comprising a memory and a processor, wherein:

[0042] The memory is used to store computer programs;

[0043] The processor is used to read the program in the memory and execute the steps of the administrative level four address extraction method provided in the first aspect above.

[0044] Fourthly, the present invention provides a computer-readable storage medium having a readable computer program stored thereon, which, when executed by a processor, implements the steps of the administrative level four address extraction method provided in the first aspect above.

[0045] Compared with the prior art, the administrative level four address extraction method, apparatus, device, and storage medium provided by the present invention have the following beneficial effects:

[0046] This invention first extracts the administrative level 4 information from the request address, then segments the administrative level 4 information into words, obtaining the administrative level 4 word segments and their corresponding location indices, text precision types, and administrative codes. Next, based on the administrative level 4 word segments and administrative codes, the administrative level 4 address is obtained. Finally, the administrative level 4 address is scored according to the text precision type to extract the target administrative level 4 address. This embodiment can extract accurate administrative level 4 addresses even when the request address contains abbreviations, misspellings, homophones, or homographs. It can also correctly parse administrative level 4 information from request addresses entered incorrectly due to customer error. Furthermore, it can effectively select a more accurate administrative information from multiple inputs, improving the accuracy of request address parsing.

Attached Figure Description

[0047] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and not all embodiments. For those skilled in the art, other drawings obtained from these drawings without creative effort are all within the scope of protection of this application.

[0048] Figure 1 is a flowchart of an administrative level four address extraction method provided by an embodiment of the present invention;

[0049] Figure 2 is a flowchart of extracting a target administrative level four address provided by an embodiment of the present invention;

[0050] Figure 3 is a flowchart of calculating level scores provided by an embodiment of the present invention;

[0051] Figure 4 is a schematic diagram of the structure of an administrative level four address extraction device provided in an embodiment of the present invention;

[0052] Figure 5 is a schematic diagram of the structure of an administrative level four address extraction device provided in an embodiment of the present invention;

[0053] Figure 6 is a schematic diagram of the structure of a computer-readable storage medium provided in an embodiment of the present invention.

Detailed Implementation

[0054] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.

[0055] To make the description of this disclosure more detailed and complete, illustrative descriptions of embodiments and specific examples of the present invention are provided below; however, these are not the only forms of implementing or utilizing the specific embodiments of the present invention. The embodiments cover features of multiple specific embodiments and the methods, steps, and their order for constructing and operating these specific embodiments. However, other specific embodiments may also be used to achieve the same or equivalent functions and step sequences. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without inventive effort are within the scope of protection of this application.

[0056] In the description of the embodiments of the present invention, unless otherwise stated, " / " means "or". For example, A / B can mean A or B. The word "and / or" in the text is merely a description of the relationship between related objects, indicating that there can be three relationships. For example, A and / or B can mean: A exists alone, A and B exist simultaneously, and B exists alone. In addition, in the description of the embodiments of this application, "multiple" means two or more. Other quantifiers should be understood similarly. The preferred embodiments described herein are only used to illustrate and explain the present invention and are not intended to limit the present invention. Furthermore, the embodiments and features in the embodiments of this application can be combined with each other without conflict.

[0057] This embodiment provides a method for extracting administrative level four addresses, mainly applied in scenarios requiring the extraction of administrative level four addresses, such as mail delivery. After a user fills in a mail delivery address, the address generally includes five parts: province, city, district or county, town, and detailed address. Its general format is: XX Province XX City XX District (County) XX Town + Detailed Address. For example, take "No. 12, Group 1, Fenghuangshan Village, Miersi Town, Hong'an County, Huanggang City, Hubei Province". Here, Hubei Province is the provincial address, Huanggang City is the city-level address, Hong'an County is the county-level address, Miersi Town is the town-level address, and No. 12, Group 1, Fenghuangshan Village is the detailed address. By executing this administrative level four address extraction method, the administrative level four address of the mail delivery address can be obtained. The administrative level four address refers to the four levels of province, city, district or county, and town. For example, performing administrative level four address extraction on "No. 12, Group 1, Fenghuangshan Village, Miersi Town, Hong'an County, Huanggang City, Hubei Province" yields the administrative level four address "Miersi Town, Hong'an County, Huanggang City, Hubei Province".

[0058] Example 1

[0059] As shown in Figure 1, an embodiment of the present invention provides an administrative level four address extraction method, which includes the following steps.

[0060] Step S101: Based on the trie, perform administrative level four extraction on the request address to obtain the administrative level four information;

[0061] When the administrative level 4 address of a request address needs to be extracted, the request address is input into the trie to obtain its administrative level 4 information.

[0062] In this embodiment, the trie is generated based on two parts: a standard library and a custom dictionary. The standard library includes all administrative information from the National Bureau of Statistics, which refers to standard place names. The custom dictionary mainly includes misspellings, homophones, abbreviations, and homographs corresponding to standard place names. For example, if the standard place name is Wuhan, its abbreviation can be Jiangcheng; if the standard place name is Guangzhou, its abbreviation can be Yangcheng or Huacheng; if the standard name is Quzhou, its common misspellings are Quzhou or Quzhou City, etc.

[0063] The custom dictionary includes frequently occurring misspellings, homophones, abbreviations, and homographs in real-world applications. Any dictionary that meets the requirements can be used as the aforementioned custom dictionary.

[0064] In this embodiment, the basic structure of each term in the standard library and the custom dictionary is as follows: Administrative information text content: final score: text accuracy type: administrative code.

[0065] The administrative information text content includes standard place names or common misspellings, abbreviations, homophones, or homographs in standard place names; the final score refers to the score calculated when the administrative information text is matched; the text accuracy type refers to the accuracy of the administrative information text content. For example, if the administrative information text content is a standard place name published by the National Bureau of Statistics, the accuracy of the administrative information text content is the highest; if the administrative information text content is a standard place name containing a misspelling, the accuracy of the administrative information text content is relatively high; if the administrative information text content is an alias, the accuracy of the administrative information text content is relatively low; the administrative code represents the administrative code corresponding to the place name in the administrative information text content, which is defined by the National Bureau of Statistics.

[0066] For example, a term in the dictionary could be: Yongning Town: 6000: STAN: 350581106000, where Yongning Town is the administrative information text content, 6000 is the final score corresponding to Yongning Town, STAN indicates that the text accuracy type of Yongning Town is a standard name, and 350581106000 is the administrative code corresponding to Yongning Town.

[0067] It should be noted that in practical applications, the basic structure of a term can also include text separators. For example, for the term mentioned above, if separators are included, it can be: Yongning Town: 6000: ns: STAN: 350581106000, where ns represents the separator.
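The term layout described in [0064] to [0067] could be parsed along the following lines. This is a sketch only: the `Term` type, its field names and `parse_term` are hypothetical, and the source does not say how the dictionary files are actually read.

```python
from typing import NamedTuple, Optional


class Term(NamedTuple):
    text: str                        # administrative information text content
    score: int                       # final score
    accuracy_type: str               # e.g. "STAN" for a standard name
    admin_code: str                  # kept as a string to preserve all digits
    separator: Optional[str] = None  # optional text separator field


def parse_term(line: str) -> Term:
    """Parse 'text: score: type: code' or 'text: score: sep: type: code'."""
    fields = [f.strip() for f in line.split(":")]
    if len(fields) == 4:
        text, score, accuracy_type, code = fields
        return Term(text, int(score), accuracy_type, code)
    if len(fields) == 5:
        text, score, separator, accuracy_type, code = fields
        return Term(text, int(score), accuracy_type, code, separator)
    raise ValueError(f"unexpected term layout: {line!r}")
```

Calling `parse_term("Yongning Town: 6000: STAN: 350581106000")` gives a `Term` whose `accuracy_type` is `"STAN"` and whose `admin_code` is `"350581106000"`.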

[0068] The above-mentioned standard library and custom dictionary are used to generate a trie. Any method that can generate a trie using the standard library and custom dictionary can be used to implement this solution. This embodiment does not make any specific limitations on this.

[0069] In this embodiment, in order to save computing resources and improve computing speed, when generating a trie using the standard library and the custom dictionary, the administrative information text content in the standard library and the custom dictionary is first extracted, and then the extracted administrative information text content is used to generate a trie.

[0070] In practical applications, the trie is loaded into memory when the system starts up, and the administrative information text content, text precision type, and administrative code are loaded into separate sets for calculation in subsequent steps.
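A minimal sketch of the start-up loading described in [0069] and [0070]: only the text content goes into the trie, while the text accuracy type and administrative codes are kept in separate lookup tables. The names `TrieNode`, `build_trie` and `load_dictionaries`, the tuple layout of the input, and the "ABBR" type label are assumptions made for illustration.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if the path from the root spells a term


def build_trie(words):
    """Insert every word character by character and mark its last node."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def load_dictionaries(terms):
    """terms: iterable of (text, accuracy_type, admin_code) tuples."""
    accuracy_by_text = {}
    codes_by_text = {}
    for text, accuracy_type, admin_code in terms:
        accuracy_by_text[text] = accuracy_type
        # The same place name can exist under several parents, so keep a list.
        codes_by_text.setdefault(text, []).append(admin_code)
    trie = build_trie(accuracy_by_text)  # the trie sees text content only
    return trie, accuracy_by_text, codes_by_text
```

Keeping a list of codes per name is a design choice for this sketch, since names such as a town name may map to more than one administrative code.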

[0071] After generating the trie, the request address is input into the trie to obtain the four-level administrative information of the request address. The above-mentioned sender address can be regarded as the request address. Extracting the four-level administrative information of the request address means extracting all place names at the provincial level, municipal level, district (county) level, and town level in the request address. Any method that can extract the four-level administrative information in the request address can be used to implement this solution, and this embodiment does not make specific limitations in this regard.

[0072] For example, for "No. 12, Group 1, Fenghuangshan Village, Miersi Town, Hong'an County, Huanggang City, Hubei Province", the extracted four-level administrative information is "Miersi Town, Hong'an County, Huanggang City, Hubei Province"; for "Shenzhen Good Years Mould Co., Ltd., Yuejiang West Road, Haizhu District, Guangzhou City, Guangdong Province", the extracted four-level administrative information is "Haizhu District, Guangzhou City, Guangdong Province, Shenzhen City"; for "Group 6, Shijiazhuang Village, Jingxiu District, Baoding City, Hebei Province", the extracted four-level administrative information is "Shijiazhuang, Jingxiu District, Baoding City, Hebei Province"; for "Factory Building No. 2, Yulongchang Industrial Park, No. 100, Keyuan West Road, Pingshan, Longgang District, Shenzhen City, Beijing", the extracted four-level administrative information is "Longgang District, Shenzhen City, Beijing".

[0073] Continuing with Figure 1, in step S102 the four-level administrative information is segmented to obtain several four-level administrative segments and their corresponding position indexes, and the text accuracy type and administrative code corresponding to each four-level administrative segment are obtained;

[0074] In this embodiment, the trie is used to segment the four-level administrative information. Take "Cangzhou City, Hebei Province" as an example. First, the first character '河' is taken from the four-level administrative information string, and it is determined whether the character '河' exists under the root node of the trie. If it exists, the start index is 0 and the end index is 1. Next, the following character '北' is obtained, and matching continues from the '河' node into its child nodes to determine whether the character '北' exists among them. If it exists, it is determined whether the node for '河北' marks a valid word. When '河北' is a valid word, the four-level administrative segment '河北' is obtained, with start index 0 and end index 2. Then the next character '省' is obtained, and it is determined whether the character '省' exists among the child nodes of the '北' node. If it exists, it is determined whether '河北省' forms a valid word. When '河北省' is a valid word, another four-level administrative segment '河北省' is obtained, with start index 0 and end index 3.

[0075] The next character in the request address, "Cang", is then obtained, and it is determined whether a "Cang" node exists among the child nodes of the "Province" ('省') node. If not, matching jumps back to the root node, and the start index of "Cang" is set to 3 and its end index to 4. This process is repeated for every character in the string, and all word segments that can be formed are output.

[0076] It should be noted that in this embodiment, the position index includes two parts: the start index and the end index. The start index represents the start position of the administrative four-level word segmentation, and the end index represents the end position of the administrative four-level word segmentation.

[0077] For example, for "Cangzhou City, Hebei Province", its administrative four-level word segmentation and corresponding position index are as follows:

[0078] {0, 2}: Corresponding to the administrative four-level word segmentation "Hebei";

[0079] {0, 3}: Corresponding to the administrative four-level word segmentation "Hebei Province";

[0080] {3, 5}: Corresponding to the administrative four-level word segmentation "Cangzhou";

[0081] {3, 6}: Corresponding to the administrative four-level word segmentation "Cangzhou City".

[0082] It should be noted that {0, 2} is the position index, 0 represents the start index of "Hebei", and 2 represents the end index.
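The walk-through above can be approximated with a short trie scan. This sketch makes two assumptions: the trie is a nested `dict` with a sentinel key, and matching restarts from every character position, which is simpler than the root-jump scan described in [0075] and may emit extra candidates when dictionary words overlap. The Chinese string '河北省沧州市' is assumed to be the original of the translated example "Cangzhou City, Hebei Province".

```python
END = "$end"  # sentinel key marking a complete word; never equals a single character


def build(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def segment(text, root):
    """Return (start, end, word) for every dictionary word found in text."""
    found = []
    for start in range(len(text)):
        node = root
        for end in range(start, len(text)):
            node = node.get(text[end])
            if node is None:
                break  # no longer a prefix of any word; try the next start
            if END in node:
                found.append((start, end + 1, text[start:end + 1]))
    return found


trie = build(["河北", "河北省", "沧州", "沧州市"])
segments = segment("河北省沧州市", trie)
# [(0, 2, '河北'), (0, 3, '河北省'), (3, 5, '沧州'), (3, 6, '沧州市')]
```

The four tuples correspond to the position indexes {0, 2}, {0, 3}, {3, 5} and {3, 6} listed in [0078] to [0081].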

[0083] Optionally, duplicate removal processing can also be performed on the administrative four-level word segmentation. For example, for "Hebei" and "Hebei Province", only "Hebei Province" is retained. For "Cangzhou" and "Cangzhou City", "Cangzhou City" will be retained. Any method that can perform duplicate removal processing can be used to implement this solution, and this embodiment does not make specific limitations on this.
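Since the source leaves the duplicate removal method open, one simple option is to keep only the longest word segment for each start index. The function below is a sketch of that rule; `keep_longest` and the `(start, end, word)` tuple layout are hypothetical.

```python
def keep_longest(segments):
    """For segments sharing a start index, keep the one with the largest end index.

    segments -- iterable of (start, end, word) tuples
    """
    best = {}
    for start, end, word in segments:
        if start not in best or end > best[start][1]:
            best[start] = (start, end, word)
    # Return the survivors in text order.
    return [best[start] for start in sorted(best)]
```

Applied to the four segments of the example, this keeps "Hebei Province" at {0, 3} and "Cangzhou City" at {3, 6}.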

[0084] In this embodiment, according to the obtained administrative four-level word segmentation, the text accuracy type and administrative code of each administrative four-level word segmentation are obtained. Any method that can extract the text accuracy type and administrative code of the administrative four-level word segmentation can be used to implement this solution.

[0085] For example, each administrative four-level word segmentation is respectively matched with the standard library and the custom dictionary to obtain the matched entries, and the text accuracy type and administrative code in the matched entries are used as the text accuracy type and administrative code of the administrative four-level word segmentation.

[0086] Step S103, generate a number of administrative four-level addresses according to the administrative code and the administrative four-level word segmentation. The administrative four-level addresses include a number of administrative four-level word segmentations with a hierarchical relationship;

[0087] Based on each administrative level 4 word segment and the corresponding administrative code, one or more administrative level 4 addresses are obtained. Any method that can generate administrative level 4 addresses based on administrative codes and administrative level 4 words can be used to implement this scheme. This embodiment does not make specific limitations on this.

[0088] In this embodiment, the administrative level 4 address is composed of administrative level 4 word segments with a hierarchical relationship. For example, Hebei Province and Cangzhou City have a hierarchical relationship, and Cangzhou City in Hebei Province can be regarded as an administrative level 4 address.

[0089] As one implementation method, the step of generating several administrative level four addresses based on the administrative code and the administrative level four word segmentation includes:

[0090] For each administrative level 4 word segment, the administrative level is obtained based on the administrative code, and it is determined whether there is a hierarchical relationship between the current administrative level 4 word segment and the previous administrative level 4 word segment.

[0091] If it exists, based on the administrative level, update the current administrative level 4 word segment to the administrative level 4 address corresponding to the previous administrative level 4 word segment;

[0092] If it does not exist, generate a new administrative level 4 address.

[0093] In this embodiment, for each administrative level 4 word segment, the administrative level of the administrative level 4 word segment is obtained based on its administrative code. In this embodiment, the administrative level refers to the address level, that is, which level of address the administrative level 4 word segment belongs to: provincial address, municipal address, district (county) address, or town address. It is then determined whether there is a hierarchical relationship between the current administrative level 4 word segment and the previous administrative level 4 word segment. If there is a hierarchical relationship between the current administrative level 4 word segment and the previous administrative level 4 word segment, then the current administrative level 4 word segment and the previous administrative level 4 word segment are combined to obtain the administrative level 4 address.

[0094] For example, considering the administrative level four terms "Hebei Province," "Cangzhou City," and "Wuchang District," based on the administrative code corresponding to "Hebei Province," we can know that the administrative level of "Hebei Province" is provincial, "Cangzhou City" is municipal, and "Wuchang District" is district. For "Hebei Province," it is currently the administrative level four term, belonging to the provincial level, and there is no higher-level administrative level four term. Therefore, the administrative address corresponding to "Hebei Province" is directly "Hebei Province." For "Cangzhou City," it is currently the administrative level four term, belonging to the municipal level, and its higher-level administrative level four term is provincial, which is "Hebei Province." "Cangzhou City" does exist under "Hebei Province," and there is indeed a hierarchical relationship between them. Therefore, adding "Cangzhou City" to the administrative level four address corresponding to "Hebei Province" yields "Hebei Province Cangzhou City," which is an administrative level four address.

[0095] When "Wuchang District" is the current administrative level 4 word segment, it belongs to the district level, and its previous administrative level 4 word segment is "Cangzhou City". There is no "Wuchang District" under "Cangzhou City", so there is no hierarchical relationship between them. In this case, the superior address of "Wuchang District" is represented by "***" placeholders, which gives "******Wuchang District". "******Wuchang District" is used as a new administrative level 4 address.
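
The hierarchy check in this example can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this embodiment: it assumes six-digit administrative codes whose first two digits identify the province and whose first four digits identify the city, and it assumes one "***" placeholder per missing superior level. The names `Segment`, `level_of`, `is_child`, and `build_addresses`, as well as the code values used, are hypothetical.

```python
from dataclasses import dataclass

PLACEHOLDER = "***"  # assumed: one placeholder per missing superior level


@dataclass
class Segment:
    name: str
    code: str  # assumed 6-digit code: 2 digits province + 2 city + 2 district


def level_of(code: str) -> int:
    """Return 1 (province), 2 (city) or 3 (district) from the assumed code layout."""
    if code[2:] == "0000":
        return 1
    if code[4:] == "00":
        return 2
    return 3


def is_child(child: Segment, parent: Segment) -> bool:
    """True if `child` is exactly one level below `parent` and shares its code prefix."""
    parent_level = level_of(parent.code)
    if level_of(child.code) != parent_level + 1:
        return False
    prefix = 2 * parent_level
    return child.code[:prefix] == parent.code[:prefix]


def build_addresses(segments: list[Segment]) -> list[str]:
    """Combine each segment with the previous one when they are hierarchically related."""
    addresses = []
    prev, prev_address = None, ""
    for seg in segments:
        if prev is not None and is_child(seg, prev):
            address = f"{prev_address} {seg.name}"
        else:
            # No related superior: pad the missing levels with placeholders.
            address = PLACEHOLDER * (level_of(seg.code) - 1) + seg.name
        addresses.append(address)
        prev, prev_address = seg, address
    return addresses


# Illustrative code values (assumed for this sketch).
segments = [
    Segment("Hebei Province", "130000"),
    Segment("Cangzhou City", "130900"),
    Segment("Wuchang District", "420106"),
]
result = build_addresses(segments)
```

With these inputs, `result` holds the three addresses described in the example: the province alone, the province combined with the city, and the district prefixed with placeholders.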

[0096] For example, the extracted administrative level 4 information for Building 2, Yulongchang Industrial Park, No. 100, Pinghushan Science Park West Road, Longgang District, Shenzhen, Beijing, is "Beijing, Shenzhen, Longgang District". After segmenting the administrative level 4 information, the resulting administrative level 4 words are "Beijing", "Shenzhen", and "Longgang District". The corresponding administrative level 4 addresses are "Beijing" and "Shenzhen, Longgang District".

[0097] In this embodiment, the hierarchical relationship between the administrative level 4 word segments is determined to obtain the administrative level 4 addresses. Even if the request address is entered incorrectly due to customer error, the correct administrative level 4 address can still be parsed. Furthermore, when multiple pieces of administrative information are input, the more accurate one can be selected effectively, which helps to improve the accuracy of subsequent parsing of the request address.

[0098] Step S104: Based on the location index and the text precision type, filter the administrative level 4 addresses and extract the target administrative level 4 addresses.

[0099] Each administrative level 4 address is scored based on its location index and text precision type. The scores are then used to filter each administrative level 4 address, and the administrative level 4 address with the highest score is selected as the target administrative level 4 address.

[0100] Any method that filters administrative level 4 addresses based on location index and text precision type to obtain the target administrative level 4 address can be used to implement this scheme.

[0101] For example, based on the location index and text precision type, scores can be assigned to "Pinghushanxia Science Park West Road, Longgang District, Beijing" and "Pinghushanxia Science Park West Road, Longgang District, Shenzhen" to obtain the score for each administrative level 4 address. Generally, the administrative level 4 address with the highest score is selected as the target administrative level 4 address.

[0102] As one implementation, as shown in Figure 2, this embodiment of the invention provides a flowchart for extracting a target administrative level four address. In step S104, the step of filtering the administrative level four addresses according to the location index and the text precision type to extract the target administrative level four address includes:

[0103] Step S1041: Based on the location index, determine and delete invalid administrative level 4 words in the administrative level 4 address to obtain a valid administrative level 4 address. The invalid administrative level 4 words are derived from the address portion of the request address that is not administrative level 4 but includes the administrative level 4 name.

[0104] As seen in the above embodiments, the requested address includes two parts: an administrative level 4 address and a detailed address. However, some detailed addresses may include administrative level 4 names. These detailed addresses are the non-administrative level 4 address portions of the requested address, and the administrative level 4 names contained in these detailed addresses are invalid administrative level 4 words. Deleting the invalid administrative level 4 words from the administrative level 4 address yields a valid administrative level 4 address.

[0105] For example, “Guangdong Province, Guangzhou City, Haizhu District, Yuejiang West Road, Shenzhen Haonianhua Mold Co., Ltd.” is a detailed address. However, when extracting the administrative level 4 information in step S101, “Guangdong Province, Guangzhou City, Haizhu District, Shenzhen” will be extracted. After segmenting “Guangdong Province, Guangzhou City, Haizhu District, Shenzhen”, the administrative level 4 word segments “Guangdong Province”, “Guangzhou City”, “Haizhu District”, and “Shenzhen” are obtained. Among these administrative level 4 word segments, “Shenzhen” is located in the detailed address. Therefore, “Shenzhen” is an invalid administrative level 4 word segment and is deleted. The resulting valid administrative level 4 address is “Guangdong Province, Guangzhou City, Haizhu District”.

[0106] As one implementation method, the step of determining invalid administrative level four word segments in the administrative level four address based on the location index includes:

[0107] For each administrative level four word segment of the administrative level four address, the position index interval between the current administrative level four word segment and the superior administrative level four word segment is obtained according to the position index.

[0108] If the position index interval is greater than a preset value, and there are preset company-related characters in the preset interval after the current administrative level four word segmentation, the current administrative level four word segmentation is determined to be an invalid administrative level four word segmentation.

[0109] And / or, if the characters following the current administrative level 4 segment are preset non-administrative level 4 administrative address names, the current administrative level 4 segment is determined to be an invalid administrative level 4 segment.

[0110] In the specific implementation process, one method to determine invalid administrative level 4 word segments is as follows: In order to find invalid administrative level 4 word segments in the administrative level 4 address, firstly, for each administrative level 4 word segment in the administrative level 4 address, each administrative level 4 word segment is taken as the current administrative level 4 word segment. Based on the position index of the current administrative level 4 word segment, the position index interval between the current administrative level 4 word segment and the superior administrative level 4 word segment is obtained.

[0111] For example, for "Guangdong Province, Guangzhou City, Haizhu District, Yuejiang West Road, Shenzhen Haonianhua Mold Co., Ltd.", the administrative level 4 segment words are "Guangdong Province", "Guangzhou City", "Haizhu District", and "Shenzhen City". Taking these four administrative level 4 segment words as the current administrative level 4 segment word, when "Guangzhou City" is the current administrative level 4 segment word, its superior administrative level 4 segment word is "Guangdong Province". By subtracting the starting index of "Guangdong Province" from the starting index of "Guangzhou City", we can obtain a positional index interval of 3 between "Guangzhou City" and "Guangdong Province"; the positional index interval between "Haizhu District" and "Guangzhou City" is also 3; for "Shenzhen City", when "Shenzhen City" is the current administrative level 4 segment word, its superior administrative level 4 segment word is "Guangdong Province", therefore the positional index interval between "Shenzhen City" and "Guangdong Province" is 13.

[0112] After obtaining the position index interval between each administrative level 4 segment and its superior administrative level 4 segment, each administrative level 4 segment is treated in turn as the current administrative level 4 segment for judgment. If the position index interval of the current administrative level 4 segment is greater than a preset value, and preset company-related characters exist within the preset character interval after the current administrative level 4 segment, then the current administrative level 4 segment is determined to be an invalid administrative level 4 segment.

[0113] In this embodiment, the preset value and the preset character interval can be determined according to the actual situation, and this embodiment does not impose specific limitations on them.

[0114] In this embodiment, the preset company-related characters can be any preset characters containing the word "company", such as "company", "limited company", "limited liability company", "technology company", etc. The specific characters can be determined according to the actual situation, and this embodiment does not impose specific limitations on them.

[0115] For example, the preset value is 10, and the preset character interval is 10. When the current administrative level 4 segment is "Guangzhou City", the position index interval between it and the superior administrative level 4 segment is 3, which is less than the preset value. Therefore, "Guangzhou City" can be determined not to be an invalid administrative level 4 segment. Similarly, the position index interval between "Haizhu District" and the superior administrative level 4 segment is less than the preset value, so neither of them is an invalid administrative level 4 segment. When the current administrative level 4 segment is "Shenzhen City", the position index interval between it and the superior administrative level 4 segment is 13, which is greater than the preset value. Furthermore, in the request address, the 10-character interval after "Shenzhen City" contains "Haonianhua Mold Co., Ltd.", which contains company-related characters. Therefore, "Shenzhen City" can be determined to be an invalid administrative level 4 segment. Deleting this invalid administrative level 4 segment yields the valid administrative level 4 address "Guangdong Province, Guangzhou City, Haizhu District".
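
The interval-and-keyword rule walked through above can be sketched as follows. This is an illustrative sketch only: the Chinese string is an assumed reconstruction of the request address in the example (position indices are counted in Chinese characters), the keyword list contains a single assumed entry, and the function name `is_invalid_by_gap` is hypothetical. An interval exactly equal to the preset value is treated here as not invalid, which the text does not specify.

```python
COMPANY_KEYWORDS = ("公司",)  # "company"; assumed keyword list
MAX_GAP = 10                  # preset value from the example
LOOKAHEAD = 10                # preset character interval from the example


def is_invalid_by_gap(address, text, start, parent_start):
    """Flag a segment that is far from its superior and followed by company characters."""
    if parent_start is None:  # top-level segment: no superior to compare with
        return False
    if start - parent_start <= MAX_GAP:
        return False
    end = start + len(text)
    tail = address[end:end + LOOKAHEAD]
    return any(keyword in tail for keyword in COMPANY_KEYWORDS)


# Assumed Chinese form of the request address in the example.
address = "广东省广州市海珠区阅江西路深圳好年华模具有限公司"

# (segment text, start index, start index of the superior segment)
segments = [
    ("广东省", 0, None),  # Guangdong Province
    ("广州市", 3, 0),     # Guangzhou City, interval 3
    ("海珠区", 6, 3),     # Haizhu District, interval 3
    ("深圳", 13, 0),      # Shenzhen, interval 13
]

valid = [text for text, start, parent in segments
         if not is_invalid_by_gap(address, text, start, parent)]
```

Here only the "Shenzhen" segment exceeds the preset value and is followed by a company keyword, so `valid` keeps the province, city, and district segments.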

[0116] Another method for determining invalid administrative level 4 word segments is as follows: After obtaining multiple administrative level 4 word segments, each administrative level 4 word segment is used in turn as the current administrative level 4 word segment for judgment. For the current administrative level 4 word segment, it is determined whether the characters immediately following it are a preset non-administrative level 4 administrative address name. In this embodiment, the adjacent characters can be one or several characters. The preset non-administrative level 4 administrative address names refer to preset names such as village, road, street, and community. If the adjacent characters are a preset non-administrative level 4 administrative address name, then the current administrative level 4 word segment is determined to be an invalid administrative level 4 word segment.

[0117] For example, for the administrative level 4 address "Shijiazhuang Village, Jingxiu District, Baoding City, Hebei Province", its administrative level 4 segmentation words are "Hebei Province", "Baoding City", "Jingxiu District", and "Shijiazhuang". When "Hebei Province" is the current administrative level 4 segmentation word, the first character after it is "Bao", which does not belong to the preset non-administrative level 4 administrative address name. Therefore, "Hebei Province" is not an invalid administrative level 4 segmentation word. Analyzing "Baoding City" and "Jingxiu District" in the same way, it can be determined that "Baoding City" and "Jingxiu District" are not invalid administrative level 4 segmentation words. When "Shijiazhuang" is the current administrative level 4 segmentation word, the first character after it is "Cun", which belongs to the preset non-administrative level 4 administrative address name. Therefore, "Shijiazhuang" is determined to be an invalid administrative level 4 segmentation word. Deleting this invalid administrative level 4 segmentation word yields the valid administrative level 4 address "Jingxiu District, Baoding City, Hebei Province".
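
The suffix rule in this example can be sketched as follows. This is a hedged illustration: the Chinese string is an assumed reconstruction of the address in the example, the suffix list is an assumed subset of the preset names, and `is_invalid_by_suffix` is a hypothetical name.

```python
# Assumed preset names: village, road, street, community.
NON_ADMIN_SUFFIXES = ("村", "路", "街", "社区")


def is_invalid_by_suffix(address, text, start):
    """True if the characters right after the segment are a preset non-administrative name."""
    end = start + len(text)
    return address.startswith(NON_ADMIN_SUFFIXES, end)


# Assumed Chinese form of "Shijiazhuang Village, Jingxiu District, Baoding City, Hebei Province".
address = "河北省保定市竞秀区石家庄村"
segment_texts = ["河北省", "保定市", "竞秀区", "石家庄"]

valid = [text for text in segment_texts
         if not is_invalid_by_suffix(address, text, address.find(text))]
```

"石家庄" (Shijiazhuang) is followed by "村" (Cun, village), so it is dropped and `valid` keeps the other three segments.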

[0118] Continuing with Figure 2, in step S1042, based on the text precision type and administrative level corresponding to each administrative level 4 word segment of the valid administrative level 4 address, the level score of each administrative level is calculated, and the level scores are accumulated to obtain the final score of the valid administrative level 4 address.

[0119] Then, based on the text precision type and administrative level corresponding to each administrative level four word in the valid administrative level four address, which generally includes provincial level administrative level four words, municipal level administrative level four words, district (county) level administrative level four words, and town level administrative level four words, the level score of each administrative level is calculated. Finally, the level scores of each administrative level are summed to obtain the final score of the valid administrative level four address.

[0120] As one implementation, as shown in Figure 3, this embodiment of the invention provides a flowchart for calculating level scores. In step S1042, the step of calculating the level score for each administrative level based on the text precision type and administrative level corresponding to each administrative level four word segment of the valid administrative level four address includes:

[0121] Step S21: Determine the basic weight of each administrative level based on the text precision type corresponding to each administrative level word segment of the valid administrative level address, and the preset basic weights corresponding to the name precision type, administrative level precision type and misspelling type in the text precision type.

[0122] Specifically, the basic weights for each administrative level are obtained based on the text precision type corresponding to each administrative level four word segment in the valid administrative level four address and the basic weights corresponding to each administrative level four word segment.

[0123] It should be noted that the text accuracy type in this embodiment includes three aspects: name accuracy type, administrative level accuracy type, and misspelling type. The degree of text accuracy is determined from these three aspects.

[0124] It is easy to understand that the higher the degree of text precision, the higher the certainty and accuracy of the text, and the greater the corresponding basic weight of the text. Any method for determining the text precision type based on name precision type, administrative level precision type, and misspelling type can be used to implement this solution, and this embodiment does not specifically limit it.

[0125] For example, the name accuracy type includes three types: full name, standard name, and main name. The full name is based on the standard name with reserved space for expansion. The standard name refers to the standard place name published by the National Bureau of Statistics. The main name refers to the primary name. For example, "Wuhan City" is the standard name, and "Wuhan" can be regarded as the main name. The administrative level accuracy type includes two types: province / city and non-province / city. The misspelling type is determined by whether the text is an alias and whether it contains misspellings. According to the above standards, this embodiment divides the text accuracy type into nine types: full name, standard name, standard name containing misspellings, main name, main name containing misspellings, province / city main name, province / city main name containing misspellings, alias, and alias containing misspellings. These are represented by full, stan, stane, main, maine, main12, maine12, alias, and aliase, respectively. The basic weight corresponding to each type can be determined according to the actual situation.

[0126] For example, the basic weights corresponding to each text type in this embodiment are as follows:

[0127] full: 1, stan: 1, stane: 0.99, main: 0.6, maine: 0.5, main12: 0.9, maine12: 0.8, alias: 0.9, aliase: 0.89.

[0128] A valid administrative level 4 address contains multiple administrative level 4 word segments. The administrative level of these administrative level 4 word segments can be provincial, municipal, district (county) level, and town level addresses, etc. For different levels of administrative level 4 word segments, it is necessary to determine the basic weight of each administrative level based on the text precision type of the administrative level 4 word segment.

[0129] Step S22: Determine the basic score for each administrative level based on the administrative level to which each administrative level word belongs in the effective administrative level four address and the basic weight;

[0130] After obtaining the basic weight of each administrative level 4 word segment in the valid administrative level 4 address, the basic score of each administrative level is determined according to the administrative level of each administrative level 4 address. The calculation formula of the basic score of different administrative levels may be different, and it can be determined according to the actual scenario. This embodiment does not make specific limitations on this.

[0131] For example, with p denoting the basic score and q denoting the basic weight determined in step S21, the basic score calculation formula for provincial-level administrative level 4 word segments is: p = 1000 * q; for municipal-level administrative level 4 word segments: p = 1000 * q; for district-level administrative level 4 word segments: p = 800 * q; and for town-level administrative level 4 word segments: p = 500 * q.
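
Steps S21 and S22 with the example values above can be sketched as a pair of lookup tables. This is a minimal sketch: the weights and per-level factors are the example values given in this embodiment, while the names `BASE_WEIGHT`, `LEVEL_FACTOR`, and `base_score` are hypothetical.

```python
# Basic weight per text precision type (example values from this embodiment).
BASE_WEIGHT = {
    "full": 1.0, "stan": 1.0, "stane": 0.99,
    "main": 0.6, "maine": 0.5,
    "main12": 0.9, "maine12": 0.8,
    "alias": 0.9, "aliase": 0.89,
}

# Per-level factor in p = factor * q (example values from this embodiment).
LEVEL_FACTOR = {"province": 1000, "city": 1000, "district": 800, "town": 500}


def base_score(level, precision_type):
    """Basic score p of one word segment: level factor times basic weight q."""
    return LEVEL_FACTOR[level] * BASE_WEIGHT[precision_type]


province_p = base_score("province", "stan")   # standard provincial name
district_p = base_score("district", "main")   # district given by main name only
```

A district written only by its main name therefore starts from a noticeably lower basic score than a province written with its standard name.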

[0132] As an optional implementation method, the index interval between each administrative level 4 word in the valid administrative level 4 address and the upper-level administrative level 4 word can be used to determine whether there is a level missing in the valid administrative level 4 address, and the basic score can be optimized according to the level missing situation.

[0133] For example, if the position index of an administrative level 4 word segment in a valid administrative level 4 address is 0, or its distance from the position index of its superior is greater than 4, and the word segment is the main name of a district or town, the basic score is directly set to zero.

[0134] For example, for district-level administrative level 4 segmentation, if the district-level administrative level 4 segmentation has no superior but has subordinates, then the optimized base score = base score * 0.5; if the district-level administrative level 4 segmentation has neither superior nor subordinate, the base score is not awarded and it is considered an invalid administrative information.

[0135] As an alternative implementation, the base score can be optimized based on preset easily confused characters contained in the valid administrative level 4 address.

[0136] For example, for a valid administrative level 4 address at the district level, if a certain administrative level 4 word includes any one of the following: urban area, development zone, clearing zone, city, new district, fragrant zone, demonstration zone, road zone, suburbs, and ancient zone, and the administrative level 4 word has subordinates, then the optimized base score = base score * 0.5;

[0137] Step S23: For the provincial level in the administrative hierarchy, calculate the level score of the provincial level based on whether the corresponding fourth-level administrative segmentation is at the beginning of the text and the basic score.

[0138] For non-provincial levels within the administrative hierarchy, the level score for non-provincial levels is calculated based on the number of administrative level four segment words of the same level and the aforementioned basic score.

[0139] In this embodiment, for provincial-level administrative level four-level word segments, the provincial-level level score is calculated based on whether the administrative level four-level word segment begins with a valid administrative level four address, combined with a base score. Any method for calculating the provincial-level level score based on the base score can be used to implement this scheme.

[0140] For example, for a provincial-level administrative level four word segment, if the administrative level four word segment is at the beginning of a valid administrative level four address, the provincial-level score calculation formula is: score=(p*1.2)*1.2.

[0141] As one implementation method, step S23, which involves calculating the level score for non-provincial levels based on the number of administrative level four segmentation words of the same level and the base score, includes:

[0142] For each level other than the provincial level, when the number of administrative level 4 word segments in the current level is 1, the level score of the current level is calculated as score = p * q.

[0143] When the number is greater than 1, the level score for the current level is calculated as score = p + (p * ((Num - M) / Num)).

[0144] Where score is the level score of the current level, p is the base score of the current level, q is the preset coefficient of the current level, Num is the preset constant coefficient, and M is the number of administrative level 4 words in the current level.

[0145] For city-level, district (county)-level, and town-level administrative fourth-level word segmentation, different calculation formulas are used for different situations where one or more cities / districts (counties) / towns appear in the valid administrative fourth-level address.

[0146] If a city / district (county) / town appears in a valid administrative level 4 address, the level score for the current level is calculated using score = p * q; if multiple cities / districts (counties) / towns appear in a valid administrative level 4 address, the level score for the current level is calculated using score = p + (p * ((Num-M) / Num)).

[0147] Where score is the level score of the current level, p is the base score of the current level, q is the preset coefficient of the current level, Num is the preset constant coefficient, and M is the number of administrative level 4 words in the current level.

[0148] Any method that meets the above calculation requirements can be used to implement this solution, and this embodiment does not specifically limit it.

[0149] For example, for a city-level administrative level four word segmentation, if one city appears in the valid administrative level four address, the city-level score is: score = p * 3; if multiple cities appear in the valid administrative level four address, the city-level score is: score = p + (p * ((10 - (C + S)) / 10)), Num = 10, M = C + S. Here, p represents the base score for the city level, 10 represents a preset constant coefficient, C represents the number of cities in the valid administrative address, and S represents the number of districts (counties) in the valid administrative address.

[0150] For administrative level four word segmentation at the district (county) level, when one district (county) appears in the valid administrative level four address, the district (county) level score is: score = p * 2; when multiple districts (counties) appear in the valid administrative level four address, the district (county) level score is: score = p + (p * ((10-S) / 10)), Num = 10, M = S. Here, p represents the basic score for the district (county) level, 10 represents a preset constant coefficient, and S represents the number of districts in the valid administrative address.

[0151] For the fourth-level administrative segmentation of towns, if one town appears in the valid fourth-level administrative address, the town-level score is: score = p * 3; if multiple towns appear in the valid fourth-level administrative address, the town-level score is: score = p + (p * ((10-T) / 10)), Num = 10, M = T. Here, p represents the base score for the town level, 10 represents a preset constant coefficient, and T represents the number of towns in the valid administrative address.

[0152] The final score for a valid administrative level four address is obtained by adding the scores for the provincial, municipal, district, and town levels.
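
The level-score formulas of step S23 and the summation above can be sketched as follows. This is an illustration under assumptions: the coefficients are the example values from this embodiment, the function names are hypothetical, and leaving the provincial score equal to p when the province is not at the beginning is an assumption, since only the at-beginning case is given.

```python
NUM = 10  # preset constant coefficient from the examples
SINGLE_COEFF = {"city": 3, "district": 2, "town": 3}  # q when exactly one segment appears


def province_score(p, at_start):
    """Provincial level score; the not-at-start branch is an assumption."""
    return (p * 1.2) * 1.2 if at_start else p


def non_province_score(level, p, count, m):
    """Level score for city, district or town.

    count: number of segments of this level in the valid address.
    m: M in the formula (C + S for city, S for district, T for town).
    """
    if count == 1:
        return p * SINGLE_COEFF[level]
    return p + p * ((NUM - m) / NUM)


def final_score(level_scores):
    """Sum the level scores of all administrative levels."""
    return sum(level_scores.values())


# Hypothetical input: one province at the start, one city, one district.
scores = {
    "province": province_score(1000, at_start=True),
    "city": non_province_score("city", 1000, count=1, m=2),
    "district": non_province_score("district", 800, count=1, m=1),
}
total = final_score(scores)

# Two cities and one district in the address: M = C + S = 3.
two_cities = non_province_score("city", 1000, count=2, m=3)
```

The input values here are made up for illustration and do not correspond to a worked example in the text.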

[0153] Optionally, in the above calculation process, when the provincial-level administrative fourth-level words or the municipal-level administrative fourth-level words in the valid administrative fourth-level address are municipalities directly under the central government, the level score of that level is directly set to 800.

[0154] To account for customers in municipalities filling in only the city level and not the district level, the provincial-level and municipal-level scores for municipalities are set lower than those for ordinary administrative information.

[0155] Continuing with Figure 2, in step S1043, the target administrative level four address is extracted based on the final score.

[0156] Finally, based on the final score, the target administrative level 4 address is extracted from all valid administrative level 4 addresses. Generally speaking, the valid administrative level 4 address with the highest final score is taken as the target administrative level 4 address.

[0157] Optionally, after the step of obtaining the final score of the valid administrative level four address, the method further includes:

[0158] The final score is negatively weighted when any of the following conditions are met:

[0159] The valid administrative level four address is missing an administrative level;

[0160] The number of non-provincial-level administrative level four words in the valid administrative level four address is greater than a preset threshold;

[0161] The valid administrative level 4 address includes preset easily confused characters.

[0162] After calculating the final score for each valid administrative level 4 address in the above steps, the final score can be optimized by applying a negative weighting.

[0163] Specifically, the final score can be optimized if the number of non-provincial level administrative level four words in the valid administrative level four address is greater than a preset threshold.

[0164] For example, when a provincial-level administrative level 4 word in a valid administrative level 4 address has multiple subordinate administrative level 4 words, and the starting position index of the subordinate administrative level 4 words is greater than 5:

[0165] When there are 10 or more subordinates: the optimized final score = final score * 0.5;

[0166] When there are 5 to 10 subordinates: Optimized final score = final score * 0.7;

[0167] When there are 2 to 5 lower levels: the optimized final score = final score * 0.8.
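
The three tiers above can be sketched as a single function. This is a hedged sketch: the text gives overlapping ranges ("5 to 10", "2 to 5"), so assigning the boundary values 5 and 10 to the stricter tier is an assumption, and `subordinate_penalty` is a hypothetical name.

```python
def subordinate_penalty(final_score, n_subordinates, start_index):
    """Apply the negative weighting for a province with many subordinate segments."""
    # The rule applies only when the subordinates start after index 5.
    if start_index <= 5 or n_subordinates < 2:
        return final_score
    if n_subordinates >= 10:   # boundary handling at 10 is an assumption
        return final_score * 0.5
    if n_subordinates >= 5:    # boundary handling at 5 is an assumption
        return final_score * 0.7
    return final_score * 0.8
```

For instance, under these assumptions a final score of 1000 with 3 subordinates starting at index 6 becomes 800, while the same address with subordinates starting at index 3 is left unchanged.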

[0168] Specifically, the final score can be optimized based on the fact that the valid administrative level 4 address has missing administrative level information.

[0169] For example, if the index interval between a certain administrative level 4 word in a valid administrative level 4 address and its superior administrative level 4 word is greater than 10, then the optimized final score is directly set to 0; if the index interval between a certain administrative level 4 word and its superior administrative level 4 word is less than 10 and greater than 0, the optimized final score = final score - (index interval * 10); if the index interval is equal to 0, the optimized final score = final score * 1.2.
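
The interval-based adjustment above can be sketched as follows. This is an illustrative sketch: `gap_adjust` is a hypothetical name, and because the text does not say what happens when the interval is exactly 10, the score is left unchanged in that case as an assumption.

```python
def gap_adjust(final_score, gap):
    """Adjust the final score by the index interval to the superior segment."""
    if gap > 10:
        return 0.0                      # too far apart: score is zeroed
    if gap == 0:
        return final_score * 1.2        # directly adjacent: rewarded
    if 0 < gap < 10:
        return final_score - gap * 10   # linear penalty per index of distance
    return final_score                  # gap == 10 (unspecified): assumed unchanged
```

Under this sketch, a final score of 2000 with an interval of 3 becomes 1970, and an interval of 11 sets it to 0.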

[0170] For example, when the valid administrative level 4 address starts with "district" or "town" and lacks a suffix (town or street): the optimized base score = base score * 0.5; when the administrative level 4 word segmentation of the district starts from the first character, the optimized base score = base score * 1.2.

[0171] Specifically, the final score can be optimized based on preset easily confused characters contained in a valid administrative level 4 address.

[0172] For example, if an economic development zone appears in the valid level 4 administrative address, the optimized final score is equal to the final score * 0.6; if any of the following appears in the valid level 4 administrative address: development zone, economic zone, industrial park, office, region, or committee, the optimized final score is equal to the final score * 0.5; if a street appears in the valid level 4 administrative address, the optimized final score is equal to the final score * 0.8.

[0173] The above process can extract the valid target administrative level four address from the request address.

[0174] For example, the request address is: Building 2, Yulongchang Industrial Park, No. 100, Pinghushan Science Park West Road, Longgang District, Shenzhen, Beijing. Through the above calculation process, the final score for Beijing is 1290 and the final score for Shenzhen is 2160. It can therefore be determined that the valid administrative level 4 address entered by the user is: Longgang District, Shenzhen, Guangdong Province.
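
The final selection in this example amounts to taking the candidate with the highest final score. A minimal sketch, using the two final scores stated above; the dictionary layout and variable names are hypothetical:

```python
# Final scores of the two candidate addresses from the example.
candidates = {
    "Beijing": 1290,
    "Shenzhen, Longgang District": 2160,
}

# The valid administrative level 4 address with the highest final score is the target.
target = max(candidates, key=candidates.get)
```

Here `target` is the Shenzhen candidate, which is then completed with its province to give the result stated in the example.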

[0175] This embodiment provides a method for extracting administrative level 4 addresses. First, it extracts administrative level 4 information from the request address. Then, it segments the administrative level 4 information into words, obtaining the administrative level 4 word segments, corresponding location indices, text precision types, and administrative codes. Next, based on the administrative level 4 word segments and administrative codes, it obtains the administrative level 4 addresses. Finally, it scores the administrative level 4 addresses based on the location index and the text precision type to extract the target administrative level 4 address. This embodiment can extract accurate administrative level 4 addresses even when the request address contains abbreviations, misspellings, homophones, or homographs. It can also correctly parse administrative level 4 information from request addresses entered incorrectly due to customer error. Furthermore, it can effectively select the more accurate administrative information from multiple inputs, ultimately improving the accuracy of request address parsing.

[0176] Example 2

[0177] Based on the above-mentioned administrative level four address extraction method, as shown in Figure 4, this embodiment of the invention provides a structural schematic diagram of an administrative level four address extraction device, which includes an extraction module 410, a word segmentation module 420, a relationship module 430, and a filtering module 440.

[0178] The extraction module 410 is used to perform administrative level four extraction on the request address based on the trie to obtain administrative level four information;

[0179] The word segmentation module 420 is used to segment the administrative level 4 information into words, obtain several administrative level 4 words and corresponding position indices, and obtain the text precision type and administrative code corresponding to each administrative level 4 word.

[0180] The relationship module 430 is used to generate several administrative level four addresses based on the administrative code and the administrative level four word segmentation, wherein the administrative level four addresses include several administrative level four word segments with hierarchical relationships.

[0181] The filtering module 440 is used to filter the administrative level 4 address according to the location index and the text precision type, and extract the target administrative level 4 address.

[0182] For other details regarding the implementation of the above-mentioned technical solution by each module in the above-mentioned administrative level four address extraction device, please refer to the description in the administrative level four address extraction method provided in the above-mentioned invention embodiments, which will not be repeated here.

[0183] Based on the above administrative level four address extraction method, and as shown in Figure 5, this embodiment of the invention also provides a structural schematic diagram of an administrative level four address extraction device. The device includes a processor 501 and a memory 502 coupled to the processor 501. The memory 502 stores a computer program which, when executed by the processor 501, causes the processor 501 to perform the steps of the administrative level four address extraction method described in the above embodiment.

[0184] For further details of how the processor 501 of the above administrative level four address extraction device implements the technical solution, refer to the description of the administrative level four address extraction method in the embodiments above; these details are not repeated here.

[0185] The processor 501 can also be called a CPU (Central Processing Unit). The processor 501 may be an integrated circuit chip with signal processing capabilities. The processor 501 may also be a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. The general-purpose processor may be a microprocessor, or the processor 501 may be any conventional processor.

[0186] As shown in Figure 6, this embodiment of the invention also provides a schematic diagram of a computer-readable storage medium, on which a readable computer program 601 is stored. The computer program 601 can be stored in the storage medium in the form of a software product, including several instructions that cause a computer device (which may be a personal computer, server, network device, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, magnetic disks or optical disks, ROM (Read-Only Memory), and RAM (Random Access Memory), as well as terminal devices such as computers, servers, mobile phones, and tablets.

[0187] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into modules is only a logical functional division, and other divisions are possible in actual implementation: multiple modules or components may be combined or integrated into another device, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or modules, and may be electrical, mechanical, or in other forms.

[0188] The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.

[0189] Furthermore, the functional modules in the various embodiments of this application can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can be stored in a computer-readable storage medium.

[0190] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product.

[0191] The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are produced. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state drive (SSD)).

[0192] The technical solutions provided in this application have been described in detail above. Specific examples have been used to illustrate the principles and implementations of this application, and the description of the above embodiments is intended only to help in understanding the method and core ideas of this application. Those skilled in the art may, based on the ideas of this application, vary the specific implementations and the scope of application. Therefore, the content of this specification should not be construed as limiting this application.

[0193] Those skilled in the art will understand that embodiments of this application can be provided as methods, apparatus, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0194]-[0195] This application is described with reference to flowcharts and/or block diagrams of methods, apparatus (devices), and computer program products according to this application. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, executed by the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0196] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0197] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0198] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.

Claims

1. An administrative level four address extraction method, characterized in that it comprises: performing administrative level four extraction on a request address based on a trie to obtain administrative level four information; segmenting the administrative level four information into words to obtain several administrative level four word segments and corresponding position indices, and obtaining the text precision type and administrative code corresponding to each administrative level four word segment; generating several administrative level four addresses based on the administrative codes and the administrative level four word segments, each administrative level four address including several administrative level four word segments with hierarchical relationships; and filtering the administrative level four addresses based on the position indices and the text precision types to extract a target administrative level four address; wherein the step of filtering the administrative level four addresses based on the position indices and the text precision types to extract the target administrative level four address includes: based on the position index, identifying and deleting invalid administrative level four word segments in the administrative level four address to obtain a valid administrative level four address, the invalid administrative level four word segments originating from a part of the request address that is not an administrative level four address but contains an administrative level four name; determining the base weight of each administrative level based on the text precision type corresponding to each administrative level four word segment of the valid administrative level four address and on the preset base weights corresponding to the name precision type, the administrative level precision type, and the misspelling type among the text precision types; determining the base score of each administrative level based on the administrative level to which each administrative level four word segment belongs and the base weight of the valid administrative level four address, calculating the level score of each administrative level, and accumulating the level scores to obtain the final score of the valid administrative level four address; and extracting the target administrative level four address based on the final score.

2. The administrative level four address extraction method of claim 1, wherein the step of identifying invalid administrative level four word segments in the administrative level four address based on the position index includes: for each administrative level four word segment of the administrative level four address, obtaining, according to the position index, the position index interval between the current administrative level four word segment and its superior administrative level four word segment; if the position index interval is greater than a preset value and preset company-related characters are present within a preset interval after the current administrative level four word segment, determining that the current administrative level four word segment is an invalid administrative level four word segment; and/or, if the characters following the current administrative level four word segment form a preset non-administrative-level-four address name, determining that the current administrative level four word segment is an invalid administrative level four word segment.
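
The two invalidation rules of claim 2 can be sketched as below. This is a hedged illustration only: the preset values `MAX_GAP` and `COMPANY_WINDOW`, the marker lists `COMPANY_MARKERS` and `NON_ADMIN_SUFFIXES`, the way the position index interval is measured (from the end of the superior segment to the start of the current one), and the names `Segment` and `is_invalid` are all assumptions, since the claim states only that these values are preset.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical presets; the claim does not give concrete values.
MAX_GAP = 6                                # largest allowed position index interval
COMPANY_WINDOW = 8                         # characters inspected after the segment
COMPANY_MARKERS = ("公司", "集团", "厂")     # company-related characters
NON_ADMIN_SUFFIXES = ("路", "大道", "大厦")  # road and building name endings


@dataclass(frozen=True)
class Segment:
    text: str
    index: int  # position index of the first character in the request address


def is_invalid(address: str, current: Segment, superior: Optional[Segment]) -> bool:
    """Return True if the segment looks like part of a road or company name."""
    tail_start = current.index + len(current.text)

    # Second rule: the characters right after the segment form a
    # non-administrative address name, e.g. "上海" followed by "路".
    if address.startswith(NON_ADMIN_SUFFIXES, tail_start):
        return True

    # First rule: the segment sits far from its superior segment and
    # company-related characters appear shortly after it.
    if superior is not None:
        gap = current.index - (superior.index + len(superior.text))
        window = address[tail_start:tail_start + COMPANY_WINDOW]
        if gap > MAX_GAP and any(marker in window for marker in COMPANY_MARKERS):
            return True
    return False


road = "深圳市南山区上海路8号"
company = "深圳市宝安区西乡街道固戍一路南山精密有限公司"
plain = "深圳市南山区粤海街道"

print(is_invalid(road, Segment("上海", 6), None))
print(is_invalid(company, Segment("南山", 14), Segment("深圳市", 0)))
print(is_invalid(plain, Segment("南山区", 3), Segment("深圳市", 0)))
```

In the second example the segment "南山" is eleven characters away from "深圳市" and is followed by "精密有限公司", so under these assumed presets it is treated as part of a company name and not as a district.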

3. The administrative level four address extraction method of claim 1, wherein the step of calculating the level score of each administrative level includes: for the provincial level among the administrative levels, calculating the level score of the provincial level based on whether the corresponding administrative level four word segment is at the beginning of the text and on the base score; and for the non-provincial levels among the administrative levels, calculating the level score of each non-provincial level based on the number of administrative level four word segments at the same level and on the base score.

4. The administrative level four address extraction method of claim 3, wherein the step of calculating the level score of each non-provincial level based on the number of administrative level four word segments at the same level and the base score includes: for each level other than the provincial level, when the number of administrative level four word segments at the current level is 1, calculating the level score of the current level as $\text{score} = p \cdot q$; and when the number is greater than 1, calculating the level score of the current level as $\text{score} = p + p \cdot \frac{Num - M}{Num}$; where $\text{score}$ is the level score of the current level, $p$ is the base score of the current level, $q$ is the preset coefficient of the current level, $Num$ is a preset constant coefficient, and $M$ is the number of administrative level four word segments at the current level.
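
The non-provincial level score of claim 4 can be written as a short function. This sketch assumes that the juxtaposed terms in the claim's formulas are products, and the function name `non_provincial_level_score` and the sample values of p, q, and Num are illustrative, not values from the claims.

```python
def non_provincial_level_score(p: float, q: float, m: int, num: float) -> float:
    """Level score of one non-provincial level.

    p:   base score of the level
    q:   preset coefficient of the level
    m:   number of word segments found at this level (M in the claim)
    num: preset constant coefficient (Num in the claim)
    """
    if m < 1:
        raise ValueError("a level needs at least one word segment to be scored")
    if m == 1:
        return p * q                      # single segment at this level
    return p + p * ((num - m) / num)      # several segments at this level


# Illustrative values only.
for count in (1, 2, 3):
    print(count, non_provincial_level_score(p=10.0, q=2.0, m=count, num=5.0))
```

With these sample values the score falls as more word segments compete at the same level; whether that holds in general depends on how q and Num are preset.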

5. The administrative level four address extraction method of claim 1, wherein, after obtaining the final score of the valid administrative level four address, the method further includes: applying a negative weighting to the final score when any of the following conditions is met: the valid administrative level four address is missing an administrative level; the number of non-provincial administrative level four word segments in the valid administrative level four address is greater than a preset threshold; or the valid administrative level four address includes preset easily confused characters.
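
The negative weighting of claim 5 can be sketched as follows. The claim does not say how the weighting is applied, so subtracting a fixed penalty per condition is an assumption, as are the level names, the penalty sizes, the threshold, the confusable character set, and the function name `apply_penalties`.

```python
# Hypothetical presets; none of these values appear in the claims.
EXPECTED_LEVELS = ("province", "city", "district", "street")
MISSING_LEVEL_PENALTY = 5.0
TOO_MANY_SEGMENTS_PENALTY = 3.0
CONFUSABLE_PENALTY = 2.0
MAX_NON_PROVINCIAL_SEGMENTS = 3
CONFUSABLE_CHARS = frozenset("州洲")


def apply_penalties(final_score, segments):
    """Lower the final score of one valid address.

    segments is a list of (text, level) pairs, where level is one of
    EXPECTED_LEVELS.
    """
    present = {level for _, level in segments}
    penalty = 0.0

    # Condition 1: at least one administrative level is missing.
    if any(level not in present for level in EXPECTED_LEVELS):
        penalty += MISSING_LEVEL_PENALTY

    # Condition 2: too many non-provincial word segments.
    non_provincial = sum(1 for _, level in segments if level != "province")
    if non_provincial > MAX_NON_PROVINCIAL_SEGMENTS:
        penalty += TOO_MANY_SEGMENTS_PENALTY

    # Condition 3: the address contains easily confused characters.
    if any(ch in CONFUSABLE_CHARS for text, _ in segments for ch in text):
        penalty += CONFUSABLE_PENALTY

    return final_score - penalty


full = [("广东省", "province"), ("深圳市", "city"),
        ("南山区", "district"), ("粤海街道", "street")]
print(apply_penalties(40.0, full))       # no condition met
print(apply_penalties(40.0, full[:3]))   # street level missing
```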

6. The administrative level four address extraction method according to any one of claims 1 to 5, wherein the step of generating several administrative level four addresses based on the administrative codes and the administrative level four word segments includes: for each administrative level four word segment, obtaining its administrative level based on the administrative code, and determining whether a hierarchical relationship exists between the current administrative level four word segment and the previous administrative level four word segment; if such a relationship exists, adding the current administrative level four word segment, according to its administrative level, to the administrative level four address corresponding to the previous administrative level four word segment; and if it does not exist, generating a new administrative level four address.
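
The grouping step of claim 6 can be sketched as below. The sketch makes two assumptions that the claim does not state: the administrative level is read from the length of a shortened code, and a hierarchical relationship is taken to exist when the current code starts with the previous segment's code. The codes and the names `Segment`, `level_of`, `is_subordinate`, and `generate_addresses` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    text: str
    code: str  # shortened, illustrative administrative code


# Assumption: code length encodes the level (1 = province ... 4 = street).
LEVEL_BY_CODE_LENGTH = {2: 1, 4: 2, 6: 3, 9: 4}


def level_of(segment: Segment) -> int:
    return LEVEL_BY_CODE_LENGTH[len(segment.code)]


def is_subordinate(previous: Segment, current: Segment) -> bool:
    """Assumed hierarchy test: deeper level and a code that extends the previous one."""
    return (level_of(current) > level_of(previous)
            and current.code.startswith(previous.code))


def generate_addresses(segments):
    """Group word segments, in text order, into candidate level four addresses."""
    addresses = []
    previous = None
    for segment in segments:
        if previous is not None and is_subordinate(previous, segment):
            addresses[-1].append(segment)   # extend the previous segment's address
        else:
            addresses.append([segment])     # no relationship: start a new address
        previous = segment
    return addresses


segments = [
    Segment("广东省", "44"),
    Segment("深圳市", "4403"),
    Segment("南山区", "440305"),
    Segment("上海", "31"),
    Segment("浦东新区", "310115"),
]
for address in generate_addresses(segments):
    print([segment.text for segment in address])
```

Each resulting list is one candidate administrative level four address, which the filtering step then scores.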

7. An administrative level four address extraction device, characterized in that it includes an extraction module, a word segmentation module, a relationship module, and a filtering module; the extraction module is used to perform administrative level four extraction on a request address based on a trie to obtain administrative level four information; the word segmentation module is used to segment the administrative level four information into words to obtain several administrative level four word segments and corresponding position indices, and to obtain the text precision type and administrative code corresponding to each administrative level four word segment; the relationship module is used to generate several administrative level four addresses based on the administrative codes and the administrative level four word segments, each administrative level four address including several administrative level four word segments with hierarchical relationships; the filtering module is used to filter the administrative level four addresses based on the position indices and the text precision types and to extract a target administrative level four address; wherein filtering the administrative level four addresses based on the position indices and the text precision types to extract the target administrative level four address includes: based on the position index, identifying and deleting invalid administrative level four word segments in the administrative level four address to obtain a valid administrative level four address, the invalid administrative level four word segments originating from a part of the request address that is not an administrative level four address but contains an administrative level four name; determining the base weight of each administrative level based on the text precision type corresponding to each administrative level four word segment of the valid administrative level four address and on the preset base weights corresponding to the name precision type, the administrative level precision type, and the misspelling type among the text precision types; determining the base score of each administrative level based on the administrative level to which each administrative level four word segment belongs and the base weight of the valid administrative level four address, calculating the level score of each administrative level, and accumulating the level scores to obtain the final score of the valid administrative level four address; and extracting the target administrative level four address based on the final score.

8. An administrative level four address extraction device, characterized in that it includes a memory and a processor, wherein: the memory is used to store a computer program; and the processor is used to read the computer program in the memory and execute the steps of the administrative level four address extraction method according to any one of claims 1 to 6.

9. A computer-readable storage medium, characterized in that it stores a readable computer program which, when executed by a processor, implements the steps of the administrative level four address extraction method according to any one of claims 1 to 6.