Directory generation method and apparatus, electronic device, and storage medium
By traversing the webpage content titles in top-down order and using mapping information to quickly determine the parent node of the directory node, a directory tree list is generated. This solves the problems of low directory generation efficiency and high memory consumption in existing technologies, and achieves efficient and refined directory generation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date
- 2022-10-13
- Publication Date
- 2026-06-19
AI Technical Summary
In existing technologies, generating directories using specific text matching methods based on directory characteristics is inefficient and consumes a lot of memory, especially when the strings are long.
The method traverses the content titles of the target webpage in top-down order, creates a directory node for each title and determines its node level, uses mapping information to quickly identify the node's parent, adds the node directly to a directory tree list, and updates the mapping information. The directory is therefore generated without string matching.
It improves the efficiency of directory generation, reduces device memory usage, and enhances the granularity and accuracy of directory generation.
Description
Technical Field
[0001] This application relates to the field of computer technology, and in particular to a directory generation method, apparatus, electronic device, and storage medium.
Background Technology
[0002] A table of contents represents the structural features of a text, letting readers quickly find content and skim. In related technologies, a common approach is to generate a table of contents automatically through specific text matching, for example by matching common text such as "Chapter X" or "Section X". However, this method requires first converting the text into a string, which makes table of contents generation inefficient. When the strings are long, generation time and device memory usage increase further.
Summary of the Invention
[0003] The following is an overview of the subject matter described in detail in this application. This overview is not intended to limit the scope of the claims.
[0004] This application provides a directory generation method, apparatus, electronic device, and storage medium, which can improve the efficiency of directory generation and reduce the memory usage of the device.
[0005] In one aspect, embodiments of this application provide a directory generation method, including:
[0006] Traverse multiple content titles of the target webpage in order from top to bottom, create target directory nodes based on the content titles in the current traversal round, and determine the target node level of the target directory nodes;
[0007] Obtain the target mapping information in the current traversal round, wherein the target mapping information is used to indicate the mapping relationship between the node level and the parent node;
[0008] The target parent node corresponding to the target directory node is determined from the target mapping information based on the target node level;
[0009] Obtain the directory tree list of the target webpage, add the target directory node to the directory tree list according to the determination result of the target parent node, and update the target mapping information according to the target directory node;
[0010] The target directory of the target webpage is generated based on the directory tree list after adding the target directory node.
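The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `DirectoryNode`, `build_directory_tree`, and `latest` are hypothetical, the mapping information is assumed to be a dictionary from node level to the most recent node at that level, and the parent is assumed to be found by lowering the level number until a match exists. Dropping deeper-level entries when the mapping is updated is also an assumption of this sketch, made so that a later heading cannot attach to a stale node.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryNode:
    name: str
    level: int  # 1 is the highest level (a first-level heading)
    children: list = field(default_factory=list)  # child DirectoryNode objects


def build_directory_tree(titles):
    """Build a directory tree list from (title, level) pairs given in page order."""
    tree = []    # the directory tree list (top-level nodes)
    latest = {}  # assumed mapping information: level -> most recent node

    for name, level in titles:
        node = DirectoryNode(name, level)

        # Find the parent by lowering the level number until the mapping matches.
        parent = None
        for candidate in range(level - 1, 0, -1):
            if candidate in latest:
                parent = latest[candidate]
                break

        # No parent: write to the end of the tree list.
        # Otherwise: write to the end of the parent's children.
        if parent is None:
            tree.append(node)
        else:
            parent.children.append(node)

        # Update the mapping. Removing deeper levels is an assumption of this
        # sketch, not something stated in the source text.
        for stale in [lv for lv in latest if lv > level]:
            del latest[stale]
        latest[level] = node

    return tree


titles = [("A", 1), ("B", 2), ("C", 1), ("D", 3)]
tree = build_directory_tree(titles)
```

Each title costs at most one dictionary lookup per level, and no heading text is compared against a pattern, which is the property the method relies on.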
[0011] In another aspect, embodiments of this application also provide a directory generation apparatus, including:
[0012] The node creation module is used to traverse multiple content titles of the target webpage in order from top to bottom, create target directory nodes based on the content titles in the current traversal round, and determine the target node level of the target directory nodes;
[0013] The mapping information acquisition module is used to acquire the target mapping information in the current traversal round, wherein the target mapping information is used to indicate the mapping relationship between the node level and the parent node;
[0014] The matching module is used to determine the target parent node corresponding to the target directory node from the target mapping information based on the target node level;
[0015] The adding module is used to obtain a directory tree list of the target webpage, add the target directory node to the directory tree list according to the determination result of the target parent node, and update the target mapping information according to the target directory node;
[0016] The directory generation module is used to generate the target directory of the target webpage based on the directory tree list after adding the target directory node.
[0017] Furthermore, the aforementioned adding module is specifically used for:
[0018] The write position of the target directory node in the directory tree list is obtained based on the determination result of the target parent node;
[0019] The target directory node is added to the directory tree list according to the write location.
[0020] Furthermore, the aforementioned adding module is specifically used for:
[0021] When the determination result of the target parent node is that the target directory node does not have a target parent node, the write position of the target directory node in the directory tree list is determined as the end position of the directory tree list;
[0022] Alternatively, when the determination result of the target parent node is that the target directory node has a target parent node, a set of subdirectories corresponding to the target parent node is created in the directory tree list, and the write position of the target directory node in the directory tree list is determined as the end position of the set of subdirectories.
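The two write-position cases can be illustrated with a short sketch. It assumes nodes are plain dictionaries and that the "set of subdirectories" is a `children` list created on first use. The function name `add_to_tree` and the key `children` are hypothetical and chosen only for illustration.

```python
from typing import Optional


def add_to_tree(tree: list, node: dict, parent: Optional[dict]) -> list:
    """Append node at its write position and return the list it was written to."""
    if parent is None:
        # No target parent node: write at the end of the directory tree list.
        target = tree
    else:
        # Create the parent's subdirectory set if needed, then write at its end.
        target = parent.setdefault("children", [])
    target.append(node)
    return target


tree = []
a = {"name": "A"}
b = {"name": "B"}
add_to_tree(tree, a, None)  # A has no parent
add_to_tree(tree, b, a)     # B is written into A's subdirectory set
```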
[0023] Furthermore, the aforementioned matching module is specifically used for:
[0024] The target node level is matched with the node level in the target mapping information, and the target parent node corresponding to the target directory node is determined from the parent nodes in the target mapping information based on the matching result.
[0025] Alternatively, the target node level can be reduced and matched with the node level in the target mapping information. Based on the matching result, the target parent node corresponding to the target directory node can be determined from the parent nodes in the target mapping information.
[0026] Furthermore, the aforementioned adding module is specifically used for:
[0027] The node to be replaced corresponding to the target directory node is determined from the target mapping information based on the target node level;
[0028] The node to be replaced is updated to the target directory node to obtain the updated target mapping information.
[0029] Furthermore, the aforementioned adding module is specifically used for:
[0030] The target node level is matched with the node level in the target mapping information, and the node to be replaced corresponding to the target directory node is determined from the parent nodes in the target mapping information based on the matching result.
[0031] Alternatively, the target node level can be raised and matched with the node level in the target mapping information. Based on the matching result, the node to be replaced corresponding to the target directory node can be determined from the parent nodes in the target mapping information.
[0032] Furthermore, the aforementioned mapping information acquisition module is specifically used for:
[0033] If the target node level in the previous traversal round is level one, or if the target node level in the previous traversal round is the same as the target node level in the first traversal round, obtain the directory tree list and add the target directory node from the previous traversal round to the end of the directory tree list.
[0034] Add the mapping relationship between the target directory node and the target node level in the previous traversal round to the target mapping information in the previous traversal round to obtain the target mapping information in the current traversal round;
[0035] If the target node level in the current traversal round is level two or above, or if the target node level in the current traversal round is different from the target node level in the first traversal round, obtain the target mapping information in the current traversal round.
[0036] Furthermore, the aforementioned node creation module is specifically used for:
[0037] Obtain the webpage code text of the target webpage;
[0038] Identify multiple content titles in the target webpage based on element tags in the webpage code text, and determine the title level of each content title based on the element tags;
[0039] Based on the top-to-bottom position order in the target webpage, a title list is constructed according to multiple content titles and the title level corresponding to the content titles, and the title list is traversed.
[0040] Furthermore, the aforementioned node creation module is specifically used for:
[0041] The node name is determined based on the content title in the current traversal round, and the node level is determined based on the title level of the content title in the current traversal round.
[0042] Create the target directory node based on the node name and the node level.
[0043] Furthermore, the aforementioned directory generation module is specifically used for:
[0044] The directory tree list after adding the target directory node is parsed to obtain multiple directory node links, wherein the directory node links include multiple target directory nodes, and the first directory node of any directory node link is the target directory node with the highest target node level in the directory node link;
[0045] The target directory of the target webpage is generated based on the multiple directory node links.
[0046] Furthermore, the aforementioned directory generation module is specifically used for:
[0047] The directory tree list after adding the target directory node is parsed to obtain multiple directory node links;
[0048] When the first directory node is the directory node with the lowest target node level in the directory node link it belongs to, the second directory node is the first directory node in the directory node link it belongs to, and the target node level of the first directory node is at least two levels higher than the target node level of the second directory node, the directory node link where the first directory node is located and the directory node link where the second directory node is located are merged to obtain a merged node link.
[0049] The target directory of the target webpage is generated based on the merged node links and the remaining directory node links.
[0050] Furthermore, the aforementioned directory generation module is specifically used for:
[0051] The directory tree list after adding the target directory node is parsed to obtain multiple directory node links;
[0052] When a new directory node is inserted into the directory node link, and the new directory node has the same target node level as the first target directory node in the directory node link, the directory node link is split into two sub-node links based on the new directory node. The first target directory node in the directory node link and the new directory node are respectively the first directory node of the two sub-node links.
[0053] The target directory of the target webpage is generated based on the child node links and the remaining directory node links.
[0054] In another aspect, embodiments of this application also provide an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the above-described directory generation method.
[0055] In another aspect, embodiments of this application also provide a computer-readable storage medium storing a computer program that is executed by a processor to implement the above-described directory generation method.
[0056] In another aspect, embodiments of this application also provide a computer program product, which includes a computer program stored in a computer-readable storage medium. A processor of a computer device reads the computer program from the computer-readable storage medium and executes the computer program, causing the computer device to perform the directory generation method described above.
[0057] The embodiments of this application include at least the following beneficial effects. Multiple content titles of the target webpage are traversed in top-to-bottom order, a target directory node is created based on the content title in the current traversal round, the target node level of the target directory node is determined, and the target mapping information in the current traversal round is obtained. Because the target mapping information indicates the mapping relationship between node levels and parent nodes, the target parent node corresponding to the target directory node can be quickly determined from the target mapping information based on the target node level. The directory tree list of the target webpage is then obtained, the target directory node is added to the directory tree list according to the determination result of the target parent node, and the target directory of the target webpage is generated based on the directory tree list after the target directory node is added. No string matching is needed in this process, which effectively improves the efficiency of directory generation and reduces the memory usage of the device. Furthermore, because the target parent node is determined for the target directory node corresponding to each content title, and the target mapping information is updated according to the target directory node in each traversal round, the hierarchical relationship between different target directory nodes can be quickly determined when generating the target directory, thereby improving the granularity of directory generation while maintaining its accuracy.
[0058] Other features and advantages of this application will be set forth in the following description and will be apparent in part from the description or may be learned by practicing the application.
Attached Figure Description
[0059] The accompanying drawings are used to provide a further understanding of the technical solutions of this application and constitute a part of the specification. They are used together with the embodiments of this application to explain the technical solutions of this application and do not constitute a limitation on the technical solutions of this application.
[0060] Figure 1 is a schematic diagram of an optional implementation environment provided in an embodiment of this application;
[0061] Figure 2 is an optional flowchart of the directory generation method provided in an embodiment of this application;
[0062] Figure 3 is an optional structural diagram of the title list provided in an embodiment of this application;
[0063] Figure 4 is an optional structural diagram of the target mapping information provided in an embodiment of this application;
[0064] Figure 5 is a schematic diagram of an optional process for adding a target directory node to a directory tree list provided in an embodiment of this application;
[0065] Figure 6 is a schematic diagram of an optional process for updating target mapping information provided in an embodiment of this application;
[0066] Figure 7 is an optional complete flowchart of adding a target directory node to a directory tree list provided in an embodiment of this application;
[0067] Figure 8 is a schematic diagram of an optional process for adding a target directory node to a directory tree list provided in an embodiment of this application;
[0068] Figure 9 is a schematic diagram of an optional structure of a target directory generated from a directory tree list provided in an embodiment of this application;
[0069] Figure 10 is a schematic diagram of another optional structure of the target directory generated from a directory tree list provided in an embodiment of this application;
[0070] Figure 11 is a schematic diagram of another optional structure of the target directory generated from a directory tree list provided in an embodiment of this application;
[0071] Figure 12 is a schematic diagram of another optional structure of the target directory generated from a directory tree list provided in an embodiment of this application;
[0072] Figure 13 is a schematic diagram of an optional structure of the directory generation apparatus provided in an embodiment of this application;
[0073] Figure 14 is a partial structural block diagram of a terminal provided in an embodiment of this application;
[0074] Figure 15 is a partial structural block diagram of a server provided in an embodiment of this application.
Detailed Implementation
[0075] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0076] It should be noted that in various specific embodiments of this application, when processing data related to the characteristics of the target object, such as target object attribute information or attribute information sets, is required, the permission or consent of the target object will be obtained first. Furthermore, the collection, use, and processing of this data will comply with the relevant laws, regulations, and standards of the relevant countries and regions. The target object may be a user. In addition, when embodiments of this application need to obtain target object attribute information, separate permission or consent from the target object will be obtained through pop-ups or redirection to a confirmation page. Only after obtaining the target object's separate permission or consent will the necessary target object-related data for the normal operation of the embodiments of this application be obtained.
[0077] To facilitate understanding of the technical solutions provided in the embodiments of this application, some key terms used in the embodiments of this application will be explained below:
[0078] Cloud technology refers to a hosting technology that unifies hardware, software, and network resources within a wide area network (WAN) or local area network (LAN) to achieve data computation, storage, processing, and sharing. Based on the cloud computing business model, cloud technology encompasses network technology, information technology, integration technology, management platform technology, and application technology. It can form resource pools that are used on demand, flexibly and conveniently. Cloud computing technology will become a crucial support, because the backend services of technical network systems, such as video websites, image websites, and many portal websites, require substantial computing and storage resources. With the rapid development and application of the internet industry, every item may have its own identification mark in the future, and these marks will need to be transmitted to backend systems for logical processing. Data at different levels will be processed separately, and all kinds of industry data will require robust system support, which can only be achieved through cloud computing.
[0079] The solutions provided in this application relate to cloud computing, a cloud technology. Cloud computing refers to the delivery and usage model of IT infrastructure, meaning obtaining necessary resources through a network in an on-demand and easily scalable manner. In a broader sense, cloud computing refers to the delivery and usage model of services, meaning obtaining necessary services through a network in an on-demand and easily scalable manner. These services can be IT and software-related, internet-related, or other services. Cloud computing is a product of the development and integration of traditional computer and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization, and load balancing. Driven by the development of the internet, real-time data streams, the diversification of connected devices, and the demands of search services, social networks, mobile commerce, and open collaboration, cloud computing has developed rapidly. Unlike previous parallel and distributed computing, the emergence of cloud computing will, conceptually, drive a revolutionary change in the entire internet model and enterprise management model.
[0080] Currently, in related technologies, a common approach to automatically generate a table of contents is to use specific text matching based on the characteristics of the table of contents, such as matching common specific text like "Chapter X" or "Section X". However, this method requires converting the text into strings first, resulting in low table of contents generation efficiency. Furthermore, when the strings are long, it further increases the time required to generate the table of contents and increases the device's memory usage.
[0081] Based on this, embodiments of this application provide a directory generation method, apparatus, electronic device, and storage medium, which can improve the efficiency of directory generation and reduce the memory usage of the device.
[0082] Referring to Figure 1, Figure 1 is a schematic diagram of an optional implementation environment provided in an embodiment of this application. The implementation environment includes a terminal 101 and a server 102, wherein the terminal 101 and the server 102 are connected through a communication network.
[0083] For example, server 102 can traverse multiple content titles of the target webpage in top-to-bottom order, create target directory nodes based on the content titles in the current traversal round, and determine the target node level of the target directory nodes; obtain target mapping information in the current traversal round, determine the target parent node corresponding to the target directory node from the target mapping information based on the target node level; obtain the directory tree list of the target webpage, add the target directory nodes to the directory tree list based on the determination result of the target parent nodes, update the target mapping information based on the target directory nodes; generate the target directory of the target webpage based on the directory tree list after adding the target directory nodes, and then send the target directory to terminal 101.
[0084] In addition, the implementation environment of the directory generation method provided in this application embodiment may also include only terminal 101. Terminal 101 directly traverses multiple content titles of the target webpage in the top-to-bottom order, creates target directory nodes based on the content titles in the current traversal round, and determines the target node level of the target directory nodes; obtains target mapping information in the current traversal round, determines the target parent node corresponding to the target directory node from the target mapping information based on the target node level; obtains the directory tree list of the target webpage, adds the target directory nodes to the directory tree list based on the determination result of the target parent nodes, updates the target mapping information based on the target directory nodes; and generates the target directory of the target webpage based on the directory tree list after adding the target directory nodes.
[0085] As can be seen, terminal 101 or server 102 traverses multiple content titles of the target webpage in top-to-bottom order, creates target directory nodes based on the content titles in the current traversal round, determines the target node level of the target directory nodes, and then obtains the target mapping information in the current traversal round. Because the target mapping information indicates the mapping relationship between node levels and parent nodes, the target parent node corresponding to the target directory node can be quickly determined from the target mapping information based on the target node level. The directory tree list of the target webpage is then obtained, the target directory node is added to the directory tree list according to the determination result of the target parent node, and the target directory of the target webpage is generated based on the directory tree list after the target directory node is added. This process does not require string matching, which effectively improves the efficiency of directory generation and reduces the memory usage of the device. Furthermore, by traversing multiple content titles of the target webpage in top-to-bottom order, the target parent node of the target directory node corresponding to each content title is determined, and the target mapping information is updated according to the target directory node in each traversal round. When generating the target directory, the hierarchical relationship between different target directory nodes can therefore be quickly determined, improving the granularity of directory generation while maintaining its accuracy.
[0086] Server 102 can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. Additionally, server 102 can also be a node server in a blockchain network.
[0087] Terminal 101 can be a smartphone, tablet computer, laptop computer, desktop computer, smart speaker, smartwatch, vehicle terminal, etc., but is not limited to these. Terminal 101 and server 102 can be directly or indirectly connected via wired or wireless communication, and this embodiment of the application does not impose any limitations.
[0088] The methods provided in this application can be applied to various technical fields, including but not limited to cloud technology.
[0089] Referring to Figure 2, Figure 2 is an optional flowchart of a directory generation method provided in an embodiment of this application. The directory generation method can be executed by a terminal, by a server, or by both in cooperation. The directory generation method includes, but is not limited to, the following steps 201 to 205.
[0090] Step 201: Traverse the multiple content titles of the target webpage in order from top to bottom, create a target directory node based on the content title in the current traversal round, and determine the target node level of the target directory node.
[0091] In one possible implementation, the target webpage can be the webpage currently being viewed on the terminal or a webpage loaded in the background. The target webpage can be an information webpage such as a news page or a forum page, or it can be a document webpage that allows online viewing or editing of documents. This embodiment of the application does not limit the scope of the target webpage.
[0092] In one possible implementation, the target webpage may include multiple content titles. Each content title is a statement indicating the main content or central idea of a paragraph or section within the target webpage. The heading levels of these content titles may differ. The heading level indicates the scope of the content title within the webpage; a higher heading level indicates a broader scope. For example, multiple content titles can be categorized into first-level headings, second-level headings, third-level headings, and so on, from highest to lowest heading level. Correspondingly, the scope of a first-level heading is greater than that of a second-level heading, which is greater than that of a third-level heading, and so on. Of course, the specific division of heading levels can be determined according to actual needs, and this embodiment does not impose any limitations.
[0093] Generally, when browsing a target webpage, the browsing is from top to bottom. Therefore, multiple content titles on the target webpage are arranged in order from top to bottom. For example, if multiple content titles include content title T1, content title T2, and content title T3, they are arranged in order from top to bottom as content title T1, content title T2, and content title T3. Therefore, the traversal order of multiple content titles is content title T1, content title T2, and content title T3.
[0094] In one possible implementation, a target directory node is created for each content title, and the target node level can be the same as the title level. For example, if the content title in the current traversal round is a level 1 title, then the corresponding target directory node is a level 1 directory node. A target directory node with a higher directory node level can serve as the parent node of a target directory node with a lower directory node level. For example, a level 1 directory node can serve as the parent node of a level 2 directory node or of a level 3 directory node, a level 2 directory node can serve as the parent node of a level 3 directory node, and so on. Furthermore, there can be multiple target directory nodes at the same directory node level, and each of them can serve as the parent node of different target directory nodes at lower directory node levels. For example, two different level 1 directory nodes can each serve as the parent node of different level 2 directory nodes.
[0095] The target directory node is the directory node corresponding to the currently traversed content title among the multiple content titles of the target webpage. For example, when the multiple content titles include content title T1, content title T2, and content title T3, the target directory nodes are the directory nodes corresponding to content title T1, content title T2, and content title T3, respectively. Then, the target node level of the directory nodes corresponding to content title T1, content title T2, and content title T3 is determined respectively.
[0096] In one possible implementation, when traversing multiple content titles of the target webpage in order of their top-to-bottom positions, the webpage code text of the target webpage can be obtained; multiple content titles in the target webpage can be identified based on element tags in the webpage code text, and the title level of each content title can be determined based on the element tags; based on their top-to-bottom positions in the target webpage, a title list can be constructed according to the multiple content titles and their corresponding title levels, and the title list can be traversed.
[0097] The webpage code text is used by web browsers to parse and display the corresponding target webpage. This code text includes various element tags, which indicate the type of webpage code. For example, element tags can be h1, h2, div, etc. The h1 and h2 elements, among others, indicate that the corresponding webpage code represents a content title. Therefore, multiple content titles on the target webpage can be identified based on the element tags in the webpage code text. Specifically, the h1 element indicates a first-level heading, the h2 element indicates a second-level heading, and so on.
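Identifying content titles from element tags can be sketched with Python's standard `html.parser` module. This is an illustrative sketch only: the class name `HeadingCollector` and the function `extract_titles` are hypothetical, and it assumes the content titles are marked up with the standard `h1` to `h6` elements, with the digit in the tag name giving the title level.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (title, level) pairs from h1-h6 elements in document order."""

    HEADING_TAGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.titles = []    # (title text, title level), top to bottom
        self._level = None  # level of the heading currently being read
        self._parts = []    # text fragments of the current heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._level = self.HEADING_TAGS[tag]
            self._parts = []

    def handle_data(self, data):
        # Only text inside a heading element is kept.
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and self.HEADING_TAGS.get(tag) == self._level:
            self.titles.append(("".join(self._parts).strip(), self._level))
            self._level = None


def extract_titles(html):
    """Return the title list for a page given its webpage code text."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.titles


page = "<h1>Intro</h1><div><p>text</p><h2>Scope <em>notes</em></h2></div><h1>End</h1>"
title_list = extract_titles(page)
```

Because the parser reports elements in source order, the resulting list already follows the top-to-bottom order of the page, and the level comes from the tag name rather than from matching the heading text.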
[0098] The title list includes the title name and title level of each content title. For example, refer to Figure 3, which is a schematic diagram of an optional structure of the title list provided in an embodiment of this application. The data structure of the title list can be an array, the data type of the title name can be a string, and the data type of the title level can be a number. The title list includes title T1, title T2, ..., title Tn, and the title levels corresponding to title T1, title T2, ..., title Tn are level L1, level L2, ..., level Ln, respectively. Therefore, when traversing the title list, the title level of the currently traversed content title can be quickly determined.
[0099] In addition, the title list can also include other attributes of each content title, such as font and the position of the content title on the target webpage, thereby increasing the amount of information carried by the title list.
[0100] As can be seen, by obtaining the webpage code text of the target webpage, we can identify multiple content titles on the target webpage and determine the title level of each content title based on the element tags in the webpage code text. This allows us to construct a title list, and when traversing the title list, we can quickly determine the title level of each content title, effectively improving processing efficiency.
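As a non-limiting illustration of paragraphs [0096] to [0100], the following Python sketch builds a title list from webpage code text by reading the h1 to h6 element tags in document order. The names `TitleCollector` and `build_title_list`, the dictionary keys `name` and `level`, and the sample markup are assumptions introduced only for this sketch; the embodiments do not prescribe a particular parser.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collects the name and level of each h1..h6 element, top to bottom."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.titles = []     # the title list: [{"name": str, "level": int}, ...]
        self._level = None   # level of the heading that is currently open, if any
        self._parts = []     # text fragments of the heading that is currently open

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._level = int(tag[1])   # "h2" -> 2
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._level is not None:
            name = "".join(self._parts).strip()
            self.titles.append({"name": name, "level": self._level})
            self._level = None


def build_title_list(html_text):
    """Return the title list for the given webpage code text."""
    collector = TitleCollector()
    collector.feed(html_text)
    collector.close()
    return collector.titles


# Hypothetical sample page used only for illustration.
sample = "<h1>Work Summary</h1><div>text</div><h2>Goals</h2><p>more</p><h2>Results</h2>"
title_list = build_title_list(sample)
```

Because the parser reports tags in the order they appear, the resulting list already follows the top-to-bottom positions of the content titles, and each entry carries its title level as a number.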
[0101] Based on this, when creating target directory nodes according to the content titles in the current traversal round, the node name can be determined according to the content titles in the current traversal round, and the node level can be determined according to the title level of the content titles in the current traversal round; the target directory node is created according to the node name and node level.
[0102] For example, if the title of the content in the current traversal round is "Work Summary" and the title level is level one, then the node name of the target directory node will also be "Work Summary" and the node level will also be level one. This allows for quick conversion of title names and title levels into node names and node levels.
[0103] It can be understood that, when determining node names based on title names, the title names can be summarized to obtain the node names. In addition, directory nodes can include other attributes of each directory node, such as font and directory generation location, thereby increasing the amount of information carried by the directory tree list.
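The conversion described in paragraphs [0101] to [0103] can be sketched as follows. This is only an illustrative sketch: the `DirectoryNode` class, its `children` field (standing in for the subdirectory set) and the function `create_directory_node` are assumed names, and the title is assumed to be a dictionary with `name` and `level` keys.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryNode:
    name: str                                      # node name, taken from the title name
    level: int                                     # node level, taken from the title level
    children: list = field(default_factory=list)   # subdirectory set, initially empty


def create_directory_node(title):
    """Create the target directory node for the currently traversed content title."""
    return DirectoryNode(name=title["name"], level=title["level"])


node = create_directory_node({"name": "Work Summary", "level": 1})
```

Further attributes such as font or position could be added as extra fields without changing the rest of the flow.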
[0104] Step 202: Obtain the target mapping information in the current traversal round.
[0105] In each traversal iteration, target mapping information is acquired, which indicates the mapping relationship between node levels and parent nodes. Each node level has no more than one instance in the target mapping information, and correspondingly, each node level has no more than one parent node. In each traversal iteration, the target mapping information is updated based on the target directory node being traversed, thus determining the target parent node corresponding to the target directory node in each traversal iteration.
[0106] For example, refer to Figure 4, which is a schematic diagram of an optional structure of the target mapping information provided in an embodiment of this application. Similarly, the data structure of the target mapping information can be an array, the data type of the node level can be a number, and the data type of the parent node can be a string. In the example shown in Figure 4, when the node level is "2", the corresponding parent node is title T1; when the node level is "3", the corresponding parent node is title T2; and when the node level is "n-1", the corresponding parent node is title Tn.
[0107] Step 203: Determine the target parent node corresponding to the target directory node from the target mapping information based on the target node level.
[0108] Since the target node level of the target directory node in the current traversal round is determined in step 201, the target parent node corresponding to the target directory node can be determined from the parent nodes in the target mapping information by matching the target node level with the node level in the target mapping information.
[0109] In one possible implementation, when determining the target parent node corresponding to the target directory node from the target mapping information based on the target node level, the target node level can be directly matched with the node level in the target mapping information, and the target parent node corresponding to the target directory node can be determined from the parent nodes in the target mapping information based on the matching result.
[0110] For example, continuing the example shown in Figure 4, if the target node level of the target directory node in the current traversal round is level two, then the corresponding target parent node can be determined to be title T1; if the target node level of the target directory node in the current traversal round is level three, then the corresponding target parent node can be determined to be title T2, and so on. In this case, the minimum node level in the target mapping information is level two.
[0111] In addition, when determining the target parent node corresponding to the target directory node from the target mapping information based on the target node level, it is also possible to lower the target node level and match it with the node level in the target mapping information, and then determine the target parent node corresponding to the target directory node from the parent nodes in the target mapping information based on the matching result.
[0112] The reduction in the target node level can be one level, in which case the minimum node level in the target mapping information can be level one. For example, when the node level in the target mapping information is "1", the corresponding parent node is title T1; when the node level is "2", the corresponding parent node is title T2; when the node level is "n", the corresponding parent node is title Tn. If the target node level of the currently traversed target directory node is level two, then the corresponding target parent node can be determined to be title T1; if the target node level is level three, then the corresponding target parent node can be determined to be title T2, and so on.
[0113] As can be seen, since the target mapping information indicates the mapping relationship between the node level and the parent node, the target parent node corresponding to the target directory node can be quickly determined, effectively improving processing efficiency. Furthermore, the target parent node corresponding to the target directory node can be determined in two different ways, enhancing the flexibility in determining the target parent node.
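The two ways of determining the target parent node in paragraphs [0109] to [0112] can be sketched as two lookups on a dictionary. The function names and the use of a Python `dict` for the target mapping information are assumptions made for illustration; parent nodes are represented by their title names, as in Figure 4.

```python
def find_parent_direct(mapping, node_level):
    """First way: the keys already denote the level of the child nodes,
    so the target node level is matched directly."""
    return mapping.get(node_level)


def find_parent_lowered(mapping, node_level):
    """Second way: the keys denote the parents' own levels,
    so the target node level is lowered by one before matching."""
    return mapping.get(node_level - 1)


mapping_v1 = {2: "T1", 3: "T2"}   # minimum node level is level two
mapping_v2 = {1: "T1", 2: "T2"}   # minimum node level is level one

parent_a = find_parent_direct(mapping_v1, 3)
parent_b = find_parent_lowered(mapping_v2, 3)
```

In both cases the lookup is a single keyed access, and a result of `None` means that the target directory node has no target parent node.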
[0114] Step 204: Obtain the directory tree list of the target webpage, add the target directory node to the directory tree list according to the determination result of the target parent node, and update the target mapping information according to the target directory node.
[0115] In one possible implementation, the subsequently generated target directory consists of multiple directory nodes. The directory tree list is used to store the node information of the directory nodes. The node information can be node level, node name, parent node, child node, etc. The node level can be first-level directory, second-level directory, etc.
[0116] Specifically, the target directory node is added to the directory tree list based on the determination of its parent node. This can be achieved by determining the write position of the target directory node in the directory tree list based on the parent node's determination, and then adding the target directory node to the list according to that write position. The parent node's determination indicates whether the target directory node has a parent node. If a node level matching the target node level exists in the target mapping information, then the corresponding parent node for the target directory node can be determined, meaning the target directory node has a corresponding parent node, and thus, the write position of the target directory node in the directory tree list can be determined.
[0117] In one possible implementation, the writing position of the target directory node in the directory tree list is used to indicate the hierarchical relationship between the target directory node and the other directory nodes in the directory tree list. When the result of determining the target parent node is that the target directory node does not have a target parent node, the writing position of the target directory node in the directory tree list can be determined as the end position of the directory tree list. At this time, since the target directory node does not have a target parent node, the directory node corresponding to the target directory node has a parallel hierarchical relationship with the other directory nodes in the directory tree list. Therefore, the target directory node is directly written to the end position of the directory tree list, so as not to affect the hierarchical relationship of the original directory nodes in the directory tree list.
[0118] When the result of determining the target parent node is that the target directory node has a target parent node, a subdirectory set corresponding to the target parent node can be created in the directory tree list. The writing position of the target directory node in the directory tree list is determined as the end position in the subdirectory set. At this time, since the target directory node has a target parent node, the directory node corresponding to the target directory node should be a child node of the directory node corresponding to the target parent node. Therefore, the target directory node is written to the end position of the subdirectory set corresponding to the target parent node, thereby constructing a hierarchical relationship between the directory node corresponding to the target directory node and the directory node corresponding to the target parent node in the directory tree list. The subdirectory set is used to store one or more child nodes under a certain directory node, and the data type of the subdirectory set can be an array.
[0119] For example, refer to Figure 5, which is an optional flowchart of adding a target directory node to the directory tree list provided in an embodiment of this application. Suppose the directory tree list already stores the directory nodes corresponding to title T1, title t11, and title t12, where the directory node corresponding to title T1 is the parent node of the directory nodes corresponding to title t11 and title t12, and the directory nodes corresponding to title t11 and title t12 are stored in the subdirectory set of the directory node corresponding to title T1. The currently traversed target directory node is title T2. If it is determined from the target mapping information that title T2 does not have a target parent node, then the directory node corresponding to title T2 is added to the end of the directory tree list. If it is determined from the target mapping information that title T2 has a target parent node, and the target parent node of title T2 is title T1, then the directory node corresponding to title T2 is added to the end of the subdirectory set of the directory node corresponding to title T1.
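The two write positions described in paragraphs [0117] to [0119] can be sketched as follows. The helper names `make_node`, `add_to_tree` and `figure5_tree`, and the representation of a directory node as a dictionary with a `children` list for the subdirectory set, are assumptions for illustration only.

```python
def make_node(name, level):
    """Directory node with an empty subdirectory set."""
    return {"name": name, "level": level, "children": []}


def add_to_tree(tree_list, node, parent=None):
    """Write the node at the end of the parent's subdirectory set,
    or at the end of the directory tree list when there is no parent."""
    if parent is None:
        tree_list.append(node)
    else:
        parent["children"].append(node)


def figure5_tree():
    """Starting state of the example: T1 with children t11 and t12."""
    t1 = make_node("T1", 1)
    t1["children"] = [make_node("t11", 2), make_node("t12", 2)]
    return [t1], t1


# Case 1: T2 has no target parent node, so it goes to the end of the tree list.
tree_a, _ = figure5_tree()
add_to_tree(tree_a, make_node("T2", 1))

# Case 2: T2 has target parent node T1, so it goes to the end of T1's subdirectory set.
tree_b, t1_b = figure5_tree()
add_to_tree(tree_b, make_node("T2", 2), parent=t1_b)
```

In neither case are the nodes already stored in the directory tree list moved, so their existing hierarchical relationships are unaffected.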
[0120] In step 201, the target parent node of each content title is determined by traversing multiple content titles of the target webpage in a top-down order. At the same time, the target mapping information is updated according to the target directory node in each traversal round. In the next traversal round, the target parent node can be matched using the target mapping information updated in the current traversal round. This allows for the rapid determination of the hierarchical relationship between different target directory nodes when generating the target directory, thereby improving the granularity of directory generation and maintaining the accuracy of directory generation.
[0121] Step 205: Generate the target directory of the target webpage based on the directory tree list after adding the target directory node.
[0122] One possible implementation is to generate the target directory of the target webpage based on the directory tree list after traversing multiple content titles and adding each target directory node to the directory tree list. Alternatively, in each traversal iteration, after adding the target directory nodes of the current traversal iteration to the directory tree list, the target directory of the target webpage can be updated once based on the directory tree list.
[0123] When generating the target directory from the directory tree list, the directory nodes in the list can be traversed in a preset order, generating the target directory based on the node name and node level. This preset order can be positional order; if the current directory node includes a set of subdirectories, the data in the subdirectories of that directory node is processed first, then the next directory node is processed. Alternatively, the preset order can be node-level order, meaning all first-level directories in the directory tree list can be generated first, then the second-level directories in the subdirectory sets of each first-level directory can be generated, then the third-level directories in the subdirectory sets of each second-level directory can be generated, and so on.
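The two preset orders in paragraph [0123] correspond to a depth-first and a breadth-first walk of the directory tree list. The following sketch is illustrative only; the function names, the dictionary-based node representation and the use of indentation to show node levels are assumptions.

```python
from collections import deque


def render_positional(tree_list):
    """Positional order: process a node's subdirectory set before the next node."""
    lines = []

    def visit(node):
        lines.append("  " * (node["level"] - 1) + node["name"])
        for child in node["children"]:
            visit(child)

    for node in tree_list:
        visit(node)
    return lines


def names_by_level(tree_list):
    """Node-level order: all first-level nodes, then all second-level nodes, and so on."""
    order = []
    queue = deque(tree_list)
    while queue:
        node = queue.popleft()
        order.append(node["name"])
        queue.extend(node["children"])
    return order


def node(name, level, children=()):
    return {"name": name, "level": level, "children": list(children)}


tree_list = [node("T1", 1, [node("T2", 2), node("T3", 2)]), node("T4", 1)]
positional = render_positional(tree_list)
by_level = names_by_level(tree_list)
```

For this example tree, the positional order yields T1, T2, T3, T4 with T2 and T3 indented, while the node-level order yields T1, T4, T2, T3.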
[0124] By traversing multiple content titles of the target webpage in top-to-bottom order, target directory nodes are created based on the content titles in the current traversal round, and the target node level of the target directory nodes is determined. Then, the target mapping information in the current traversal round is obtained. Since the target mapping information is used to indicate the mapping relationship between the node level and the parent node, the target parent node corresponding to the target directory node can be quickly determined from the target mapping information based on the target node level. Then, the directory tree list of the target webpage is obtained. Based on the determination of the target parent node, the target directory node is added to the directory tree list. The target directory of the target webpage is generated based on the directory tree list after adding the target directory node. In this process, there is no need to perform string matching, which can effectively improve the efficiency of directory generation and reduce the memory usage of the device.
[0125] In one possible implementation, when updating the target mapping information, the node to be replaced corresponding to the target directory node can be determined from the target mapping information based on the target node level, and the node to be replaced can be updated to the target directory node to obtain the updated target mapping information.
[0126] There are two different ways to update the node to be replaced to the target directory node, which can improve the flexibility of updating the node to be replaced. Furthermore, when determining the target parent node corresponding to the target directory node from the parent nodes in the target mapping information, the matching method between the target node level and the node level in the target mapping information is different from that when determining the node to be replaced corresponding to the target directory node from the parent nodes in the target mapping information.
[0127] Specifically, if when determining the target parent node corresponding to the target directory node from the target mapping information based on the target node level, the target node level is matched with the node level in the target mapping information, and the target parent node corresponding to the target directory node is determined from the parent nodes in the target mapping information based on the matching result, then when determining the node to be replaced corresponding to the target directory node from the target mapping information based on the target node level, the target node level can be increased and matched with the node level in the target mapping information, and the node to be replaced corresponding to the target directory node can be determined from the parent nodes in the target mapping information based on the matching result. Similarly, the increase in the target node level can also be one level.
[0128] Furthermore, if when determining the target parent node corresponding to the target directory node from the target mapping information based on the target node level, the target node level is reduced and matched with the node level in the target mapping information, and the target parent node corresponding to the target directory node is determined from the parent nodes in the target mapping information based on the matching result, then when determining the node to be replaced corresponding to the target directory node from the target mapping information based on the target node level, the target node level can be matched with the node level in the target mapping information, and the node to be replaced corresponding to the target directory node can be determined from the parent nodes in the target mapping information based on the matching result.
[0129] For example, refer to Figure 6, which is an optional flowchart of updating the target mapping information provided in an embodiment of this application. In the current traversal round, the target mapping information includes three mapping relationships: "2-Title T1", "3-Title T2", and "4-Title T3". The target directory node in the current traversal round is Title T4, and the target node level of Title T4 is level three. Therefore, the target parent node corresponding to Title T4 can be directly matched as Title T2. After adding Title T4 to the directory tree list, the target node level of Title T4 is increased by one level, and the node to be replaced is matched as Title T3. Title T3 is replaced with Title T4, resulting in the updated target mapping information including the mapping relationships "2-Title T1", "3-Title T2", and "4-Title T4".
[0130] As another example, refer again to Figure 6. In the current traversal round, the target mapping information includes three mapping relationships: "1-Title T1", "2-Title T2", and "3-Title T3". The target directory node in the current traversal round is Title T4, and the target node level of Title T4 is level three. Therefore, the target node level of Title T4 can be reduced by one level, and the target parent node corresponding to Title T4 can be matched as Title T2. After adding Title T4 to the directory tree list, the node to be replaced can be directly matched as Title T3. Title T3 is replaced with Title T4, and the updated target mapping information includes the mapping relationships "1-Title T1", "2-Title T2", and "3-Title T4".
[0131] In one possible implementation, when the node to be replaced corresponding to the target directory node cannot be determined from the parent nodes in the target mapping information, the mapping relationship between the target node level of the target directory node and the target directory node can be added to the target mapping information to update the target mapping information.
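The two update ways in paragraphs [0125] to [0131] can be sketched with a dictionary, where assigning to a key either replaces the node to be replaced or, when no such node exists, adds a new mapping relationship. The function names are assumptions for illustration; the data reproduce the two examples given for Figure 6.

```python
def update_mapping_raised(mapping, node_name, node_level):
    """Used together with direct parent matching: the target node level is
    increased by one level before matching the node to be replaced."""
    mapping[node_level + 1] = node_name


def update_mapping_direct(mapping, node_name, node_level):
    """Used together with lowered parent matching: the target node level
    is matched as is."""
    mapping[node_level] = node_name


# First example: keys start at level two.
mapping_1 = {2: "T1", 3: "T2", 4: "T3"}
parent_1 = mapping_1.get(3)                  # direct match for level-three T4
update_mapping_raised(mapping_1, "T4", 3)    # T3 is replaced with T4

# Second example: keys start at level one.
mapping_2 = {1: "T1", 2: "T2", 3: "T3"}
parent_2 = mapping_2.get(3 - 1)              # lowered match for level-three T4
update_mapping_direct(mapping_2, "T4", 3)    # T3 is replaced with T4
```

Either way, each node level keeps at most one entry in the target mapping information, namely the most recently traversed node for that level.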
[0132] In one possible implementation, when obtaining the target mapping information in the current traversal round, it can first be determined whether the target node level in the previous traversal round was level one, or whether the target node level in the previous traversal round was the same as the target node level in the first traversal round. If the target node level in the previous traversal round was level one, or the target node level in the previous traversal round was the same as the target node level in the first traversal round, the directory tree list is obtained, and the target directory node from the previous traversal round is added to the end of the directory tree list.
[0133] If the target node level in the previous traversal round is level one, or the target node level in the previous traversal round is the same as the target node level in the first traversal round, there are two possible cases:
[0134] One scenario is that the previous traversal iteration is the first traversal iteration, and the target webpage contains a first-level heading. In this case, the directory tree list is empty, and the target directory node from the previous traversal iteration can be directly added to the end of the directory tree list. At the same time, the target mapping information from the previous traversal iteration is also empty, and the mapping relationship between the target directory node and the target node level from the previous traversal iteration can be directly added to the target mapping information from the previous traversal iteration.
[0135] Another scenario is that the target webpage contains multiple content titles with the same title level as the first content title on the target webpage. This means there are multiple target directory nodes with the same node level as the target directory node in the first traversal iteration. In this case, there were multiple traversal iterations before the previous one, and the directory tree list is not empty. Since the target directory nodes with the same node level as the target directory node in the first traversal iteration do not have parent nodes, the target directory node from the previous traversal iteration can be directly added to the end of the directory tree list. Correspondingly, the target mapping information from the previous traversal iteration is also not empty. Because the target node level in the previous traversal iteration is the same as the target node level in the first traversal iteration, the target directory nodes that came before the previous traversal iteration have already been added to the directory tree list. Therefore, in addition to updating the target mapping information by matching the target node level with the node level in the target mapping information as described above, the target mapping information of the previous traversal iteration can also be cleared directly, and the mapping relationship between the target directory node and the target node level in the previous traversal iteration can then be added to it. This cleans up the target mapping information and reduces data caching.
[0136] Based on this, if the target node level in the current traversal round is level two or higher, or if the target node level in the current traversal round is different from the target node level in the first traversal round, then the target mapping information in the current traversal round is obtained, which is the target mapping information updated in the previous traversal round. In short, determining the subsequent target parent node only when the target directory node is a level two or higher directory node or differs from the target directory node in the first traversal round reduces the complexity of data processing and improves the efficiency of directory generation.
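The handling of top-level nodes in paragraphs [0132] to [0136], including the optional clearing of the target mapping information, can be sketched as below. The function name `register_top_level`, the parameter `first_level` (the node level seen in the first traversal round) and the convention of storing a node under its level plus one are assumptions made for this sketch.

```python
def register_top_level(tree_list, mapping, node, first_level):
    """If the node is level one, or has the same level as the first traversed
    node, append it to the end of the tree list and reset the mapping.
    Returns True when the node was handled here."""
    if node["level"] == 1 or node["level"] == first_level:
        tree_list.append(node)
        mapping.clear()                       # drop entries from the previous top-level section
        mapping[node["level"] + 1] = node     # the node may parent the next deeper level
        return True
    return False


t1 = {"name": "T1", "level": 1, "children": []}
t2 = {"name": "T2", "level": 2, "children": []}
t4 = {"name": "T4", "level": 1, "children": []}

tree_list = [t1]
mapping = {2: t1, 3: t2}                      # state left by the earlier traversal rounds
handled = register_top_level(tree_list, mapping, t4, first_level=1)
```

After the call, the mapping holds only the entry for T4, so later nodes cannot be matched against a parent from the earlier top-level section.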
[0137] The following example illustrates the principle of adding a target directory node to the directory tree list.
[0138] Refer to Figure 7, which is an optional complete flowchart of adding target directory nodes to the directory tree list provided in an embodiment of this application. Specifically, adding the target directory nodes to the directory tree list may include the following steps 701 to 709.
[0139] Step 701: Traverse the title list;
[0140] Step 702: Determine if the title level of the content title is equal to 1. If yes, proceed to step 703; otherwise, proceed to step 704.
[0141] Step 703: Create a directory node corresponding to the content title, add the mapping relationship between the directory node and the node level of the directory node to the mapping information, add the directory node to the directory tree object, and jump to step 708;
[0142] Step 704: Create the directory node corresponding to the content title, obtain the mapping information, and match the parent node of the directory node according to the node level of the directory node;
[0143] Step 705: Determine if the directory node has a parent node. If yes, proceed to step 706; otherwise, proceed to step 707.
[0144] Step 706: Add the directory node to the set of subdirectories of the parent node, update the mapping information, and proceed to step 708;
[0145] Step 707: Add the directory node to the directory tree list and update the mapping information;
[0146] Step 708: Determine if the title list has been traversed. If yes, proceed to step 709; otherwise, proceed to step 701.
[0147] Step 709: Obtain the directory tree list after adding directory nodes.
[0148] In this example, the order of the content titles in the title list is based on their top-to-bottom position on the target webpage. The directory tree list obtained in step 709 can be used to generate the target directory of the target webpage. Steps 701 to 709 do not require string matching, thus effectively improving the efficiency of directory generation and reducing device memory usage. Furthermore, when generating the target directory, the hierarchical relationship between different target directory nodes can be quickly determined, thereby improving the granularity of directory generation and maintaining its accuracy.
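Steps 701 to 709 can be sketched end to end as follows. This is a minimal sketch under stated assumptions: the function name `build_directory_tree` is hypothetical, nodes are dictionaries with a `children` list as the subdirectory set, and the mapping information stores each node under its node level increased by one, so that the parent is matched directly. The input follows the four-title example with levels 1, 2, 2 and 1.

```python
def build_directory_tree(title_list):
    """Build the directory tree list from a title list ordered top to bottom."""
    mapping = {}      # node level of prospective children -> latest candidate parent
    tree_list = []    # top-level directory nodes
    for title in title_list:                                   # step 701
        node = {"name": title["name"], "level": title["level"], "children": []}
        if title["level"] == 1:                                # step 702
            tree_list.append(node)                             # step 703
        else:
            parent = mapping.get(node["level"])                # step 704
            if parent is not None:                             # step 705
                parent["children"].append(node)                # step 706
            else:
                tree_list.append(node)                         # step 707
        mapping[node["level"] + 1] = node                      # update mapping information
    return tree_list, mapping                                  # steps 708 and 709


titles = [
    {"name": "T1", "level": 1},
    {"name": "T2", "level": 2},
    {"name": "T3", "level": 2},
    {"name": "T4", "level": 1},
]
tree_list, mapping = build_directory_tree(titles)
```

Each title is handled with one dictionary lookup and one list append, and no comparison of title text is involved. Note that the sketch keeps older entries for deeper levels in the mapping when a shallower node arrives; clearing the mapping for top-level nodes, as described for paragraph [0135], is one way to avoid matching against them.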
[0149] Based on this, refer to Figure 8, which is a schematic diagram of an optional process of adding target directory nodes to the directory tree list provided in an embodiment of this application. The principle of the process steps shown in Figure 7 is further explained below in conjunction with Figure 8. In this example, the title list includes title T1, title T2, title T3, and title T4 in sequence, with title levels of level 1, level 2, level 2, and level 1 respectively. In the initial state, the mapping information is empty, and the directory tree list is empty.
[0150] In the first traversal round, a directory node corresponding to title T1 is created. The node name of the directory node corresponding to title T1 is the title name of title T1, and the node level is level one. Since the directory node corresponding to title T1 is the first directory node, it is directly added to the directory tree list. After increasing the node level of the directory node corresponding to title T1 by one level, the mapping relationship between the increased node level and the directory node corresponding to title T1 is generated and added to the mapping information.
[0151] In the second traversal round, a directory node corresponding to title T2 is created. The node name of the directory node corresponding to title T2 is the title name of title T2, and the node level is level two. Based on the node level of the directory node corresponding to title T2, its parent node is matched from the mapping information as the directory node corresponding to title T1. Therefore, the directory node corresponding to title T2 is added to the subdirectory set of the directory node corresponding to title T1 in the directory tree list. After increasing the node level of the directory node corresponding to title T2 by one level, the mapping relationship between the increased node level and the directory node corresponding to title T2 is generated and added to the mapping information.
[0152] In the third traversal round, a directory node corresponding to title T3 is created. The node name of the directory node corresponding to title T3 is the title name of title T3, and the node level is level two. Based on the node level of the directory node corresponding to title T3, its parent node is matched from the mapping information as the directory node corresponding to title T1. Therefore, the directory node corresponding to title T3 is added to the subdirectory set of the directory node corresponding to title T1 in the directory tree list. After increasing the node level of the directory node corresponding to title T3 by one level, the node to be replaced in the mapping information is matched as the directory node corresponding to title T2. The directory node corresponding to title T2 is replaced with the directory node corresponding to title T3.
[0153] In the fourth traversal round, a directory node corresponding to title T4 is created. The node name of the directory node corresponding to title T4 is the title name of title T4, and the node level is level one. Since the directory node corresponding to title T4 has the same node level as the directory node corresponding to title T1, the directory node corresponding to title T4 is added to the directory tree list, and the node level of the directory node corresponding to title T4 is increased by one level. Then, the node to be replaced in the mapping information is matched as the directory node corresponding to title T1, and the directory node corresponding to title T1 is replaced with the directory node corresponding to title T4.
[0154] It can be understood that the process of adding target directory nodes to the directory tree list shown in Figure 8 is only illustrative. The number and level of directory nodes can vary according to actual circumstances, and this application does not limit them.
[0155] The following describes the principle of generating the target directory based on the directory tree list in the embodiments of this application, using different scenarios as examples.
[0156] In one possible implementation, when generating the target directory of the target webpage based on the directory tree list after adding the target directory node, the directory tree list after adding the target directory node can be parsed to obtain multiple directory node links, and the target directory of the target webpage can be generated based on the multiple directory node links.
[0157] Target directory nodes have positional relationships when written to the directory tree list. For example, child nodes are added to the subdirectory set of the parent node, and directory nodes at the same node level are added to the directory tree list in parallel. Therefore, the directory tree list after adding target directory nodes can be parsed to obtain multiple directory node links. A directory node link can include multiple target directory nodes; that is, a directory node link is a set of multiple target directory nodes. Furthermore, the target node levels of the target directory nodes in a directory node link generally decrease sequentially. Different branch links may also exist in the same directory node link; for example, a branch link exists when a target directory node other than the first one has child directory nodes.
[0158] For example, in the example shown in Figure 8, after the traversal is completed, the resulting directory tree list contains two directory node links: one consisting of "Title T1, Title T2, Title T3" and the other consisting of "Title T4". Furthermore, if Title T2 has a subdirectory node Title T5 (not shown in the figure), then "Title T1, Title T2, Title T3, Title T5" constitutes a directory node link, and "Title T2, Title T5" is a branch link within this link.
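One way to read the example in paragraph [0158] is that each top-level entry of the directory tree list, together with all of its descendants, forms one directory node link. The sketch below follows that reading, which is an assumption rather than something the embodiments state, and lists the nodes of each link level by level; the function and helper names are likewise hypothetical.

```python
from collections import deque


def directory_node_links(tree_list):
    """Return one list of node names per top-level node, in level order."""
    links = []
    for root in tree_list:
        link, queue = [], deque([root])
        while queue:
            current = queue.popleft()
            link.append(current["name"])
            queue.extend(current["children"])
        links.append(link)
    return links


def node(name, level, children=()):
    return {"name": name, "level": level, "children": list(children)}


# The example tree, extended with the hypothetical child T5 of T2.
tree_list = [
    node("T1", 1, [node("T2", 2, [node("T5", 3)]), node("T3", 2)]),
    node("T4", 1),
]
links = directory_node_links(tree_list)
```

Under this reading the first link is T1, T2, T3, T5 and the second link is T4, which matches the grouping given in the example.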
[0159] In one possible implementation, the first directory node in any directory node link is the target directory node with the highest target node level in the directory node link. In this case, even if there is no first-level title in the target webpage, when generating the target directory of the target webpage based on multiple directory node links, the target directory node with the highest target node level in the directory node link can be automatically used as the first target directory node in the directory node link, thereby improving the reliability and compatibility of generating the target directory.
[0160] For example, refer to Figure 9, which is a schematic diagram of an optional structure of a target directory generated from a directory tree list according to an embodiment of this application. Taking a web page document as an example, the title list includes "1.1 Title T1", "1.1.1 Title T2", "1.2 Title T3", "1.1.2 Title T4", "1.1.2.1 Title T5", "1.1.2.2 Title T6", "1.1.2.2.1 Title T7", "1.1.2.2.2 Title T8", "1.1.2.3 Title T9", and "1.1.2.4 Title T10". Among them, "1.1 Title T1" and "1.2 Title T3" are second-level titles, and the title list contains no first-level title. Therefore, when generating the target directory shown on the left of Figure 9, "1.1 Title T1" and "1.2 Title T3" are each used as the first directory node in their respective directory node links, that is, as the first-level directory nodes of those links, which can improve the reliability and compatibility of generating the target directory.
[0161] In one possible implementation, when generating the target directory of the target webpage based on the directory tree list after adding the target directory node, the directory tree list after adding the target directory node can be parsed to obtain multiple directory node links. When the first directory node is the directory node with the lowest target node level in its directory node link, the second directory node is the first directory node in its directory node link, and the target node level of the first directory node is at least two levels higher than the target node level of the second directory node, the directory node link where the first directory node is located and the directory node link where the second directory node is located are merged to obtain a merged node link. The target directory of the target webpage is generated based on the merged node link and the remaining directory node links.
[0162] In this case, the target node level of the first directory node is at least two levels higher than that of the second directory node, which indicates a gap in the hierarchy between the two nodes. This situation may occur because the title levels in the title list of the target webpage are not continuous, or because one of the content titles in the title list has been deleted. Merging the directory node links containing the first and second directory nodes into a merged node link, and generating the target directory of the target webpage based on the merged node link and the remaining directory node links, can effectively improve the accuracy and rationality of the target directory.
[0163] Specifically, the directory node link where the first directory node is located and the directory node link where the second directory node is located are merged into a merged node link. This can be achieved by taking the directory node in the directory node link corresponding to the second directory node as a child node of the first directory node.
[0164] In the underlying data processing, the write position of the directory node in the directory node link corresponding to the second directory node can be adjusted to write the directory node in the directory node link corresponding to the second directory node to the subdirectory set of the first directory node.
[0165] In one possible implementation, there may be multiple directory node links whose first directory node meets the above conditions. In this case, the directory node link corresponding to the second directory node may be merged into the qualifying directory node link that is adjacent (closest) to it and located before it. Because adjacent links are more likely to be related in content, this improves the accuracy and rationality of the target directory.
[0166] For example, refer to Figure 10, which is a schematic diagram of another optional structure of the target directory generated from the directory tree list according to an embodiment of this application. Taking a web page document as an example, the title list includes "1. Heading T1", "1.1.1 Heading T2", "1.1.2 Heading T3", "1.1.3 Heading T4", "1.1.3.1 Heading T5", "1.1.3.2 Heading T6", "1.1.3.2.1 Heading T7", "1.1.3.2.2 Heading T8", "1.1.3.3 Heading T9", and "1.1.3.4 Heading T10". Here, "1. Heading T1" is a first-level heading; "1.1.1 Heading T2", "1.1.2 Heading T3", and "1.1.3 Heading T4" are third-level headings; "1.1.3.1 Heading T5", "1.1.3.2 Heading T6", "1.1.3.3 Heading T9", and "1.1.3.4 Heading T10" are fourth-level headings; and "1.1.3.2.1 Heading T7" and "1.1.3.2.2 Heading T8" are fifth-level headings. Before merging, there are four directory node links: "1. Heading T1" forms one link; "1.1.1 Heading T2" forms one link; "1.1.2 Heading T3" forms one link; and "1.1.3 Heading T4", "1.1.3.1 Heading T5", "1.1.3.2 Heading T6", "1.1.3.2.1 Heading T7", "1.1.3.2.2 Heading T8", "1.1.3.3 Heading T9", and "1.1.3.4 Heading T10" together form one link. In this case, the target node level of "1. Heading T1" is two levels higher than the target node level of "1.1.1 Heading T2", "1.1.2 Heading T3", or "1.1.3 Heading T4". Therefore, when generating the target directory, the directory node links corresponding to "1.1.1 Heading T2", "1.1.2 Heading T3", and "1.1.3 Heading T4" are all merged into the directory node link corresponding to "1. Heading T1".
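The merging rule can be illustrated with a minimal Python sketch. It is not taken from the application: `DirNode`, `deepest_node`, and `merge_gapped_links` are hypothetical names, each top-level entry of the directory tree list is treated as the first node of one directory node link, and ties for the lowest-level node in a link are resolved in favor of the node that appears last (an assumption).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class DirNode:
    level: int                     # node level, 1 is the highest
    label: str                     # node name
    children: List["DirNode"] = field(default_factory=list)  # subdirectory set


def deepest_node(root: DirNode) -> DirNode:
    """Return the lowest-level node of a link (largest level number).

    Ties go to the node that appears last in document order (assumption).
    """
    best = root
    for child in root.children:
        candidate = deepest_node(child)
        if candidate.level >= best.level:
            best = candidate
    return best


def merge_gapped_links(tree: List[DirNode]) -> List[DirNode]:
    """Merge a link into the closest preceding link when there is a level gap.

    The input nodes are modified in place; the returned list holds the
    first nodes of the links that remain after merging.
    """
    # Snapshot the lowest-level node of every original link before merging.
    anchors = [deepest_node(root) for root in tree]
    remaining: List[DirNode] = []
    for i, second in enumerate(tree):
        target: Optional[DirNode] = None
        for j in range(i - 1, -1, -1):          # closest preceding link first
            if second.level - anchors[j].level >= 2:
                target = anchors[j]
                break
        if target is None:
            remaining.append(second)
        else:
            # Write the whole link into the subdirectory set of the first node.
            target.children.append(second)
    return remaining
```

Applied to the four links of the Figure 10 example, this sketch leaves a single link whose first node is "1. Heading T1", with "1.1.1 Heading T2", "1.1.2 Heading T3", and "1.1.3 Heading T4" as its children.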
[0167] In one possible implementation, when merging the directory node links corresponding to the first directory node and the second directory node, the node name of the first directory node can be input into a pre-trained semantic recognition model to obtain a first semantic vector of the node name of the first directory node, and the node name of the second directory node can be input into the same pre-trained semantic recognition model to obtain a second semantic vector of the node name of the second directory node. The vector similarity between the first and second semantic vectors is then calculated, for example using cosine similarity or Euclidean distance. Only when the vector similarity is greater than or equal to a preset similarity threshold are the directory node links corresponding to the first and second directory nodes merged. Since vector similarity characterizes the semantic similarity between the node names of the first and second directory nodes, a vector similarity at or above the preset threshold indicates that the two node names are semantically close. Merging the corresponding directory node links under this condition can improve the rationality and reliability of the merging.
[0168] It is understood that the similarity threshold can be set according to the actual situation, such as 0.8, 0.9, etc., and this application embodiment does not limit it.
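As a minimal sketch of the threshold check, assuming the two semantic vectors have already been produced by some model (the model itself is not shown, and `cosine_similarity` and `should_merge` are hypothetical names), cosine similarity can be computed and compared against the threshold as follows.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_merge(first_vec: Sequence[float],
                 second_vec: Sequence[float],
                 threshold: float = 0.8) -> bool:
    """Merge only when the similarity reaches the preset threshold."""
    return cosine_similarity(first_vec, second_vec) >= threshold
```

The default of 0.8 is one of the example values mentioned above; if Euclidean distance were used instead, the comparison would have to be inverted, since a smaller distance means a closer match.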
[0169] In training the semantic recognition model, multiple sample data pairs can be obtained. Each sample data pair consists of two sample titles, and each pair is labeled with a similarity tag. This similarity tag indicates whether the two sample titles in the sample data pair are semantically similar. During training, the sample data pairs are input into the semantic recognition model to obtain the sample semantic vectors of the two sample titles in each pair. By calculating the sample similarity between the two sample semantic vectors, the training loss value of the semantic recognition model is determined based on the sample similarity and the similarity tag. The parameters of the semantic recognition model are then adjusted based on the training loss value. By introducing these sample data pairs and training the semantic recognition model using sample similarity and similarity tags, the semantic recognition model can be better suited to the task of recognizing title semantic similarity, thus improving the training effect of the semantic recognition model.
[0170] In one possible implementation, when generating the target directory of the target webpage based on the directory tree list after adding the target directory node, the directory tree list after adding the target directory node can be parsed to obtain multiple directory node links. When a new directory node is inserted into a directory node link, and the new directory node has the same target node level as the first target directory node in the directory node link, the directory node link is split into two child node links based on the new directory node. The target directory of the target webpage is generated based on the child node links and the remaining directory node links.
[0171] When a new directory node is inserted into a directory node link, meaning that a new content title has been inserted into the target webpage, one option is to generate a new title list, traverse it, recreate the directory node for each content title, and add each node to the directory tree list again. However, this increases the time required to generate the target directory. Instead, the directory node link can be split into two child node links based on the new directory node, and the target directory of the target webpage can be generated based on the child node links and the remaining directory node links. The first target directory node in the original directory node link and the new directory node serve as the first directory nodes of the two child node links, respectively.
[0172] In the underlying data processing, after creating a new directory node, the write position of other directory nodes below the new directory node can be adjusted, and the other directory nodes below the new directory node can be written to the subdirectory set of the new directory node.
[0173] The above method does not require recreating each content title directory node and adding it to the directory tree list. It only requires a simple adjustment to the structure of the directory node link in the directory tree list. When a new directory node is inserted into the directory node link, the target directory can be updated quickly and accurately, improving processing efficiency.
[0174] For example, refer to Figure 11, which is a schematic diagram of another optional structure of the target directory generated from the directory tree list provided in an embodiment of this application. Continuing the example shown in Figure 10, if "2. Heading T11" is inserted between "1.1.3 Heading T4" and "1.1.3.1 Heading T5", where "2. Heading T11" is a first-level heading, the original directory node link is split into two child node links. The first consists of "1. Heading T1", "1.1.1 Heading T2", "1.1.2 Heading T3", and "1.1.3 Heading T4". The second consists of "2. Heading T11", "1.1.3.1 Heading T5", "1.1.3.2 Heading T6", "1.1.3.2.1 Heading T7", "1.1.3.2.2 Heading T8", "1.1.3.3 Heading T9", and "1.1.3.4 Heading T10". A new target directory is generated from these links.
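The split can be sketched in Python as follows. This is an illustration under assumptions rather than the application's implementation: `DirNode`, `find_path`, and `split_link` are hypothetical names, and "the other directory nodes below the new directory node" is interpreted as every node that follows the insertion point in document order within the same link.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class DirNode:
    level: int
    label: str
    children: List["DirNode"] = field(default_factory=list)  # subdirectory set


def find_path(root: DirNode, target: DirNode) -> Optional[List[DirNode]]:
    """Return the nodes from root down to target, or None if target is absent."""
    if root is target:
        return [root]
    for child in root.children:
        sub = find_path(child, target)
        if sub is not None:
            return [root] + sub
    return None


def split_link(tree: List[DirNode], root: DirNode,
               prev: DirNode, new_node: DirNode) -> None:
    """Insert new_node right after prev and split the link rooted at root.

    Nodes that follow prev in document order are moved into the
    subdirectory set of new_node; no existing node is recreated.
    """
    if new_node.level != root.level:
        raise ValueError("new node must have the level of the link's first node")
    path = find_path(root, prev)
    if path is None:
        raise ValueError("prev is not part of the given link")
    moved = list(prev.children)       # children of prev come right after prev
    prev.children.clear()
    # Walk upwards and collect the later siblings at each level.
    for parent, child in zip(reversed(path[:-1]), reversed(path[1:])):
        idx = parent.children.index(child)
        moved.extend(parent.children[idx + 1:])
        del parent.children[idx + 1:]
    new_node.children.extend(moved)
    tree.insert(tree.index(root) + 1, new_node)
```

For the Figure 11 example, calling `split_link` with `prev` set to the node for "1.1.3 Heading T4" moves T5, T6 (with T7 and T8), T9, and T10 under "2. Heading T11" and leaves T1 to T4 in the first link.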
[0175] In one possible implementation, when a new directory node is inserted into the directory node link, in order to further improve the accuracy of the target directory, it can be identified whether the target directory node in the original directory node link includes a numeric prefix. For example, "1." in "1.Title T1" is a numeric prefix. Then, it can be identified whether the new directory node includes a numeric prefix. If the target directory node in the original directory node link includes a numeric prefix, but the new directory node does not, then a numeric prefix can be added to the new directory node based on the numeric prefix of the target directory node in the original directory node link. The numeric prefix added to the new directory node can be greater than the numeric prefix of the target directory node in the original directory node link, and the numeric prefix added to the new directory node is adjacent to the numeric prefix of the target directory node in the original directory node link.
[0176] For example, if the target directory node in the original directory node link has a numeric prefix of "1." and the new directory node is "Title T11", then the numeric prefix "2." can be added to the new directory node.
[0177] Additionally, if the newly added directory node includes a numeric prefix, the numeric prefixes of other directory nodes in the child node link corresponding to the newly added directory node can be adjusted based on the numeric prefix of the newly added directory node.
[0178] For example, if the newly added directory node is "2. Title T11", then the numeric prefixes of the other directory nodes in the corresponding child node link, such as "1.1.3.1 Title T5", "1.1.3.2 Title T6", "1.1.3.2.1 Title T7", "1.1.3.2.2 Title T8", "1.1.3.3 Title T9", and "1.1.3.4 Title T10", can be adjusted accordingly to "2.1.3.1 Title T5", "2.1.3.2 Title T6", "2.1.3.2.1 Title T7", "2.1.3.2.2 Title T8", "2.1.3.3 Title T9", and "2.1.3.4 Title T10".
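The prefix handling in the two examples above can be sketched with a regular expression. The pattern and the function names (`has_numeric_prefix`, `add_adjacent_prefix`, `replace_top_number`) are assumptions made for illustration; in particular, the sketch assumes a prefix is a dotted run of digits followed by a dot or whitespace.

```python
import re

# Leading number, optional ".n" groups, then a dot or whitespace, then the name.
_PREFIX = re.compile(r"^(\d+)((?:\.\d+)*)(\.\s*|\s+)(.*)$")


def has_numeric_prefix(label: str) -> bool:
    """True for labels such as '1. Title T1' or '1.1.3.1 Title T5'."""
    return _PREFIX.match(label) is not None


def add_adjacent_prefix(new_label: str, reference_label: str) -> str:
    """Give new_label the prefix that follows the reference label's top number."""
    ref = _PREFIX.match(reference_label)
    if has_numeric_prefix(new_label) or ref is None:
        return new_label
    return f"{int(ref.group(1)) + 1}. {new_label}"


def replace_top_number(label: str, new_top: int) -> str:
    """Replace only the first number of a dotted prefix, keeping the rest."""
    m = _PREFIX.match(label)
    if m is None:
        return label
    return f"{new_top}{m.group(2)}{m.group(3)}{m.group(4)}"
```

For instance, `add_adjacent_prefix("Title T11", "1. Title T1")` yields "2. Title T11", and `replace_top_number("1.1.3.1 Title T5", 2)` yields "2.1.3.1 Title T5".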
[0179] In one possible implementation, after generating the target directory, it can be displayed. When displaying the target directory, the display coordinate range of each target directory node is determined, and each target directory node is displayed according to its coordinate range. Simultaneously, within a directory node chain, the display coordinate range is adjusted based on the target node level of the target directory node. The horizontal coordinate range of the display coordinate range differs for target directory nodes at different levels; the lower the target node level, the greater the offset of the corresponding horizontal coordinate range. The horizontal coordinate range can be offset to the right, and the specific offset can be set according to actual conditions. By adjusting the display coordinate range based on the target node level, the display hierarchy of the target directory can be effectively improved.
[0180] For example, in the example shown in Figure 9, the node level of the directory node "1.2 Heading T3" is higher than that of the directory node "1.1.2 Heading T4", and the node level of the directory node "1.1.2 Heading T4" is higher than that of the directory node "1.1.2.1 Heading T5". Therefore, the directory node "1.1.2 Heading T4" is offset to the right relative to the directory node "1.2 Heading T3", and the directory node "1.1.2.1 Heading T5" is offset to the right relative to the directory node "1.1.2 Heading T4".
[0181] In addition, when displaying the target directory, a collapse control can be further displayed. This control is used to collapse corresponding directory node links or branch links. Specifically, the collapse control can be associated with a corresponding directory node link or branch link. When a collapse command is received through the collapse control, the corresponding directory node link or branch link is collapsed, and the display coordinate range of other target directory nodes below the collapsed link is adjusted. By using the collapse control to collapse corresponding directory node links or branch links, the display flexibility of the target directory can be improved. Furthermore, when the number of directory nodes is large, the display space occupied by the target directory can be reduced, improving its readability.
[0182] For example, referring to Figure 10, when the collapse control 1001 of the directory node "1.1.3.2 Title T6" receives a collapse instruction, the branch link where the directory node "1.1.3.2 Title T6" is located is collapsed; that is, the subdirectory nodes "1.1.3.2.1 Title T7" and "1.1.3.2.2 Title T8" of the directory node "1.1.3.2 Title T6" are hidden.
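The level-based offset and the collapse behavior can be combined in a small text-rendering sketch. It is only an approximation of a real display: the horizontal coordinate range is represented by leading spaces, collapsed nodes are identified by label, and `DirNode` and `render_directory` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import AbstractSet, List


@dataclass
class DirNode:
    level: int
    label: str
    children: List["DirNode"] = field(default_factory=list)


def render_directory(tree: List[DirNode],
                     collapsed: AbstractSet[str] = frozenset(),
                     indent: str = "  ") -> List[str]:
    """Return one display line per visible node.

    The offset grows with the distance between a node's level and the level
    of the first node of its link; children of collapsed nodes are skipped.
    """
    lines: List[str] = []

    def visit(node: DirNode, root_level: int) -> None:
        offset = max(node.level - root_level, 0)
        lines.append(indent * offset + node.label)
        if node.label in collapsed:
            return                      # hide the branch link below this node
        for child in node.children:
            visit(child, root_level)

    for root in tree:
        visit(root, root.level)
    return lines
```

Because hidden nodes produce no lines, the nodes below a collapsed branch move up automatically, which corresponds to adjusting their display coordinate range.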
[0183] The principle of the catalog generation method provided in this application embodiment is illustrated below with a practical example.
[0184] Referring to Table 1, which is an optional data structure for the title list provided in the embodiments of this application, the data type of the title list is an array; the data type of the title level of the content title is a number, with a value of 1 to 6; the data type of the title name of the content title is a string; other attributes of the content title will not be described in detail here.
[0185] Table 1
[0186]
Property | Type | Description
HeadList | Array<HeadNode> | Title list, array
HeadNode.level | number | Title level, number type, 1-6
HeadNode.label | string | Title name, string type
Other attributes of HeadNode | - | -
[0187] Referring to Table 2, which is an optional data structure for the directory tree list provided in the embodiments of this application, the data type of the directory tree list is an array; the data type of the node level of the directory node is a numeric type, with a value of 1 to 6; the data type of the node name of the directory node is a string type; the first-level directory node does not have a parent node; the subdirectory set of the directory node includes all the child nodes under the directory node; other attributes of the directory node will not be described in detail here.
[0188] Table 2
[0189]
[0190] Based on the data structure shown in Tables 1 and 2, firstly, the webpage code text of the target webpage is obtained. Then, the content titles in the target webpage are identified based on the element tags in the webpage code text, resulting in first-level titles "1. Preface", "2. Abstract", "3. Chapter 1", and "4. Chapter 2", second-level titles "3.1 Section 1", "3.2 Section 2", and "4.1 Section 1", and third-level titles "3.1.1 First Paragraph", "3.1.2 Second Paragraph", and "4.1.1 First Paragraph". A title list is then constructed, where the content titles in the title list are arranged from top to bottom according to their position on the target webpage: "1. Preface", "2. Abstract", "3. Chapter 1", "3.1 Section 1", "3.1.1 First Paragraph", "3.1.2 Second Paragraph", "3.2 Section 2", "4. Chapter 2", "4.1 Section 1", and "4.1.1 First Paragraph".
[0191] Next, iterate through the title list to create the mapping information and the directory tree list;
[0192] First, iterate through the content title "1. Preface" and create the directory node corresponding to "1. Preface". Since this directory node is a first-level node, add the directory node corresponding to "1. Preface" to the end of the directory tree list and add the mapping relationship "2"-"1. Preface" to the mapping information.
[0193] Next, iterate through the content title "2. Abstract", create the directory node corresponding to "2. Abstract", and since this directory node is a first-level node, add the directory node corresponding to "2. Abstract" to the end of the directory tree list, and update the mapping relationship "2"-"1. Preface" in the mapping information to "2"-"2. Abstract";
[0194] Next, iterate through the content title "3. Chapter 1", create the directory node corresponding to "3. Chapter 1", and since this directory node is a first-level node, add the directory node corresponding to "3. Chapter 1" to the end of the directory tree list, and update the mapping relationship "2"-"2. Abstract" in the mapping information to "2"-"3. Chapter 1";
[0195] Next, iterate through the content title "3.1 Section 1" and create a directory node corresponding to "3.1 Section 1". Since this directory node is not a first-level node and does not have the same title level as the first content title in the title list, based on the node level (second level) of this directory node, match the parent node of this directory node from the mapping information as the directory node corresponding to "3. Chapter 1". Add the directory node corresponding to "3.1 Section 1" to the subdirectory list of the directory node corresponding to "3. Chapter 1" and add the mapping relationship "3" - "3.1 Section 1" to the mapping information.
[0196] Next, iterate through the content title "3.1.1 First Paragraph" and create a directory node corresponding to "3.1.1 First Paragraph". Since this directory node is not a first-level node and does not have the same title level as the first content title in the title list, based on the node level (third level) of this directory node, match the parent node of this directory node from the mapping information as the directory node corresponding to "3.1 Section 1". Add the directory node corresponding to "3.1.1 First Paragraph" to the subdirectory list of the directory node corresponding to "3.1 Section 1", and add the mapping relationship "4"-"3.1.1 First Paragraph" to the mapping information.
[0197] Next, iterate through the content title "3.1.2 Second Paragraph" and create a directory node corresponding to "3.1.2 Second Paragraph". Since this directory node is not a first-level node and does not have the same title level as the first content title in the title list, based on the node level (third level) of this directory node, match the parent node of this directory node from the mapping information as the directory node corresponding to "3.1 Section 1". Add the directory node corresponding to "3.1.2 Second Paragraph" to the subdirectory list of the directory node corresponding to "3.1 Section 1", and update the mapping relationship "4"-"3.1.1 First Paragraph" in the mapping information to "4"-"3.1.2 Second Paragraph".
[0198] Next, iterate through the content title "3.2 Section 2", create the directory node corresponding to "3.2 Section 2". Since this directory node is not a first-level node and does not have the same title level as the first target node, based on the node level (second level) of this directory node, match the parent node of this directory node from the mapping information as the directory node corresponding to "3. Chapter 1", add the directory node corresponding to "3.2 Section 2" to the subdirectory list of the directory node corresponding to "3. Chapter 1", and update the mapping relationship "3"-"3.1 Section 1" in the mapping information to "3"-"3.2 Section 2";
[0199] Next, iterate through the content title "4. Chapter 2", create the directory node corresponding to "4. Chapter 2". Since this directory node is a first-level node, add the directory node corresponding to "4. Chapter 2" to the end of the directory tree list, and update the mapping relationship "2"-"3. Chapter 1" in the mapping information to "2"-"4. Chapter 2";
[0200] Next, iterate through the content title "4.1 Section 1" and create a directory node corresponding to "4.1 Section 1". Since this directory node is not a first-level node and does not have the same title level as the first content title in the title list, based on the node level (second level) of this directory node, match the parent node of this directory node from the mapping information as the directory node corresponding to "4. Chapter 2". Add the directory node corresponding to "4.1 Section 1" to the subdirectory list of the directory node corresponding to "4. Chapter 2" and update the mapping relationship "3"-"3.2 Section 2" in the mapping information to "3"-"4.1 Section 1";
[0201] Next, iterate through the content title "4.1.1 First Paragraph", create the directory node corresponding to "4.1.1 First Paragraph". Since this directory node is not a first-level node and does not have the same title level as the first content title in the title list, based on the node level (third level) of this directory node, match the parent node of this directory node from the mapping information as the directory node corresponding to "4.1 Section 1". Add the directory node corresponding to "4.1.1 First Paragraph" to the subdirectory list of the directory node corresponding to "4.1 Section 1", and update the mapping relationship "4"-"3.1.2 Second Paragraph" in the mapping information to "4"-"4.1.1 First Paragraph";
[0202] At this point, all content titles in the title list have been traversed. The final directory tree list is then checked to ensure that there are no gaps in the hierarchy. Referring to Figure 12, which is a schematic diagram of another optional structure of the target directory generated from the directory tree list provided in the embodiments of this application, the target directory of the target webpage can be generated from the final directory tree list after the aforementioned traversal is completed, and then displayed on the device after offset processing and association with collapse controls.
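The traversal walked through above can be condensed into a short Python sketch. It follows the walkthrough's convention that the mapping information is keyed by the level of the children a node can parent (so "2" maps to the latest first-level node). `DirNode` and `build_directory_tree` are hypothetical names, and, as in the walkthrough, entries for deeper levels are not cleared when a higher-level title is visited.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DirNode:
    level: int                     # node level, 1 (highest) to 6
    label: str                     # node name, taken from the content title
    children: List["DirNode"] = field(default_factory=list)  # subdirectory set


def build_directory_tree(titles: List[Tuple[int, str]]) -> List[DirNode]:
    """Build the directory tree list in one top-to-bottom pass over the titles."""
    tree: List[DirNode] = []              # directory tree list
    mapping: Dict[int, DirNode] = {}      # child level -> current parent node
    if not titles:
        return tree
    first_level = titles[0][0]
    for level, label in titles:
        node = DirNode(level, label)
        parent: Optional[DirNode] = None
        # First-level nodes, and nodes at the level of the first title,
        # have no parent; all others are looked up by their own level.
        if level != 1 and level != first_level:
            parent = mapping.get(level)
        if parent is None:
            tree.append(node)             # write at the end of the tree list
        else:
            parent.children.append(node)  # write at the end of the parent's set
        mapping[level + 1] = node         # this node now parents the next level
    return tree
```

Each title costs one dictionary lookup and one list append, so no string matching against the title text is needed to place a node.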
[0203] It is understood that although the steps in the above flowcharts are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated in this embodiment, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the above flowcharts may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages in other steps.
[0204] Refer to Figure 13, which is a schematic diagram of an optional structure of the catalog generation apparatus provided in an embodiment of this application. The catalog generation apparatus 1300 includes:
[0205] The node creation module 1301 is used to traverse multiple content titles of the target webpage in order from top to bottom, create target directory nodes based on the content titles in the current traversal round, and determine the target node level of the target directory nodes.
[0206] The mapping information acquisition module 1302 is used to acquire the target mapping information in the current traversal round, wherein the target mapping information is used to indicate the mapping relationship between the node level and the parent node;
[0207] Matching module 1303 is used to determine the target parent node corresponding to the target directory node from the target mapping information based on the target node level;
[0208] The adding module 1304 is used to obtain the directory tree list of the target webpage, add the target directory node to the directory tree list according to the determination result of the target parent node, and update the target mapping information according to the target directory node;
[0209] The directory generation module 1305 is used to generate the target directory of the target webpage based on the directory tree list after adding the target directory node.
[0210] Furthermore, the aforementioned adding module 1304 is specifically used for:
[0211] The writing position of the target directory node in the directory tree list is obtained based on the determination result of the target parent node;
[0212] Add the target directory node to the directory tree list based on the write location.
[0213] Furthermore, the aforementioned adding module 1304 is specifically used for:
[0214] If the result of determining the target parent node is that the target directory node does not have a target parent node, the writing position of the target directory node in the directory tree list is determined to be the end position of the directory tree list;
[0215] Alternatively, if the result of determining the target parent node is that the target directory node has a target parent node, a subdirectory set corresponding to the target parent node is created in the directory tree list, and the writing position of the target directory node in the directory tree list is determined to be the end position of the subdirectory set.
[0216] Furthermore, the matching module 1303 mentioned above is specifically used for:
[0217] Match the target node level with the node level in the target mapping information, and determine the target parent node corresponding to the target directory node from the parent nodes in the target mapping information based on the matching result;
[0218] Alternatively, the target node level can be lowered and matched with the node level in the target mapping information. Based on the matching result, the target parent node corresponding to the target directory node can be determined from the parent nodes in the target mapping information.
[0219] Furthermore, the aforementioned adding module 1304 is specifically used for:
[0220] The node to be replaced corresponding to the target directory node is determined from the target mapping information based on the target node level;
[0221] Update the node to be replaced to the target directory node to obtain the updated target mapping information.
[0222] Furthermore, the aforementioned adding module 1304 is specifically used for:
[0223] Match the target node level with the node level in the target mapping information, and determine the node to be replaced corresponding to the target directory node from the parent node in the target mapping information based on the matching result;
[0224] Alternatively, the target node level can be elevated and matched with the node level in the target mapping information. Based on the matching result, the node to be replaced corresponding to the target directory node can be determined from the parent nodes in the target mapping information.
[0225] Furthermore, the aforementioned mapping information acquisition module 1302 is specifically used for:
[0226] If the target node level in the previous traversal round is level one, or if the target node level in the previous traversal round is the same as the target node level in the first traversal round, obtain the directory tree list and add the target directory node of the previous traversal round to the end of the directory tree list.
[0227] Add the mapping relationship between the target directory node and the target node level in the previous traversal round to the target mapping information in the previous traversal round to obtain the target mapping information in the current traversal round;
[0228] If the target node level in the current traversal round is level two or higher, or if the target node level in the current traversal round is different from the target node level in the first traversal round, obtain the target mapping information in the current traversal round.
[0229] Furthermore, the aforementioned node creation module 1301 is specifically used for:
[0230] Obtain the webpage code text of the target webpage;
[0231] Identify multiple content titles in the target webpage based on element tags in the webpage code text, and determine the title level of each content title based on the element tags;
[0232] Based on the top-to-bottom position order on the target webpage, a title list is constructed according to multiple content titles and their corresponding title levels, and the title list is traversed.
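A minimal sketch of this title extraction is given below, using Python's standard `html.parser`. It assumes that the element tags in question are the HTML heading tags `h1` to `h6` (consistent with the title levels 1 to 6 in Table 1, but not stated explicitly here); `HeadNode` mirrors Table 1, while `HeadingCollector` and `extract_head_list` are hypothetical names.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional


@dataclass
class HeadNode:
    level: int   # title level, 1-6
    label: str   # title name


class HeadingCollector(HTMLParser):
    """Collect h1-h6 headings in the order they appear in the page source."""

    def __init__(self) -> None:
        super().__init__()
        self.head_list: List[HeadNode] = []
        self._level: Optional[int] = None   # level of the heading being read
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        match = re.fullmatch(r"h([1-6])", tag)
        if match and self._level is None:
            self._level = int(match.group(1))
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            # Join nested text fragments and normalize whitespace.
            label = " ".join("".join(self._text).split())
            self.head_list.append(HeadNode(self._level, label))
            self._level = None


def extract_head_list(html: str) -> List[HeadNode]:
    """Return the title list of a page, ordered from top to bottom."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.head_list
```

Because the parser reports tags in source order, the resulting list already follows the top-to-bottom order of the page for ordinary documents; pages that reposition headings with styling would need additional handling.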
[0233] Furthermore, the aforementioned node creation module 1301 is specifically used for:
[0234] The node name is determined based on the content title in the current traversal round, and the node level is determined based on the title level of the content title in the current traversal round.
[0235] Create the target directory node based on the node name and node level.
[0236] Furthermore, the aforementioned directory generation module 1305 is specifically used for:
[0237] The directory tree list after adding the target directory node is parsed to obtain multiple directory node links. Each directory node link includes multiple target directory nodes, and the first directory node in any directory node link is the target directory node with the highest target node level in the directory node link.
[0238] The target directory for the target webpage is generated based on multiple directory node links.
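One possible reading of a directory node link is the flattened chain of nodes under a single top-level entry of the directory tree list. The sketch below follows that reading; `DirNode` and `to_links` are hypothetical names, and the pre-order flattening is an assumption rather than something the text specifies.

```python
from dataclasses import dataclass, field


@dataclass
class DirNode:
    name: str
    level: int  # 1 is the highest level
    children: list = field(default_factory=list)


def to_links(tree_list):
    """Flatten every top-level entry into one link (pre-order)."""
    links = []
    for root in tree_list:
        link, stack = [], [root]
        while stack:
            node = stack.pop()
            link.append(node)
            # Push children reversed so they are visited in page order.
            stack.extend(reversed(node.children))
        links.append(link)
    return links
```

With this construction the first node of each link is the entry taken from the directory tree list, which is also the node with the highest level in that link.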
[0239] Furthermore, the aforementioned directory generation module 1305 is specifically used for:
[0240] Parse the directory tree list after adding the target directory node to obtain multiple directory node links;
[0241] When the first directory node is the directory node with the lowest target node level in its directory node link, the second directory node is the first directory node in its directory node link, and the target node level of the first directory node is at least two levels higher than the target node level of the second directory node, the directory node link where the first directory node is located and the directory node link where the second directory node is located are merged to obtain a merged node link.
[0242] The target directory for the target webpage is generated based on the merged node links and the remaining directory node links.
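The merge condition can be sketched as follows. This is an interpretation under stated assumptions: links are flat, non-empty lists of nodes, only adjacent links are considered for merging, and "lowest level" means the numerically largest level in the preceding link. `DirNode` and `merge_links` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DirNode:
    name: str
    level: int  # 1 is the highest level, larger numbers are lower levels


def merge_links(links):
    """Merge a link into the preceding one when its head skips a level."""
    merged = []
    for link in links:
        if merged:
            previous = merged[-1]
            # Lowest level in the preceding link (numerically largest).
            deepest = max(node.level for node in previous)
            # The head of this link is at least two levels below it.
            if link[0].level - deepest >= 2:
                previous.extend(link)
                continue
        merged.append(list(link))
    return merged
```

In this sketch a link headed by a level-four title that follows a link ending at level two is folded into that link, so a title whose intermediate level is missing on the page still ends up under the nearest preceding section.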
[0243] Furthermore, the aforementioned directory generation module 1305 is specifically used for:
[0244] Parse the directory tree list after adding the target directory node to obtain multiple directory node links;
[0245] When a new directory node is inserted into a directory node link, and the new directory node has the same target node level as the first target directory node in the directory node link, the directory node link is split into two sub-node links based on the new directory node. The first target directory node in the directory node link and the new directory node are respectively used as the first directory node of the two sub-node links.
[0246] The target directory for the target webpage is generated based on the child node links and the remaining directory node links.
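The split rule can be illustrated with a flat list standing in for a directory node link. The sketch assumes the insertion position `index` is at least 1, so the original first node stays at the head of the first sub-node link; `DirNode` and `insert_into_link` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DirNode:
    name: str
    level: int


def insert_into_link(link, index, new_node):
    """Insert new_node at index; split when it matches the head's level.

    Assumes 1 <= index <= len(link). Returns a list of one or two links.
    """
    if new_node.level == link[0].level:
        # The original head keeps the nodes before the insertion point;
        # the new node heads everything from the insertion point onward.
        return [link[:index], [new_node] + link[index:]]
    return [link[:index] + [new_node] + link[index:]]
```

For example, inserting a level-one node into the middle of a link headed by another level-one node yields two sub-node links, each headed by a level-one node.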
[0247] The aforementioned directory generation device 1300 and the directory generation method are based on the same inventive concept. Multiple content titles of a target webpage are traversed in top-to-bottom position order; in each traversal round, a target directory node is created from the current content title, the target node level of that node is determined, and the target mapping information for the round is obtained. Because the target mapping information indicates the mapping relationship between node levels and parent nodes, the target parent node corresponding to the target directory node can be determined quickly from the target mapping information based on the target node level. The directory tree list of the target webpage is then obtained, the target directory node is added to it according to the determination result of the target parent node, and the target directory of the target webpage is generated from the directory tree list after the addition. No string matching is required, which improves directory generation efficiency and reduces device memory usage. Furthermore, because the content titles are traversed in top-to-bottom order and the target mapping information is updated with the target directory node in every round, the hierarchical relationship between different target directory nodes can be determined quickly while the target directory is generated, which improves the granularity of directory generation while maintaining its accuracy.
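The overall traversal can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the claimed implementation: the mapping is keyed by each node's own level (so the level is reduced by one for the parent lookup), entries for deeper levels are discarded whenever a shallower node is recorded (a choice made for this sketch that the text does not describe), and `DirNode` and `build_directory_tree` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class DirNode:
    name: str
    level: int  # 1 is the top level
    children: list = field(default_factory=list)


def build_directory_tree(titles):
    """titles: (name, level) pairs in top-to-bottom page order."""
    tree = []             # directory tree list (top-level entries)
    parent_by_level = {}  # mapping information: level -> latest node
    for name, level in titles:
        node = DirNode(name, level)
        # Parent lookup is a single dictionary access, no string matching.
        parent = parent_by_level.get(level - 1)
        if parent is None:
            tree.append(node)             # end of the directory tree list
        else:
            parent.children.append(node)  # end of the parent's sub-list
        # Update the mapping; the previous node at this level is replaced.
        parent_by_level[level] = node
        # Assumption of this sketch: forget nodes deeper than the new one
        # so later titles cannot attach to a section that has ended.
        for deeper in [k for k in parent_by_level if k > level]:
            del parent_by_level[deeper]
    return tree
```

Each title costs one lookup and one update in the mapping, so the pass is linear in the number of titles for a bounded number of levels.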
[0248] The electronic device provided in this application embodiment for executing the above-described directory generation method can be a terminal. Referring to Figure 14, Figure 14 is a partial structural block diagram of a terminal provided in an embodiment of this application. The terminal includes components such as a radio frequency (RF) circuit 1410, a memory 1420, an input unit 1430, a display unit 1440, a sensor 1450, an audio circuit 1460, a wireless fidelity (WiFi) module 1470, a processor 1480, and a power supply 1490. Those skilled in the art will understand that the terminal structure shown in Figure 14 does not constitute a limitation on the terminal, which may include more or fewer components than shown, combine certain components, or have a different component arrangement.
[0249] The RF circuit 1410 can be used to receive and transmit signals during information transmission or calls. In particular, it receives downlink information from the base station and passes it to the processor 1480 for processing; in addition, it transmits uplink data to the base station.
[0250] The memory 1420 can be used to store software programs and modules. The processor 1480 executes various functional applications and data processing of the terminal by running the software programs and modules stored in the memory 1420.
[0251] The input unit 1430 can be used to receive input numeric or character information, and to generate key signal inputs related to the settings and function control of the terminal. Specifically, the input unit 1430 may include a touch panel 1431 and other input devices 1432.
[0252] The display unit 1440 can be used to display input or provided information, as well as various menus of the terminal. The display unit 1440 may include a display panel 1441.
[0253] The audio circuit 1460, a speaker 1461, and a microphone 1462 can provide an audio interface.
[0254] In this embodiment, the processor 1480 included in the terminal can execute the directory generation method of the previous embodiment.
[0255] The electronic device provided in this application embodiment for executing the above-described directory generation method can also be a server. Referring to Figure 15, Figure 15 is a partial structural block diagram of a server provided in this application embodiment. The server 1500 can vary significantly due to different configurations or performance characteristics. It may include one or more central processing units (CPUs) 1522 (e.g., one or more processors), a memory 1532, and one or more storage media 1530 (e.g., one or more mass storage devices) for storing application programs 1542 or data 1544. The memory 1532 and the storage media 1530 may be temporary or persistent storage. The program stored in the storage media 1530 may include one or more modules (not shown in the figure), each module including a series of instruction operations on the server 1500. Furthermore, the CPU 1522 may be configured to communicate with the storage media 1530 and execute the series of instruction operations in the storage media 1530 on the server 1500.
[0256] The server 1500 may also include one or more power supplies 1526, one or more wired or wireless network interfaces 1550, one or more input/output interfaces 1558, and/or one or more operating systems 1541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
[0257] The processor in server 1500 can be used to execute directory generation methods.
[0258] This application also provides a computer-readable storage medium for storing program code, which is used to execute the directory generation methods of the foregoing embodiments.
[0259] This application also provides a computer program product, which includes a computer program stored in a computer-readable storage medium. A processor of a computer device reads the computer program from the computer-readable storage medium and executes the computer program, causing the computer device to perform the directory generation method described above.
[0260] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented, for example, in orders other than those illustrated or described herein. Furthermore, the terms “comprising” and “having,” and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatuses.
[0261] It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" is used to describe the relationship between related objects, indicating that three relationships can exist. For example, "A and/or B" can represent three cases: only A exists, only B exists, and both A and B exist simultaneously, where A and B can be singular or plural. The character "/" generally indicates that the preceding and following related objects are in an "or" relationship. "At least one (item) of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one (item) of a, b, or c can represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.
[0262] It should be understood that in the description of the embodiments of this application, "multiple" means two or more, "greater than", "less than", "exceeding" etc. are understood to exclude the number itself, and "above", "below", "within" etc. are understood to include the number itself.
[0263] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces, or indirect coupling or communication connection between apparatuses or units, and may be electrical, mechanical, or other forms.
[0264] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0265] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0266] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0267] It should also be understood that the various implementation methods provided in this application can be combined arbitrarily to achieve different technical effects.
[0268] The above provides a detailed description of the preferred embodiments of this application. However, this application is not limited to the above-described embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of this application. All such equivalent modifications or substitutions are included within the scope defined by the claims of this application.
Claims
1. A catalog generation method, characterized by comprising: traversing multiple content titles of a target webpage in order from top to bottom, creating a target directory node based on the content title in the current traversal round, and determining the target node level of the target directory node; obtaining the target mapping information in the current traversal round, wherein the target mapping information is used to indicate the mapping relationship between node levels and parent nodes; determining the target parent node corresponding to the target directory node from the target mapping information based on the target node level; obtaining the directory tree list of the target webpage, adding the target directory node to the directory tree list according to the determination result of the target parent node, and updating the target mapping information according to the target directory node; and generating the target directory of the target webpage based on the directory tree list after adding the target directory node; wherein determining the target parent node corresponding to the target directory node from the target mapping information based on the target node level includes: matching the target node level with the node levels in the target mapping information and, based on the matching result, determining the target parent node corresponding to the target directory node from the parent nodes in the target mapping information, wherein the minimum node level in the target mapping information is level two; or, reducing the target node level and then matching it with the node levels in the target mapping information and, based on the matching result, determining the target parent node corresponding to the target directory node from the parent nodes in the target mapping information, wherein the minimum node level in the target mapping information is level one.
2. The catalog generation method of claim 1, wherein, The step of adding the target directory node to the directory tree list based on the determination result of the target parent node includes: The write position of the target directory node in the directory tree list is obtained based on the determination result of the target parent node; The target directory node is added to the directory tree list according to the write location.
3. The catalog generation method of claim 2, wherein the step of determining the write position of the target directory node in the directory tree list based on the determination result of the target parent node includes: when the determination result of the target parent node is that the target directory node does not have a target parent node, determining the write position of the target directory node in the directory tree list as the end position of the directory tree list; or, when the determination result of the target parent node is that the target directory node has a target parent node, creating a set of subdirectories corresponding to the target parent node in the directory tree list, and determining the write position of the target directory node in the directory tree list as the end position of the set of subdirectories.
4. The catalog generation method of claim 1, wherein, The step of updating the target mapping information based on the target directory node includes: The node to be replaced corresponding to the target directory node is determined from the target mapping information based on the target node level; The node to be replaced is updated to the target directory node to obtain the updated target mapping information.
5. The catalog generation method of claim 4, wherein, The step of determining the node to be replaced corresponding to the target directory node from the target mapping information based on the target node level includes: The target node level is matched with the node level in the target mapping information, and the node to be replaced corresponding to the target directory node is determined from the parent nodes in the target mapping information based on the matching result. Alternatively, the target node level can be raised and matched with the node level in the target mapping information. Based on the matching result, the node to be replaced corresponding to the target directory node can be determined from the parent nodes in the target mapping information.
6. The catalog generation method of claim 1, wherein, The step of obtaining the target mapping information in the current traversal round includes: If the target node level in the previous traversal round is level one, or the target node level in the previous traversal round is the same as the target node level in the first traversal round, obtain the directory tree list, add the target directory node from the previous traversal round to the end of the directory tree list, add the mapping relationship between the target directory node and the target node level from the previous traversal round to the target mapping information from the previous traversal round, and obtain the target mapping information in the current traversal round; If the target node level in the current traversal round is level two or above, or if the target node level in the current traversal round is different from the target node level in the first traversal round, obtain the target mapping information in the current traversal round.
7. The catalog generation method of claim 1, wherein, The step of traversing multiple content titles of the target webpage in top-to-bottom order includes: Obtain the webpage code text of the target webpage; Identify multiple content titles in the target webpage based on element tags in the webpage code text, and determine the title level of each content title based on the element tags; Based on the top-to-bottom position order in the target webpage, a title list is constructed according to multiple content titles and the title level corresponding to the content titles, and the title list is traversed.
8. The catalog generation method of claim 7, wherein, The step of creating a target directory node based on the content title in the current traversal round includes: The node name is determined based on the content title in the current traversal round, and the node level is determined based on the title level of the content title in the current traversal round. Create the target directory node based on the node name and the node level.
9. The catalog generation method according to any one of claims 1 to 8, characterized in that, The step of generating the target directory for the target webpage based on the directory tree list after adding the target directory node includes: The directory tree list after adding the target directory node is parsed to obtain multiple directory node links, wherein the directory node links include multiple target directory nodes, and the first directory node of any directory node link is the target directory node with the highest target node level in the directory node link; The target directory of the target webpage is generated based on the multiple directory node links.
10. The catalog generation method according to any one of claims 1 to 8, characterized in that, The step of generating the target directory for the target webpage based on the directory tree list after adding the target directory node includes: The directory tree list after adding the target directory node is parsed to obtain multiple directory node links; When the first directory node is the directory node with the lowest target node level in the directory node link it belongs to, the second directory node is the first directory node in the directory node link it belongs to, and the target node level of the first directory node is at least two levels higher than the target node level of the second directory node, the directory node link where the first directory node is located and the directory node link where the second directory node is located are merged to obtain a merged node link. The target directory of the target webpage is generated based on the merged node links and the remaining directory node links.
11. The catalog generation method according to any one of claims 1 to 8, characterized in that, The step of generating the target directory for the target webpage based on the directory tree list after adding the target directory node includes: The directory tree list after adding the target directory node is parsed to obtain multiple directory node links; When a new directory node is inserted into the directory node link, and the new directory node has the same target node level as the first target directory node in the directory node link, the directory node link is split into two sub-node links based on the new directory node. The first target directory node in the directory node link and the new directory node are respectively the first directory node of the two sub-node links. The target directory of the target webpage is generated based on the child node links and the remaining directory node links.
12. A catalog generation device, characterized by comprising: a node creation module, used to traverse multiple content titles of a target webpage in order from top to bottom, create a target directory node based on the content title in the current traversal round, and determine the target node level of the target directory node; a mapping information acquisition module, used to acquire the target mapping information in the current traversal round, wherein the target mapping information is used to indicate the mapping relationship between node levels and parent nodes; a matching module, used to determine the target parent node corresponding to the target directory node from the target mapping information based on the target node level; an adding module, used to obtain the directory tree list of the target webpage, add the target directory node to the directory tree list according to the determination result of the target parent node, and update the target mapping information according to the target directory node; and a directory generation module, used to generate the target directory of the target webpage based on the directory tree list after adding the target directory node; wherein determining the target parent node corresponding to the target directory node from the target mapping information based on the target node level includes: matching the target node level with the node levels in the target mapping information and, based on the matching result, determining the target parent node corresponding to the target directory node from the parent nodes in the target mapping information, wherein the minimum node level in the target mapping information is level two; or, reducing the target node level and then matching it with the node levels in the target mapping information and, based on the matching result, determining the target parent node corresponding to the target directory node from the parent nodes in the target mapping information, wherein the minimum node level in the target mapping information is level one.
13. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that when the processor executes the computer program, it implements the directory generation method according to any one of claims 1 to 11.
14. A computer-readable storage medium having stored thereon a computer program, characterized in that when the computer program is executed by a processor, it implements the directory generation method according to any one of claims 1 to 11.
15. A computer program product comprising a computer program, characterized in that when the computer program is executed by a processor, it implements the directory generation method according to any one of claims 1 to 11.
Citation Information
Patent Citations
Document directory generation method and device, electronic equipment and medium
CN114118070A