Method, apparatus, device and storage medium for parsing files
Reading and parsing large XML files line by line addresses the high memory consumption and low parsing efficiency of existing techniques and enables efficient file parsing and database insertion.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: CHINA MOBILE INFORMATION TECHNOLOGY CO LTD
- Filing Date: 2023-10-16
- Publication Date: 2026-06-19
AI Technical Summary
Existing technologies consume a large amount of memory and have low parsing efficiency when parsing large XML files, making them unsuitable for the constantly updated large XML files in network service subscriptions.
The XML file is read line by line using a character stream reading function. The tag value of the current line of data is determined, and the data is parsed and stored in the database according to the tag value. This line-by-line reading method reduces memory usage.
It effectively reduces memory usage, improves parsing efficiency, solves the problem of slow parsing and database insertion of large XML files, and breaks through file size limitations.
Abstract figure: CN117435771B_ABST
Description
Technical Field
[0001] This invention relates to the field of file parsing technology, and in particular to a method, apparatus, device, and storage medium for parsing files.
Background Technology
[0002] In the subscription of computing network services, customers often subscribe to bundled services. These services contain a large number of product attributes and dimensional attributes, which makes the number of nodes and message levels in the message much larger than in the past, and thus generates a large number of large XML (eXtensible Markup Language) files.
[0003] Currently, most XML file parsing is tied to business logic, meaning it parses files according to pre-defined business specifications. This makes it difficult to apply universally to parsing the large, potentially constantly updated XML files involved in network service subscriptions. These large XML files are generated by customer subscription packages and are constantly expanding and updating. These customer subscription packages contain a considerable number of product attributes and dimensional attributes, resulting in a significantly larger message volume and message hierarchy compared to previous messages.
[0004] Existing general-purpose parsing methods read the entire content of all nodes of a file into memory at once and then parse it. This consumes a large amount of memory for storage and parsing and yields low parsing efficiency.
Summary of the Invention
[0005] The main objective of this invention is to provide a method, apparatus, and storage medium for parsing files, aiming to solve the technical problem of large memory space consumption in existing file parsing techniques.
[0006] To achieve the above objectives, the present invention provides a file parsing method, the method comprising the following steps:
[0007] The current file is read line by line using a character stream reading function, and the current line of data in the current file is read into memory.
[0008] Determine the current tag value of the current row of data;
[0009] The current row of data is parsed and stored in the database based on the current tag value.
[0010] Optionally, parsing and storing the current row data into the database based on the current tag value includes:
[0011] When the current tag value is within the hierarchical reference set, query the tag level of the current row of data;
[0012] The data in the current row is parsed and stored in the database based on the label level.
[0013] Optionally, parsing and storing the current row data into the database according to the tag level includes:
[0014] When the tag level is a level-1 tag, the current row of data is parsed to determine whether the title read from it is the configured comment title;
[0015] When the title read is the configured comment title, obtain the field value of the current row of data;
[0016] The title and the field value are placed into the field name list and the field value list, respectively, of the database instance;
[0017] Back up the first-level field values to the first-level title backup set, and record the parsed result in the parameter of each result set wrapped by the current title.
[0018] Optionally, parsing and storing the current row data into the database according to the tag level includes:
[0019] When the tag level is not a first-level tag, the current tag value is compared with the top element of the push-order stack;
[0020] When the current tag value is inconsistent with the top element of the stack, determine the title type corresponding to the current tag value;
[0021] When the title type is a peer title that reappears in parallel, determine the number of times the title of the current row of data is entered into the database;
[0022] When the number of entries exceeds a preset number, the title is added to the hierarchical result set;
[0023] When the number of data entries equals the preset number, the first title that is not included in the hierarchical result set is obtained, and the title and the first title are added to the hierarchical result set to complete the data entry of the current row.
[0024] Optionally, after determining the title type corresponding to the current tag value when the current tag value is inconsistent with the top element of the stack, the method further includes:
[0025] When the title type is "Update Parent Title", retrieve the parent data of the "Update Parent Title" tag;
[0026] The parent data is placed into the result set parameters of each level of the current title wrapper.
[0027] Optionally, when the tag level is not a level-1 tag, after the current tag value is compared with the top element of the push-order stack, the method further includes:
[0028] When the current tag value is the same as the top element of the stack, obtain the current top element of the stack;
[0029] Pop the current top element from the stack and assign the popped element to the current top element;
[0030] Check whether the push-order stack is empty;
[0031] When the push-order stack is empty, push the current tag value onto the stack;
[0032] When the stack is not empty, update the popped top element to the current top element, and return to the step of popping the current top element and assigning the popped element to the current top element.
[0033] Optionally, parsing and storing the current row data into the database based on the current tag value includes:
[0034] When the current tag value is not within the hierarchical reference set, detect whether the current row of data contains parallel titles;
[0035] Retrieve the field values of the current row of data;
[0036] When there are no parallel titles, the title and the field value are placed into the field name list and the field value list, respectively, of the database instance to complete the data entry of the current row.
[0037] Optionally, after obtaining the field values of the current row of data, the method further includes:
[0038] When the title list contains multiple fields, the field name list is deduplicated by deleting duplicate fields;
[0039] The field value and the title are combined into a single-level result set.
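The deduplication and combination in [0038] and [0039] might look roughly like the following Java sketch. It assumes that the field name list and the field value list are parallel lists and that the first occurrence of a duplicated field is the one kept; the source does not say which duplicate survives, and the names `SingleLevelResult` and `combine` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: builds a single-level result set from parallel name/value lists. */
final class SingleLevelResult {

    private SingleLevelResult() {}

    /**
     * Pairs fieldList[i] with valueList[i], dropping duplicate field names.
     * Assumption: the first occurrence of a duplicated field is kept.
     */
    static Map<String, String> combine(List<String> fieldList, List<String> valueList) {
        if (fieldList.size() != valueList.size()) {
            throw new IllegalArgumentException("field and value lists must have the same length");
        }
        Map<String, String> result = new LinkedHashMap<>(); // keeps the original field order
        for (int i = 0; i < fieldList.size(); i++) {
            result.putIfAbsent(fieldList.get(i), valueList.get(i)); // later duplicates are ignored
        }
        return result;
    }
}
```

A `LinkedHashMap` is used so that the column order of the result set matches the order in which titles were read.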
[0040] Optionally, before reading the current line of the current file into memory using the character stream reading function, the method further includes:
[0041] Read each task in the file entry task table to obtain the directory task status;
[0042] Take the tasks whose directory task status is pending as pending tasks;
[0043] Read the files to be parsed in the pending tasks according to the file directory field, and encapsulate the read files to obtain an in-memory array of file entry task information to be parsed and entered into the database;
[0044] Use the in-memory array of file entry task information to be parsed and entered into the database as the current file.
[0045] Furthermore, to achieve the above objectives, the present invention also proposes a file parsing apparatus, the file parsing apparatus comprising:
[0046] The reading module is used to read the current file line by line using a character stream reading function, and to read the current line data of the current file into memory;
[0047] The determination module is used to determine the current tag value of the current row of data;
[0048] The parsing module is used to parse and store the current row data into the database based on the current tag value.
[0049] Furthermore, to achieve the above objectives, the present invention also proposes a file parsing device, the file parsing device comprising: a memory, a processor, and a file parsing program stored in the memory and executable on the processor, the file parsing program being configured to implement the steps of the file parsing method as described above.
[0050] In addition, to achieve the above objectives, the present invention also proposes a storage medium storing a file parsing program, which, when executed by a processor, implements the steps of the file parsing method described above.
[0051] This invention reads the current file line by line using a character stream reading function, reading the current line data of the current file into memory; determines the current tag value of the current line data; and parses and stores the current line data into a database based on the current tag value. Because each line is parsed and stored in the database as soon as it is read, memory usage is greatly reduced.
Attached Figure Description
[0052] Figure 1 is a schematic structural diagram of the file parsing device in the hardware operating environment involved in the embodiments of the present invention;
[0053] Figure 2 is a flowchart illustrating the first embodiment of the file parsing method of the present invention;
[0054] Figure 3 is a flowchart illustrating the second embodiment of the file parsing method of the present invention;
[0055] Figure 4 is a flowchart illustrating the third embodiment of the file parsing method of the present invention;
[0056] Figure 5 is a flowchart illustrating the fourth embodiment of the file parsing method of the present invention;
[0057] Figure 6 is a schematic diagram of the overall process of an embodiment of the file parsing method of the present invention;
[0058] Figure 7 is a structural block diagram of the first embodiment of the file parsing device of the present invention.
[0059] The realization of the objectives, functional features, and advantages of the present invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed Implementation
[0060] It should be understood that the specific embodiments described herein are for illustrative purposes only and are not intended to limit the scope of the invention.
[0061] Referring to Figure 1, Figure 1 is a schematic structural diagram of the file parsing device in the hardware operating environment involved in the embodiments of the present invention.
[0062] As shown in Figure 1, the file parsing device may include: a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable communication between these components. The user interface 1003 may include a display screen or an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface or a wireless interface. The network interface 1004 may optionally include a standard wired interface or a wireless interface (such as a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be high-speed random access memory (RAM) or stable non-volatile memory (NVM), such as a disk drive. The memory 1005 may also optionally be a storage device independent of the aforementioned processor 1001.
[0063] Those skilled in the art will understand that the structure shown in Figure 1 does not constitute a limitation on the file parsing device, which may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
[0064] As shown in Figure 1, the memory 1005, which serves as a storage medium, may include an operating system, a network communication module, a user interface module, and a file parsing program.
[0065] In the file parsing device shown in Figure 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with the user. The processor 1001 and the memory 1005 of the present invention can be arranged in the file parsing device, which calls the file parsing program stored in the memory 1005 through the processor 1001 and executes the file parsing method provided in the embodiments of the present invention.
[0066] In the subscription of computing network services, customers often subscribe to bundled services. These services contain a large number of product attributes and dimensional attributes, which makes the number of nodes and message levels in the message much larger than in the past, and thus generates a large number of large XML files.
[0067] For example, Table 1 below shows the title and name fields involved in a large XML file:
[0068] Table 1
[0069]

| XML_SIGN | TITLE_LEVEL | TITLE_NAME |
|---|---|---|
| EBOSS_BILL_BBOSS_M_DETAIL | 1 | ProvCode |
| EBOSS_BILL_BBOSS_M_DETAIL | 1 | PayTag |
| EBOSS_BILL_BBOSS_M_DETAIL | 2 | ECinfo |
| EBOSS_BILL_BBOSS_M_DETAIL | 3 | ProductInfo |
| EBOSS_BILL_BBOSS_M_DETAIL | 4 | SubProductInfo |
| EBOSS_BILL_BBOSS_M_DETAIL | 5 | FeeInfo |
| EBOSS_BILL_BBOSS_M_DETAIL | 6 | Province2CloudSettleinfo |
[0070] Table 1 can be explained using a family-lineage analogy, as follows:
[0071] 1 ProvCode, 1 PayTag, 2 ECinfo, 3 ProductInfo, 4 SubProductInfo, 5 FeeInfo, 6 Province2CloudSettleinfo.
[0072] In this hierarchy, the first-level tags `ProvCode` and `PayTag` are the common ancestors and are siblings of each other. `ECinfo` is the second-generation ancestor and can have multiple siblings (that is, other second-level tags). For example, multiple tags named `ECinfo` are considered parallel, whereas second-level tags with other names are not. Likewise, the descendants of a second-generation ancestor can have siblings of their own: the third-generation `ProductInfo` can have multiple siblings, and the fourth-generation `SubProductInfo` can appear many times side by side (in a large XML file, one `SubProductInfo` node is followed by many others). Other nodes, such as `FeeInfo` and `Province2CloudSettleinfo`, can also appear side by side, down to the final generation, such as the sixth-generation `Province2CloudSettleinfo`, which can also have many siblings.

In other words, in any large XML file in a network service subscription scenario, any parent tag can appear multiple times, and each sibling tag can keep deriving child tags. This is a typical application scenario for large XML files in network service subscriptions, and the repeated appearance of sibling tags and the continued derivation of child tags can expand almost without limit. The file's data volume therefore grows for two reasons: sibling tags keep appearing under parent tags, and child tags keep deriving new child tags. The technical problem this invention addresses is how to completely parse such potentially ever-expanding large XML files.

In existing technologies, most XML file parsing is tied to business operations; that is, files are parsed according to pre-defined business specifications, which cannot be applied well or universally to the constantly updated large XML files found in network service subscriptions. Such large XML files are precisely the special type of XML file that this invention aims to parse fully. They are generated by the bundled services that customers subscribe to and keep expanding and updating. These bundled services contain a considerable number of product attributes and dimensional attributes, which makes the number of nodes and message levels significantly larger than in earlier messages.
[0073] Consequently, to fully parse the large XML files addressed by this invention, the prior art offers only a few general-purpose XML parsing approaches: DOM, SAX, StAX, and storing and parsing the XML data in a database.
[0074] This invention therefore proposes a method for parsing large files. It reads the potentially constantly updated large XML files involved in network service subscriptions as a file character stream, parsing and storing the data at the same time while reclaiming memory. This avoids the rapid growth in memory consumption that would otherwise result from parallel sibling tags continually appearing in the XML file or from child tags continually deriving new child tags. Because its core concept avoids accumulating data in memory, the invention also aims to parse and store such large XML files using only ordinary hosts. Thus, this invention not only parses large XML files quickly but also addresses slow storage and high memory consumption, overcoming the size limitations on large XML files.
[0075] This invention provides a file parsing method. Referring to Figure 2, Figure 2 is a flowchart illustrating the first embodiment of the file parsing method of the present invention.
[0076] In this embodiment, the file parsing method includes the following steps:
[0077] Step S10: Read the current file line by line using the character stream reading function, and read the current line data of the current file into memory.
[0078] It should be noted that the execution subject in this embodiment can be a file parsing device, such as a host, or other devices that can perform the same or similar functions. This embodiment does not limit this; this embodiment uses a file parsing device as an example for explanation.
[0079] In practice, the current file is an XML file, but it can also be a file of other formats. This embodiment uses an XML file as an example for explanation.
[0080] It should be noted that the character stream reading function can be the readLine function. readLine reads the current file line by line in a loop, and each line is parsed and inserted into the database as it is read. For each line of the current file, the first-level title entry flag is set to its initial value; that is, titleFirstFlag is set to false, its default initial value.
[0081] The `readLine` function reads data line by line from the current file, loading the current line into memory. After reading the current line, it parses and stores the data in the database. Then, it reads the next line, parses and stores the data in the database, and so on, until all lines of the current file have been parsed and stored. By reading and parsing the current line data simultaneously, memory usage is reduced.
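A minimal Java sketch of this read-one-line, process-one-line loop, built on `BufferedReader.readLine()`, is shown below. The handler stands in for the parse-and-store step; `LineByLineParser` and `parseLines` are hypothetical names, not the patent's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

/** Hypothetical sketch: stream a file so that only the current line is held at a time. */
final class LineByLineParser {

    private LineByLineParser() {}

    /**
     * Reads lines until readLine() returns null (end of stream) and hands each one to
     * the handler, which stands in for "parse this line and store it in the database".
     * Returns the number of lines processed.
     */
    static int parseLines(Reader source, Consumer<String> handler) {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line); // after this call the line is no longer referenced here
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

For a file on disk, a reader obtained from `Files.newBufferedReader(path, StandardCharsets.UTF_8)` could be passed in. Only the current line and the reader's fixed-size buffer are held in memory, regardless of file size.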
[0082] Optionally, before a file is read, the file entry directory task table can be read first to determine the directory task status, so that the files to be parsed can be located as preparation before input is read; parsed files remain saved in their source directory. In this case, before step S10, the method further includes: reading each task in the file entry task table to obtain the directory task status; taking the tasks whose directory task status is pending as pending tasks; reading the files to be parsed in the pending tasks according to the file directory field, and encapsulating the read files to obtain an in-memory array of file entry task information to be parsed and entered into the database; and using this in-memory array as the current file.
[0083] It should be noted that the status of a directory task can be obtained by reading all tasks in the file import task table. The status of a directory task includes pending status, processing status, successful processing status, and failed processing status. Pending status can be represented by 0, processing status by 1, successful processing status by 2, and failed processing status by -1.
[0084] For example, a process named `taskDistribution` reads tasks with a directory task status of 0 from the file import directory task table (BM_WAREHOUSE_DIRECTORY_TASK). It then reads all XML files that need parsing in that directory according to the file directory field, saves the parsed files in the original directory without deleting them, and performs preparatory work before reading input. The task dispatching process `taskDistribution` reads the file import directory task table (BM_WAREHOUSE_DIRECTORY_TASK), as shown in Table 2, and finds tasks with a directory task status (dir_status) of 0. There are four directory task statuses: 0 - pending, 1 - processing, 2 - successful processing, and -1 - failed processing. The task dispatching process reads all XML files that need parsing in that directory according to the file directory field (file_directory). At this time, the recycling flag (bak_flag) is 0 by default, meaning no recycling is performed because the parsed files are still saved in the original directory and are not deleted.
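The status codes above, and the selection of pending directory tasks by `taskDistribution`, can be sketched in Java as follows. The `TaskStatus` enum, the `DirectoryTask` class, and its camel-case field names are illustrative assumptions mirroring the columns named in the text (dir_task_id, file_directory, dir_status, bak_flag); reading rows from BM_WAREHOUSE_DIRECTORY_TASK is omitted.

```java
import java.util.ArrayList;
import java.util.List;

/** Task status codes as listed for dir_status (file_status uses the same four codes). */
enum TaskStatus {
    PENDING(0), PROCESSING(1), SUCCEEDED(2), FAILED(-1);

    final int code;

    TaskStatus(int code) {
        this.code = code;
    }

    /** Maps a stored integer code to its status; unknown codes are rejected. */
    static TaskStatus fromCode(int code) {
        for (TaskStatus s : values()) {
            if (s.code == code) {
                return s;
            }
        }
        throw new IllegalArgumentException("unknown status code: " + code);
    }
}

/** Hypothetical in-memory row of BM_WAREHOUSE_DIRECTORY_TASK (only the columns used here). */
final class DirectoryTask {
    final long dirTaskId;        // dir_task_id
    final String fileDirectory;  // file_directory
    final int dirStatus;         // dir_status
    final int bakFlag;           // bak_flag: 0 = no recycling, parsed files stay in place

    DirectoryTask(long dirTaskId, String fileDirectory, int dirStatus, int bakFlag) {
        this.dirTaskId = dirTaskId;
        this.fileDirectory = fileDirectory;
        this.dirStatus = dirStatus;
        this.bakFlag = bakFlag;
    }
}

final class TaskDistributionSketch {

    private TaskDistributionSketch() {}

    /** Returns the directory tasks whose dir_status is 0 (pending), in their original order. */
    static List<DirectoryTask> pendingTasks(List<DirectoryTask> rows) {
        List<DirectoryTask> pending = new ArrayList<>();
        for (DirectoryTask task : rows) {
            if (TaskStatus.fromCode(task.dirStatus) == TaskStatus.PENDING) {
                pending.add(task);
            }
        }
        return pending;
    }
}
```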
[0085] Table 2
[0086]
[0087] The read files are encapsulated to generate file entry task instances (the file_status in the instance is initialized to 0 by default for the XML file parsing process to read), and then entered into the file entry task table (BM_WAREHOUSE_FILE_TASK). This decouples task issuance and task execution, allowing for asynchronous work. The directory where the XML files are placed can be flexibly configured in the file entry directory task table, and multiple unrelated file entry directory tasks can be configured simultaneously, distinguished only by the directory task ID (dir_task_id).
[0088] The XML file parsing process fileReverseWarehouse operates as follows. It first queries the file entry task table by file task status (file_status takes four values: 0 - pending, 1 - processing, 2 - processing completed, -1 - processing failed). Tasks whose status is 0 are encapsulated into an in-memory array of XML file entry task information to be parsed and entered into the database, and this in-memory array is used as the current file for parsing.
[0089] For each file, the in-memory array of file entry task information includes: the file name (file_name); the file task status (file_status); the parent directory task ID (dir_task_id, identical to dir_task_id in the file entry directory task table); the directory containing the file (file_directory); the name of the table into which the parsed file will be entered (XML_table_name); and the instance ID of the title-level configuration for this type of XML file (XML_config_title), which equals the primary key field value XML_sign of the title-level configuration table BM_TITLE_COLLECTION.
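The fields listed above could be held in a simple Java class, with the pending tasks (file_status 0) gathered into the in-memory array that is processed as the current file. This is a hypothetical sketch: `FileEntryTask`, `FileReverseWarehouseSketch`, and the camel-case field names are assumptions mapped from the column names in the text, and loading rows from BM_WAREHOUSE_FILE_TASK is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory form of one file entry task; comments give the source column. */
final class FileEntryTask {
    final String fileName;       // file_name
    final int fileStatus;        // file_status: 0 pending, 1 processing, 2 completed, -1 failed
    final long dirTaskId;        // dir_task_id, same value as in the directory task table
    final String fileDirectory;  // file_directory
    final String xmlTableName;   // XML_table_name: table the parsed data is entered into
    final String xmlConfigTitle; // XML_config_title: equals XML_sign in BM_TITLE_COLLECTION

    FileEntryTask(String fileName, int fileStatus, long dirTaskId,
                  String fileDirectory, String xmlTableName, String xmlConfigTitle) {
        this.fileName = fileName;
        this.fileStatus = fileStatus;
        this.dirTaskId = dirTaskId;
        this.fileDirectory = fileDirectory;
        this.xmlTableName = xmlTableName;
        this.xmlConfigTitle = xmlConfigTitle;
    }
}

final class FileReverseWarehouseSketch {

    private FileReverseWarehouseSketch() {}

    /** Collects pending tasks (file_status == 0) into the array that is parsed as the current file. */
    static FileEntryTask[] pendingTaskArray(List<FileEntryTask> rows) {
        List<FileEntryTask> pending = new ArrayList<>();
        for (FileEntryTask task : rows) {
            if (task.fileStatus == 0) {
                pending.add(task);
            }
        }
        return pending.toArray(new FileEntryTask[0]);
    }
}
```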
[0090] Table 3 below shows the corresponding simplified XML file template.
[0091] Table 3
[0092]
[0093] The `title_level` value (which describes the title level of a node) takes two kinds of values: 1, and integers greater than 1. A value of 1 means the node itself carries a value, and every parsed instance carries that value; for example, for `<provcode>000</provcode>`, the value 000 is carried by all instances. Values greater than 1 let the program anticipate the file layout, since `title_level` tells it how deeply each node is nested. The `title_name` field is likewise configured in two ways. When `title_level` is 1, the title that carries the value is configured directly, not its parent title. When `title_level` is greater than 1, the parent title that has child values is configured, as with `FeeList` and `FeeInfo`: if `FeeList` has no value but `FeeInfo` has values, `FeeInfo` is configured; otherwise no configuration is needed.
[0094] Step S20: Determine the current tag value of the current row of data.
[0095] In practice, the tag value of the line currently being scanned in the current file can be obtained to facilitate subsequent parsing.
[0096] Step S30: Parse and store the current row data into the database based on the current tag value.
[0097] In practice, the data of the current row can be parsed and stored in the database based on the current tag value. For example, it can be determined whether the current tag value is within the hierarchical reference set, and different parsing and storage processes can be performed accordingly.
[0098] In this embodiment, the current file is read line by line using a character stream reading function, and the current line of data is read into memory; the current tag value of the current line is determined; and the current line is parsed and stored in the database based on the current tag value. Because each line is parsed and stored in the database as soon as it is read, memory usage is greatly reduced.
[0099] Referring to Figure 3, Figure 3 is a flowchart illustrating the second embodiment of the file parsing method of the present invention.
[0100] Based on the first embodiment described above, step S30 of the file parsing method in this embodiment includes:
[0101] Step S301: When the current tag value is within the hierarchical reference set, query the tag level of the current row of data.
[0102] In practice, it can be determined whether the current tag value is within the hierarchical reference set, namely titleLevelMap. The system iteratively checks whether the current tag value Field is in titleLevelMap. This is done by parsing the title read from each line. Titles take three forms: `<title>`, `</title>`, and `<title>value</title>`. The title is obtained by taking the text after the "<" or "/" symbol and before the ">" symbol. The parsing code assigns this text to Field, which is then compared with the values in titleLevelMap. Because titleLevelMap is a `Map<XML_sign, Map<Field, title_level>>`, the XML_sign value obtained above is used to retrieve the tag-level map for this type of file, and the key set of that map is checked to see whether it contains the current tag value.
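The three title forms and the titleLevelMap lookup can be illustrated with the Java sketch below. It assumes one tag per line and tags without attributes or self-closing syntax; `TagExtractor` and its methods are hypothetical names, and the nested map follows the `Map<XML_sign, Map<Field, title_level>>` structure described above, using strings for XML_sign and Field and integers for title_level.

```java
import java.util.Map;

/** Hypothetical helpers for reading the tag name (Field) and value from one line. */
final class TagExtractor {

    private TagExtractor() {}

    /** Returns the name in "<title>", "</title>" or "<title>value</title>", or null if none. */
    static String extractField(String line) {
        String s = line.trim();
        if (!s.startsWith("<")) {
            return null;
        }
        int start = s.startsWith("</") ? 2 : 1; // skip "<" or "</"
        int end = s.indexOf('>', start);
        return end < 0 ? null : s.substring(start, end).trim();
    }

    /** Returns the text between the tags of "<title>value</title>", or null for bare tags. */
    static String extractValue(String line) {
        String s = line.trim();
        int open = s.indexOf('>');
        int close = s.lastIndexOf("</");
        if (open < 0 || close <= open) {
            return null; // "<title>" has no "</"; in "</title>" the "</" precedes ">"
        }
        return s.substring(open + 1, close);
    }

    /** True if the tag is configured for this file type, i.e. in titleLevelMap.get(xmlSign). */
    static boolean inReferenceSet(Map<String, Map<String, Integer>> titleLevelMap,
                                  String xmlSign, String field) {
        Map<String, Integer> levels = titleLevelMap.get(xmlSign);
        return levels != null && field != null && levels.containsKey(field);
    }
}
```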
[0103] For example, a Field is represented as:
[0104]
[0105] In practice, if the current tag value is within the hierarchical reference set, the tag level of the current row of data can be checked. For example, one can check whether the first-level title entry flag titleFirstFlag of the current row is true, i.e., whether the tag level is level 1; or whether the entry flag for second-level and lower titles of the current row is true, i.e., whether the tag level is level 2 or lower.
[0106] Step S302: Parse and store the current row data into the database according to the label level.
[0107] It should be understood that lines with different tag levels can be parsed differently, thereby achieving file parsing and database entry. For the large XML files handled by this invention, the large file size is caused by parallel sibling tags continually appearing under parent tags and by child tags continually deriving new child tags. This invention parses all the lineage information, from a last-generation node up to its first-level ancestor, into one block.
[0108] For example, Province2CloudSettleinfo in Table 1 is the last node. As a node, it has a lot of its own information. Parsing it into the final instance to be stored in the database requires parsing all the information from its generation upwards, all the way to its first-level ancestor. Moreover, in this parsing process, to save time and space, this invention cleverly uses line-by-line file reading, reading and parsing line by line from the beginning of the file. Through file layout awareness and layered encapsulation, it achieves the "ancestor recognition" of each "last generation." This enables the simultaneous reading, parsing, and storage of potentially constantly updated large XML files involved in network service subscriptions, regardless of whether the parent tag expands or the child tag expands, while simultaneously reclaiming memory space. This solves the problem of significantly increased time and space complexity caused by using general-purpose XML parsing techniques in existing technologies. Therefore, this invention not only solves the problem of fast parsing of large XML files but also solves the problems of slow storage speed and high memory consumption of large files, and breaks through the size limit of large XML files.
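The "ancestor recognition" idea can be illustrated with a simplified stack-based Java sketch. This is not the patent's exact procedure, which also tracks parallel-title counts and a hierarchical result set; it assumes one tag per line, no attributes, and that a node's own scalar fields appear before its child containers. Each open container is pushed together with the scalar values read inside it; when a container without child containers closes, it is treated as a last-generation node, and one flattened record holding its own values plus those of all its ancestors is passed to a sink such as a database insert. `AncestorFlattener` and `Frame` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch: flattens each innermost container plus its ancestors into one record. */
final class AncestorFlattener {

    /** One open container element and the scalar values read directly inside it. */
    private static final class Frame {
        final String title;
        final Map<String, String> values = new LinkedHashMap<>();
        boolean hasChildContainer;

        Frame(String title) {
            this.title = title;
        }
    }

    private final Deque<Frame> stack = new ArrayDeque<>();
    private final Consumer<Map<String, String>> sink; // e.g. one database insert per record

    AncestorFlattener(Consumer<Map<String, String>> sink) {
        this.sink = sink;
    }

    /** Processes one line holding a single tag, or a single <title>value</title> pair. */
    void accept(String rawLine) {
        String line = rawLine.trim();
        // Skip blank lines, text, declarations, comments and self-closing tags.
        if (!line.startsWith("<") || line.startsWith("<?") || line.startsWith("<!") || line.endsWith("/>")) {
            return;
        }
        int gt = line.indexOf('>');
        if (line.startsWith("</")) {                     // closing tag of a container
            String title = line.substring(2, gt);
            Frame done = stack.pop();
            if (!done.title.equals(title)) {             // closing tag must match the stack top
                throw new IllegalStateException("expected </" + done.title + "> but got " + line);
            }
            if (!done.hasChildContainer) {
                sink.accept(flatten(done));              // a "last generation" node
            }
            return;
        }
        String title = line.substring(1, gt);
        int close = line.indexOf("</", gt);
        if (close >= 0) {                                // <title>value</title>: scalar field
            if (!stack.isEmpty()) {
                stack.peek().values.put(title, line.substring(gt + 1, close));
            }
        } else {                                         // <title>: opens a container
            if (!stack.isEmpty()) {
                stack.peek().hasChildContainer = true;
            }
            stack.push(new Frame(title));
        }
    }

    /** Merges values from the outermost ancestor down to the just-closed node. */
    private Map<String, String> flatten(Frame leaf) {
        Map<String, String> record = new LinkedHashMap<>();
        Iterator<Frame> outermostFirst = stack.descendingIterator(); // push() adds at the head
        while (outermostFirst.hasNext()) {
            record.putAll(outermostFirst.next().values);
        }
        record.putAll(leaf.values);
        return record;
    }
}
```

Memory use in this sketch is bounded by the nesting depth rather than the file size, because each record is handed to the sink as soon as its node closes and closed frames are discarded. Scalar fields that appear after a child container would not be included in records emitted before them.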
[0109] In this embodiment, when the current tag value is within the hierarchical comparison set, the tag level of the current row of data is queried; the current row of data is parsed and stored in the database according to the tag level. By parsing and storing the current row of data in the database simultaneously using the file's tag level, the efficiency of file parsing is improved, and the problem of data volume mismatch is solved.
[0110] Referring to Figure 4, Figure 4 is a flowchart illustrating the third embodiment of the file parsing method of the present invention.
[0111] Based on the first and second embodiments described above, step S302 of the file parsing method in this embodiment includes:
[0112] Step S3021: When the tag level is a level-1 tag, parse the current row of data to determine whether the title read from it is the configured comment title.
[0113] Understandably, if the tag level is a level 1 tag, the current row of data can be parsed to determine whether the title being read is the configured comment title, i.e., the remark title.
[0114] If the tag level is neither a first-level tag nor a tag at the second level or below, it is determined whether the title of the current row of data is the configured comment title. If it is not, the rest of the current loop iteration is skipped and the next iteration begins, reading the next line of the XML file.
[0115] Step S3022: When the title read is the configured comment title, obtain the field value of the current row data.
[0116] If the title read is the configured comment title, then the field value of the current row of data is obtained, i.e., the value.
[0117] Step S3023: Place the title and the field value into the field name list and field value list in the database instance, respectively.
[0118] In practice, the title of the row being read is placed into the field name list fieldList of the database entry instance (fieldList, valueList), and the field value is placed into the field value list valueList of the same instance.
[0119] Step S3024: Back up the first-level field values among the field values to the first-level title backup set, and record the parsing result in the result set parameters of each level wrapped by the current title.
[0120] In practice, the first-level field values are backed up to the first-level title backup set fieldBakMap, and the parsing results are recorded in the fieldRecordMap parameter of each level of the result set wrapped by the current title. First-level titles are not pushed onto the stack: the program treats every field of a first-level title as a value that each database entry instance must carry, and uses fieldBakMap to assign these values directly to the current result set so that every record carries all of them.
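The first-level branch (steps S3021 to S3024) can be pictured with the minimal Java sketch below. It is an illustration under stated assumptions, not the patented implementation: the class `FirstLevelHandler`, the `commentTitles` set standing in for the configured comment titles, and the choice to record each value under its own tag level in `fieldRecordMap`/`valueRecordMap` are all hypothetical. It shows the title and value entering `fieldList`/`valueList`, the backup into `fieldBakMap` (first-level titles are never pushed), and how later records can start pre-filled with the backed-up values.

```java
import java.util.*;

/** Hypothetical sketch of the first-level tag branch (steps S3021 to S3024). */
final class FirstLevelHandler {
    final List<String> fieldList = new ArrayList<>();   // field names of the entry instance
    final List<String> valueList = new ArrayList<>();   // matching values read from the XML file
    final Map<String, String> fieldBakMap = new LinkedHashMap<>();     // first-level title backup set
    final Map<Integer, List<String>> fieldRecordMap = new HashMap<>(); // per-level recorded fields
    final Map<Integer, List<String>> valueRecordMap = new HashMap<>(); // per-level recorded values
    private final Set<String> commentTitles; // configured comment (remark) titles: an assumed input

    FirstLevelHandler(Set<String> commentTitles) {
        this.commentTitles = commentTitles;
    }

    /** Returns false if the title is not a configured comment title, so the caller moves to the next line. */
    boolean handle(String title, String value, int level) {
        if (!commentTitles.contains(title)) {
            return false;                                   // S3021: not configured, skip this line
        }
        fieldList.add(title);                               // S3022/S3023: title and value into the instance
        valueList.add(value);
        fieldBakMap.put(title, value);                      // S3024: back up; first-level titles are never pushed
        fieldRecordMap.computeIfAbsent(level, k -> new ArrayList<>()).add(title);
        valueRecordMap.computeIfAbsent(level, k -> new ArrayList<>()).add(value);
        return true;
    }

    /** Starts a new record pre-filled with the backed-up first-level values, so every record carries them. */
    Map<String, String> newCarriedRecord() {
        return new LinkedHashMap<>(fieldBakMap);
    }
}
```

Copying `fieldBakMap` into each new record is one simple way to realize "every instance carries the first-level values"; the description does not say how the copy is performed.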
[0121] Even the simplified XML file already contains many titles, and a similar XML file tens or even hundreds of MB in size contains far more title values, yet the configuration still requires only seven simple values. Moreover, XML files of the same layout type need to be configured only once. There is no need to configure a mapping between the database fields and the individual title fields: based on the BM_TITLE_COLLECTION configuration data for this type of XML file, the program automatically generates the insert statements required for each title value and writes them into the corresponding database entry instance table. Before the file task loop begins, the data in the BM_TITLE_COLLECTION table is loaded into a map, with the XML file category (XML_sign) as the key and title_level as the value. The program then iterates through the list of XML file entry tasks to be parsed, performing the following parsing and database entry operations for each task until the whole list has been processed.
[0122] When the algorithm is initialized, all variables it uses except `titleLevelMap` are set to null, false, or 0. Some of these variables are listed below:
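The configuration lookup described above can be sketched in Java as follows. This is a minimal illustration, not the patented implementation: the row type `TitleConfigRow`, its field names, the class `TitleLevelConfig`, and the regular expression that pulls the tag name out of a line are all hypothetical. It assumes each configuration row carries the XML file category (XML_sign), a title, and its title_level, and groups the rows into a per-category title-to-level map that plays the role of `titleLevelMap`.

```java
import java.util.*;
import java.util.regex.*;

/** Hypothetical shape of one BM_TITLE_COLLECTION row (field names are assumptions). */
record TitleConfigRow(String xmlSign, String title, int titleLevel) {}

/** Sketch: build per-category title-level maps once, then look up the level of each line's tag. */
final class TitleLevelConfig {
    // First tag name on a line, opening or closing: "<ProductInfo>" or "</ProductInfo>" -> "ProductInfo".
    private static final Pattern TAG = Pattern.compile("<\\s*/?\\s*([A-Za-z_][\\w.-]*)");

    /** Groups rows into XML_sign -> (title -> title_level); loaded before the file task loop starts. */
    static Map<String, Map<String, Integer>> load(List<TitleConfigRow> rows) {
        Map<String, Map<String, Integer>> byCategory = new HashMap<>();
        for (TitleConfigRow row : rows) {
            byCategory.computeIfAbsent(row.xmlSign(), k -> new HashMap<>())
                      .put(row.title(), row.titleLevel());
        }
        return byCategory;
    }

    /** Returns the tag name (the current tag value, Field) of a line, or null if there is none. */
    static String tagOf(String line) {
        Matcher m = TAG.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    /** Returns the configured level, or -1 when the tag is not in the hierarchical comparison set. */
    static int levelOf(String line, Map<String, Integer> titleLevelMap) {
        String field = tagOf(line);
        return field == null ? -1 : titleLevelMap.getOrDefault(field, -1);
    }
}
```

For a task whose file category is, say, `"ORDER"`, the per-task `titleLevelMap` would simply be `load(rows).get("ORDER")`; a line such as `<?xml version="1.0"?>` yields no tag and is skipped.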
[0123] A single result record List instance:
[0124] fieldList: Stores data table fields;
[0125] valueList: Stores the values in the XML file corresponding to the fields in the data table;
[0126] Temporary variable holding a copy of the data table field list:
[0127] fieldListTemp: prevents two Lists in the result set from sharing the same reference;
[0128] Single-level result set reference:
[0129] resultFieldList: A collection of fields for a single-level result set;
[0130] resultValueList: The collection of values of a single-level result set;
[0131] Hierarchical result set collection:
[0132] resultFieldMap: data structure Map<Integer, List<List>>;
[0133] resultValueMap: data structure Map<Integer, List<List>>;
[0134] Records the deepest level in the current result set: presentLevel;
[0135] Flag for entry under a second-level heading: titleSecondFlag;
[0136] First-level heading entry flag: titleFirstFlag;
[0137] Variable refresh flag: isFlush;
[0138] Flag indicating whether an element has been popped from the stack: isPop;
[0139] Title of the current parent element that can be entered into the database: keyTitle;
[0140] The popped element: lastPeek;
[0141] Current top element of the stack: secondTitle;
[0142] Pre-configured title levels for the current data entry task: titleLevelMap;
[0143] Fields currently stored in fieldList, used to append a count suffix to duplicate attribute names: fieldCurrentRepeatMap;
[0144] Number of times each title has been pushed onto the stack: titlePushTimeMap;
[0145] Level 1 heading backup map: fieldBakMap;
[0146] Stack recording the order in which the configured parent titles are read: titleStack;
[0147] Result sets of each level wrapped by the current title, keyed by parent title level, with the fields and values at that depth as the value: fieldRecordMap, valueRecordMap;
[0148] Current tag value: Field;
[0149] The values between two parent tags: betweenFieldList, betweenValueList;
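Among the variables above, fieldCurrentRepeatMap is described only as tracking the fields already in fieldList "so that duplicate attribute names can be appended with a count suffix". A minimal Java sketch of that idea follows; the class name `RepeatNamer` and the `_2`, `_3` suffix format are assumptions, since the description does not state the actual suffix. The `reset` method mirrors the step where the map is cleared and re-seeded (for example, from the first-level backup) when a new record begins.

```java
import java.util.*;

/** Hypothetical sketch: naming duplicate attributes with a count suffix via fieldCurrentRepeatMap. */
final class RepeatNamer {
    // attribute name -> how many times it already occurs in the current fieldList
    final Map<String, Integer> fieldCurrentRepeatMap = new HashMap<>();

    /** First occurrence keeps its name; later ones get "_2", "_3", ... (assumed suffix format). */
    String nameFor(String field) {
        int seen = fieldCurrentRepeatMap.merge(field, 1, Integer::sum);
        return seen == 1 ? field : field + "_" + seen;
    }

    /** Clears the counts and re-seeds them with fields every record already carries. */
    void reset(Collection<String> carriedFields) {
        fieldCurrentRepeatMap.clear();
        for (String f : carriedFields) {
            fieldCurrentRepeatMap.merge(f, 1, Integer::sum);
        }
    }
}
```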
[0150] This embodiment parses the current row data when the tag level is a level 1 tag to determine whether the title of the read current row data is a configured comment title; when the read title is a configured comment title, it obtains the field values of the current row data; it puts the title and the field values into the field name list and field value list in the database instance, respectively; it backs up the level 1 field values in the field values to the level 1 title backup set, and records the parsing results into the result set parameters of each layer wrapped by the current title, thereby enabling fast parsing and database entry of the current row data and improving parsing efficiency.
[0151] Refer to Figure 5, which is a flowchart illustrating the fourth embodiment of the file parsing method of the present invention.
[0152] Based on the first embodiment described above, step S302 of the file parsing method in this embodiment includes:
[0153] Step S3021': When the tag level is not a first-level tag, compare the current tag value with the top element of the inbound order stack.
[0154] It should be noted that if the tag level is not a level 1 tag, the current tag value is compared with the top element of the inbound order stack titleStack to determine whether the current tag value is consistent with the top element of the inbound order stack titleStack.
[0155] Step S3022': When the current tag value is inconsistent with the top element of the stack, determine the title type corresponding to the current tag value.
[0156] It should be noted that if the current tag value differs from the top element of titleStack (the stack recording the order in which tags are entered), a push operation is required. For example, after <ProductInfo> is read and ProductInfo is pushed onto the stack, <SubProductInfo> is read next; or, after <SubProductInfo> is read, </SubProductInfo> is read and SubProductInfo is popped from titleStack, and then <SubProductInfo> is read again. That is, either a new parent title has to be parsed, or a sibling title at the same level reappears after having been parsed and popped from the stack. The title type corresponding to the current tag value is therefore determined, and it is one of these two: a reappearing sibling title or a new parent title.
[0157] Step S3023': When the title type is a sibling title at the same level reappearing, determine the number of times the title of the current row of data has been entered into the database.
[0158] It should be understood that the reappearance of a sibling title at the same level can be detected with the condition `isPop && Field.equals(lastPeek)`. `isPop` is the pop flag (true if an element has been popped, false otherwise), and `lastPeek` is the most recently popped value. If the current field value equals `lastPeek`, the tag just closed is the same as the current tag, and the parallel branch is entered. In that case, the number of times the title of the current row of data has been pushed onto the stack is determined: the `titlePushTimeMap` data structure records how many times each field has been pushed, so the count can be retrieved directly and different operations performed depending on it.
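The sibling-reappearance test and the push counter can be sketched in Java as below. The condition `isPop && field.equals(lastPeek)` is taken from the description; the class `SiblingCheck`, the use of `ArrayDeque` for titleStack, and resetting `isPop` on every push are assumptions made for this illustration.

```java
import java.util.*;

/** Hypothetical sketch of step S3023': detecting a reappearing sibling title and its push count. */
final class SiblingCheck {
    boolean isPop;      // true once an element has been popped from titleStack
    String lastPeek;    // most recently popped title
    final Map<String, Integer> titlePushTimeMap = new HashMap<>();
    final Deque<String> titleStack = new ArrayDeque<>();

    /** The condition quoted in the description: isPop && Field.equals(lastPeek). */
    boolean isSiblingReappearance(String field) {
        return isPop && field.equals(lastPeek);
    }

    /** Pushes a title and returns how many times it has now been pushed. */
    int push(String field) {
        titleStack.push(field);
        isPop = false;  // assumption: a push clears the pop flag
        return titlePushTimeMap.merge(field, 1, Integer::sum);
    }

    /** Pops the top title and remembers it in lastPeek. */
    String pop() {
        lastPeek = titleStack.pop();
        isPop = true;
        return lastPeek;
    }
}
```

Reading `<SubProductInfo>`, `</SubProductInfo>`, `<SubProductInfo>` in that order makes the check true on the third line, and the subsequent push reports a count of 2.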
[0159] Step S3024': When the number of entries exceeds the preset number, add the title to the hierarchical result set.
[0160] It should be noted that the preset number can be set to one. If the count is greater than one, only the current single result set, i.e., that of this title, needs to be added to the hierarchical result set. The result set level presentLevel is then updated, the fieldCurrentRepeatMap parameter is cleared and reassigned from the current fieldList so that entries under the same title remain consistent, and finally the push operation is performed.
[0161] Step S3025': When the number of data entries equals the preset number of entries, obtain the first title that has not been included in the hierarchical result set, and add the title and the first title to the hierarchical result set to complete the data entry of the current row.
[0162] If the count is one, i.e., equal to the preset number, the individual result set that was not added to the previous result set must now be added, and the current individual result set must also be added to the hierarchical result set. The first title not yet included in the hierarchical result set, namely the individual result set left out of the previous result set, is therefore obtained, and both the title and the first title are added to the hierarchical result set. This is necessary because the file is parsed line by line: parent titles may appear consecutively while the previous individual result set is still held at the level above. For example, if <feeinfo> and <province2cloudsettleinfo> occur consecutively, synchronization to the deeper result sets is delayed and data from those deeper levels would otherwise be lost. Adding the title and the first title to the hierarchical result set completes the entry of the current row of data. Lines are then read one by one in the same way until the file has been read completely, which completes the entry of the current file.
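The branching on the push count (steps S3024' and S3025') can be sketched in Java as follows, assuming the preset number is one as suggested in the text. The class `PushCountBranch`, the `pending` parameter standing in for the "first title" not yet added, and updating `presentLevel` to the deepest level seen are hypothetical choices for illustration. Lists are copied on insertion, in the spirit of fieldListTemp, so that no two entries share a reference.

```java
import java.util.*;

/** Hypothetical sketch of steps S3024'/S3025': adding result sets by push count. */
final class PushCountBranch {
    static final int PRESET = 1;  // preset number of entries ("can be set to once")
    final Map<Integer, List<List<String>>> resultFieldMap = new HashMap<>();  // hierarchical result set
    int presentLevel;             // deepest level currently in the result set

    /**
     * count   : times the title has been pushed (read from titlePushTimeMap)
     * current : single result set of this title
     * pending : earlier single result set not yet added (the "first title"), may be null
     */
    void onSiblingReappearance(int level, int count, List<String> current, List<String> pending) {
        List<List<String>> slot = resultFieldMap.computeIfAbsent(level, k -> new ArrayList<>());
        if (count > PRESET) {
            slot.add(new ArrayList<>(current));            // only this time's result set
        } else if (count == PRESET) {
            if (pending != null) {
                slot.add(new ArrayList<>(pending));        // catch up the set that was missed
            }
            slot.add(new ArrayList<>(current));
        }
        presentLevel = Math.max(presentLevel, level);      // assumed update rule
    }
}
```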
[0163] Optionally, after the title type is determined in step S3022', the method further includes: when the title type is a new (updated) parent title, obtaining the parent data of that parent tag, and putting the parent data into the result set parameters of each level wrapped by the current title.
[0164] If the title type is a new parent title, the parent data of the parent tag is obtained and put into the fieldRecordMap parameter of each level of the result set wrapped by the current title. For example, if the level of the new parent tag is 4, the data is put into the array stored under key 4 of the fieldRecordMap data structure.
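The level-keyed storage in the example above (a level 4 parent goes under key 4) amounts to the small Java sketch below; the class name `ParentRecord` and the method `putParent` are hypothetical, and values are kept in a parallel valueRecordMap by assumption.

```java
import java.util.*;

/** Hypothetical sketch: storing a new parent title's data under its level in fieldRecordMap. */
final class ParentRecord {
    final Map<Integer, List<String>> fieldRecordMap = new HashMap<>();
    final Map<Integer, List<String>> valueRecordMap = new HashMap<>();

    /** Appends the parent's fields and values to the lists kept for its level. */
    void putParent(int parentLevel, List<String> fields, List<String> values) {
        fieldRecordMap.computeIfAbsent(parentLevel, k -> new ArrayList<>()).addAll(fields);
        valueRecordMap.computeIfAbsent(parentLevel, k -> new ArrayList<>()).addAll(values);
    }
}
```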
[0165] In this embodiment, when the tag level is not a first-level tag, the current tag value is compared with the top element of the input order stack; when the current tag value is inconsistent with the top element, the title type corresponding to the current tag value is determined; when the title type is a parallel reappearance of a title of the same level, the number of times the title of the current row of data is input is determined; when the number of inputs is greater than a preset number, the title is added to the hierarchical result set; when the number of inputs is equal to the preset number, the first title that is not included in the hierarchical result set is obtained, and the title and the first title are added to the hierarchical result set to complete the input of the current row of data. Thus, corresponding parsing and input can be performed according to different title types, improving the parsing effect.
[0166] As an example, in addition to step S3022' (where the current tag value is inconsistent with the top element of the stack), the method further includes the following steps:
[0167] When the current tag value is the same as the top element of the stack, obtain the current top element of the stack;
[0168] Pop the current top element from the stack and assign the popped element to the current top element;
[0169] Check whether the inbound order stack is empty;
[0170] When the inbound order stack is empty, enter the current tag value into the database;
[0171] When the stack is not empty, update the popped top element to the current top element, and return to the step of popping the current top element and assigning the popped element to the current top element.
[0172] It should be noted that if the current tag value is the same as the top element of the stack, the fields wrapped by the current tag have to be closed by a pop operation, so the current top element is obtained and popped from the stack. For example, when the <SubProductInfo> node was read earlier, SubProductInfo was pushed onto titleStack; after further reading and parsing, </SubProductInfo> is read. In this case the current top element is popped and assigned to `lastPeek`, which holds the value most recently popped from the stack, and the pop flag `isPop` is set to `true`.
[0173] In practice, since the current level may decrease by one after popping from the stack, it is necessary to merge the result set of the current level into the result set of the previous level and remove the current result set from the current level.
[0174] If the closed title value is a second-level heading, set titleSecondFlag to false, clear the fieldCurrentRepeatMap collection, and then assign the first-level backup title to fieldCurrentRepeatMap. This indicates that the second-level heading has ended and the next second-level heading has begun. The first-level heading is a field value that must be carried by each instance by default.
[0175] The system checks whether the inbound order stack (for second-level and lower headings) is empty. If it is, the text within the second-level heading has been fully parsed and can be entered into the database: the current tag value is entered, and the variables are reset as shown in the reset code below, in preparation for parsing subsequent second-level headings. Because the parsing results are pushed upward level by level, the final result to be entered is held either in the second level of the hierarchical result set or in a single result set.
[0176]
[0177]
[0178] It should be noted that if the inbound order stack (for second-level and lower headings) is not empty, the end of the second-level heading has not yet been parsed. In this case, secondTitle is simply updated to the top element of the stack after popping, i.e., the popped top element is updated to the current top element, and the steps of popping the current top element and assigning the popped element to the current top element are repeated until all top elements of the stack have been compared.
[0179] The file is read line by line in a loop according to the above steps until it has been read completely. By contrast, the traditional method reads the entire file into memory and uses memory objects to store the nested hierarchical structure of the XML file. Because it cannot distinguish which levels are needed and which are not, the stored hierarchy may exceed 7 levels, resulting in invalid storage, and the memory object must then be parsed and stored level by level. Taking 7 levels as an example, this requires at least 7 nested loops and increases memory usage. If this invention takes 50 seconds to read and parse a 1GB XML file, the prior art first has to read the XML file and encapsulate it into a memory object, a process that alone takes at least 50 seconds, and reading a 1GB XML file into memory requires at least 4GB of memory. After reading, 7 levels of loops must be traversed, and to ensure that no information is missed, some variable values of the current level must be stored at each level for use in deeper encapsulation. The time complexity therefore becomes the seventh power of that of a single loop, and the space complexity grows with each level.
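The closing-tag branch described in [0167] to [0178] can be approximated by the Java sketch below. It handles one closing line per call, so the "return to the popping step" happens naturally as later closing lines arrive; this per-line reading, the class `CloseTagHandler`, the `entered` list standing in for the database write, and the rule "level = stack depth + 1" are all assumptions. When the stack is not empty, the closed level's result sets are merged into the parent level and secondTitle becomes the new top; when it becomes empty, the second-level title's merged results are entered and the hierarchical result set is reset.

```java
import java.util.*;

/** Hypothetical sketch of the closing-tag (pop) branch, one closing line at a time. */
final class CloseTagHandler {
    final Deque<String> titleStack = new ArrayDeque<>();                      // inbound order stack
    final Map<Integer, List<List<String>>> resultFieldMap = new HashMap<>();  // hierarchical result set
    final List<List<String>> entered = new ArrayList<>();                     // stands in for database inserts
    String lastPeek;     // most recently popped title
    String secondTitle;  // current top element after popping
    boolean isPop;

    /** Handles a line whose tag matches the top of titleStack; any other tag is ignored here. */
    void onClose(String field) {
        if (!field.equals(titleStack.peek())) {
            return;
        }
        // Assumption: the bottom of the stack is a second-level title, so depth + 1 gives the level.
        int closingLevel = titleStack.size() + 1;
        lastPeek = titleStack.pop();
        isPop = true;
        List<List<String>> closed = resultFieldMap.remove(closingLevel);
        if (titleStack.isEmpty()) {
            // Second-level title complete: enter its merged result sets and reset for the next one.
            if (closed != null) {
                entered.addAll(closed);
            }
            resultFieldMap.clear();
            secondTitle = null;
        } else {
            // Level drops by one: merge this level's result sets into the parent level.
            if (closed != null) {
                resultFieldMap.computeIfAbsent(closingLevel - 1, k -> new ArrayList<>()).addAll(closed);
            }
            secondTitle = titleStack.peek();
        }
    }
}
```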
[0180] This embodiment achieves fast data entry by: obtaining the current top element of the stack when the current tag value matches the top element of the stack; popping the current top element from the stack and assigning it the popped element; checking if the data entry order stack is empty; entering the current tag value into the database when the data entry order stack is empty; updating the popped top element to the current top element when the data entry order stack is not empty, and returning to the steps of popping the current top element and assigning it the popped element.
[0181] Optionally, since the current tag value may or may not be within the hierarchical comparison set, step S30 further includes:
[0182] Step S301': When the current tag value is not within the hierarchical comparison set, detect whether the title of the current row of data has parallel instances.
[0183] Step S302': Obtain the field values of the current row of data.
[0184] It should be noted that if titleLevelMap does not contain the value corresponding to Field and titleSecondFlag is true, the program treats Field as a subtitle under a second-level or lower heading that must be entered into the database, i.e., placed into the entry list. There are then two cases, depending on whether the title of the current row of data has parallel instances, so the program detects parallel titles and obtains the field value of the current row of data.
[0185] When the title has parallel instances, the fields in the field name list are deduplicated and duplicate fields are removed.
[0186] The field value and the title are combined into a single-level result set.
[0187] It should be understood that the title is considered to have parallel instances at the same level when a title has been popped from the stack, the level of the current result set is greater than the level of the current parent title, and the current result set is not empty. In that case, the values are placed into the single-level result set collection (resultFieldList, resultValueList), and the number of individual result set lists in that collection is updated accordingly.
[0188] When filling the field list (fieldList), the fields must be deduplicated to avoid errors such as duplicate or missing fields being entered into the database. The fields in the field name list are therefore deduplicated, and the field values and titles are then placed into the single-level result set. titleSecondFlag is set to false to indicate that this field is a title and does not itself need to be entered into the database; the current loop iteration is skipped, and the next iteration reads the next line of the XML file.
[0189] When there are no parallel titles, the title and the field value are respectively placed into the field name list and field value list in the database instance.
[0190] It should be noted that the absence of parallel titles means that the title has not been popped from the stack, or the result set hierarchy is less than or equal to the current parent tag hierarchy value, or the result set is empty. In all these cases, it is considered that there are no parallel titles at the same level, and the title and value of the row entered into the database are directly put into a single result record List instance (fieldList, valueList).
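The parallel-title check and the deduplication step can be sketched in Java as follows. The test reads [0190] literally: since "no parallel titles" holds when any of three conditions fails, parallel titles are taken to require all three (popped, result level above the parent level, result set non-empty). That reading, the class `ParallelTitle`, and keeping the first occurrence of a duplicated field (while dropping its value so the two lists stay aligned) are assumptions for this illustration.

```java
import java.util.*;

/** Hypothetical sketch of parallel-title detection and deduplication for the single-level result set. */
final class ParallelTitle {
    /** Parallel only if all three hold; [0190] treats the failure of any one as "no parallel titles". */
    static boolean isParallel(boolean isPop, int presentLevel, int parentLevel, boolean resultSetEmpty) {
        return isPop && presentLevel > parentLevel && !resultSetEmpty;
    }

    /** Removes repeated field names (keeping the first) and keeps valueList aligned with fieldList. */
    static void dedupe(List<String> fieldList, List<String> valueList) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < fieldList.size(); ) {
            if (seen.add(fieldList.get(i))) {
                i++;
            } else {
                fieldList.remove(i);   // remove(int): drop the duplicate by index
                valueList.remove(i);
            }
        }
    }

    /** Deduplicates, then stores copies so later edits to fieldList do not leak into the result set. */
    static void addToSingleLevel(List<String> fieldList, List<String> valueList,
                                 List<List<String>> resultFieldList, List<List<String>> resultValueList) {
        dedupe(fieldList, valueList);
        resultFieldList.add(new ArrayList<>(fieldList));
        resultValueList.add(new ArrayList<>(valueList));
    }
}
```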
[0191] The time complexity of this solution grows linearly, that is, it is proportional to the file size. As mentioned earlier, this invention reads the file line by line from beginning to end, without encapsulating the nested hierarchical structure or looping through it internally; it relies instead on the node insertion order stack.
[0192] The current level is determined from the top element of `titleStack` and the level mapping set `titleLevelMap`. Since no looped search is involved, processing time grows only linearly, which is why the method is efficient and fast. A brief comparison test between this invention and the prior art is as follows. Taking a 1GB XML file with two levels of nesting as an example, it was confirmed that even in the extreme case where only 128MB of memory is available, this invention can divide the 1GB file into eight 128MB blocks. The measured times were 48.8s, 49.6s, 49.6s, 50.4s, and 49.6s, an average of 49.6s (6.2s per block × 8 blocks = 49.6s), which can be approximated as 50s. In contrast, conventional XML parsing techniques in the prior art take 2500s (50 squared) for a two-level nested XML file.
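The reported average follows directly from the five measured times, and it agrees with the per-block figure of 6.2s over eight 128MB blocks stated above:

```latex
\bar{t} = \frac{48.8 + 49.6 + 49.6 + 50.4 + 49.6}{5}\,\mathrm{s}
        = \frac{248.0}{5}\,\mathrm{s}
        = 49.6\,\mathrm{s}
        = 8 \times 6.2\,\mathrm{s}
```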
[0193] The following are detailed comparative test examples between the present invention and the prior art:
[0194] The test platform consisted of an Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz, 4GB of RAM, with 2 processes and 40 threads. The test file was a 1GB, 2-level nested XML file. Furthermore, this invention was compared with conventional XML parsing methods. Conventional methods require two nested for loops to parse the test file, while this invention uses a line-by-line reading approach and stores the XML data using a specialized data structure. Compared to conventional methods, this invention saves a significant amount of memory.
[0195] Table 4 below compares the test results of four conventional XML parsing methods in the prior art with those of the present invention:
[0196] Table 4
[0197]
[0198]
[0199] As shown in Figure 6, which is a schematic diagram of the overall parsing process in this embodiment, the `readLine` function reads a line of the input XML file and assigns its tag to `Field`. For each line, the program checks whether `Field` is contained in `titleLevelMap`. If it is, and it is a first-level tag, the title of the line is added to `fieldList` and its value to `valueList`, i.e., directly into the database entry instance (`fieldList`, `valueList`); the first-level field values are backed up to `fieldBakMap`, and the parsing result is recorded in the `fieldRecordMap` parameter of each level of the result set wrapped by the current title. If `Field` is contained but is not a first-level tag, the program checks whether the current tag value matches the top element of the stack. If it matches, the program checks whether the parent tag order stack is empty: if it is empty, the database entry operation is performed; if it is not, the end of the second-level title has not yet been parsed, so the pop operation is performed and `secondTitle` is updated to the new top element. If the current tag value does not match the top element of the stack, a database entry operation is required, and the elements are merged and entered into the database. If `Field` is not contained in `titleLevelMap`, the program checks whether `titleSecondFlag` is true. If it is, the program determines whether the tag has parallel content: if so, the values are put into the single-level result set; if not, the title and value of the line are put directly into the single result record List instance. If `titleSecondFlag` is not true, the line does not need to be entered into the database, and the current loop iteration ends.
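The overall loop of Figure 6 can be illustrated with the heavily simplified Java sketch below. It is not the patented algorithm: it assumes one element per line, no attributes, and no repeated sibling titles (the parallel and hierarchical result sets are not modeled, so a repeated field simply overwrites), and the class `LineByLineParser`, its regular expressions, and the list of maps standing in for database inserts are hypothetical. What it does show is the streaming shape of the method: `BufferedReader.readLine` holds only the current line, first-level values are backed up and carried into each record, configured parent titles are pushed on a stack, and a record is entered when its second-level title closes and the stack becomes empty.

```java
import java.io.*;
import java.util.*;
import java.util.regex.*;

/** Hypothetical, simplified sketch of the Figure 6 loop (one element per line, no parallel siblings). */
final class LineByLineParser {
    private static final Pattern LEAF = Pattern.compile("^\\s*<([\\w.-]+)>([^<]*)</\\1>\\s*$");
    private static final Pattern OPEN = Pattern.compile("^\\s*<([\\w.-]+)>\\s*$");
    private static final Pattern CLOSE = Pattern.compile("^\\s*</([\\w.-]+)>\\s*$");

    static List<Map<String, String>> parse(BufferedReader reader, Map<String, Integer> titleLevelMap)
            throws IOException {
        List<Map<String, String>> entered = new ArrayList<>();   // stands in for database inserts
        Map<String, String> fieldBakMap = new LinkedHashMap<>(); // first-level values carried by every record
        Deque<String> titleStack = new ArrayDeque<>();           // open parent titles (level 2 and below)
        Map<String, String> record = null;                       // record under the current second-level title
        String line;
        while ((line = reader.readLine()) != null) {             // only the current line is held in memory
            Matcher m = LEAF.matcher(line);
            if (m.matches()) {
                String field = m.group(1);
                String value = m.group(2);
                if (Integer.valueOf(1).equals(titleLevelMap.get(field))) {
                    fieldBakMap.put(field, value);               // first-level title: back up, never pushed
                } else if (record != null) {
                    record.put(field, value);                    // field under a second-level title
                }                                                // otherwise the line is not entered
                continue;
            }
            m = OPEN.matcher(line);
            if (m.matches() && titleLevelMap.containsKey(m.group(1))) {
                if (titleStack.isEmpty()) {
                    record = new LinkedHashMap<>(fieldBakMap);   // new second-level title starts a record
                }
                titleStack.push(m.group(1));
                continue;
            }
            m = CLOSE.matcher(line);
            if (m.matches() && m.group(1).equals(titleStack.peek())) {
                titleStack.pop();
                if (titleStack.isEmpty() && record != null) {
                    entered.add(record);                         // second-level title fully parsed: enter it
                    record = null;
                }
            }
        }
        return entered;
    }
}
```

In practice the reader would be opened on the file itself (for example with `java.nio.file.Files.newBufferedReader`), so memory use stays bounded by one line plus the open records rather than by the file size.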
[0200] This embodiment detects whether there are parallel titles in the current row of data when the current tag value is not in the hierarchy comparison set; obtains the field value of the current row of data; and when there are no parallel titles, puts the title and the field value into the field name list and field value list in the database instance, respectively, thereby quickly processing the title and field value and improving parsing efficiency.
[0201] Refer to Figure 7, which is a structural block diagram of the first embodiment of the file parsing device of the present invention.
[0202] As shown in Figure 7, the file parsing device proposed in this embodiment of the invention includes:
[0203] The reading module 10 is used to read the current file line by line using a character stream reading function, and to read the current line data of the current file into memory.
[0204] The determination module 20 is used to determine the current tag value of the current row data.
[0205] The parsing module 30 is used to parse and store the current row data into the database based on the current tag value.
[0206] This embodiment reads the current file line by line using a character stream reading function, reading the current line data into memory; determining the current tag value of the current line data; and parsing and storing the current line data into the database based on the current tag value. By reading each line of file data line by line, parsing and storing the data into the database simultaneously, the space usage is greatly reduced.
[0207] In one embodiment, the parsing module 30 is further configured to query the tag level of the current row data when the current tag value is within the hierarchical comparison set; and parse and store the current row data into the database according to the tag level.
[0208] In one embodiment, the parsing module 30 is further configured to parse the current row data when the tag level is a first-level tag, determine whether the title of the read current row data is a configured comment title; when the read title is a configured comment title, obtain the field value of the current row data; put the title and the field value into the field name list and field value list in the database instance respectively; back up the first-level field value in the field value to the first-level title backup set, and record the parsing result in each layer of result set parameters wrapped by the current title.
[0209] In one embodiment, the parsing module 30 is further configured to: compare the current tag value with the top element of the inbound order stack when the tag level is not a first-level tag; determine the title type corresponding to the current tag value when the current tag value is inconsistent with the top element of the stack; determine the number of times the title of the current row data is inbound when the title type is a parallel reappearance of the same-level title; add the title to the hierarchical result set when the number of inbounds is greater than a preset number; and obtain the first title that is not placed in the hierarchical result set when the number of inbounds is equal to the preset number, and add the title and the first title to the hierarchical result set to complete the inbound of the current row data.
[0210] In one embodiment, the parsing module 30 is further configured to, when the title type is to update the parent title, obtain the parent data of the updated parent title tag; and put the parent data into the result set parameters of each layer wrapped by the current title.
[0211] In one embodiment, the parsing module 30 is further configured to: obtain the current top element of the stack when the current tag value is consistent with the top element of the stack; pop the current top element from the stack and assign the popped element to the current top element; detect whether the input order stack is empty; when the input order stack is empty, enter the current tag value into the database; and when the input order stack is not empty, update the popped top element to the current top element and return to the step of popping the current top element from the stack and assigning the popped element to the current top element.
[0212] In one embodiment, the determining module 20 is further configured to: detect whether there is a parallel situation in the title of the current row data when the current tag value is not located in the hierarchical comparison set; obtain the field value of the current row data; and, when there is no parallel situation in the title, put the title and the field value into the field name list and field value list in the database instance, respectively.
[0213] In one embodiment, the determining module 20 is further configured to, when there are parallel cases in the title, deduplicate the fields in the field name list, delete the duplicate fields, and combine the field values and the title into a single-level result set.
[0214] In one embodiment, the reading module 10 is further configured to read each task in the file entry task table to obtain the directory task status; take the tasks with the directory task status as pending as pending tasks; read the files to be parsed in the pending tasks according to the file directory field; encapsulate the read files to obtain a memory array of file entry task information to be parsed and entered into the database; and take the memory array of file entry task information to be parsed and entered into the database as the current file.
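The task-selection step of the reading module can be sketched in Java as below. The record `EntryTask`, its columns, the status code `"PENDING"`, and the class `PendingTaskReader` are hypothetical, since the description names the task table and its directory task status without giving a schema. The sketch keeps only pending tasks and returns them as an in-memory array of file entry task information, which the reading module would then walk, opening each task's file by its file directory field.

```java
import java.util.*;

/** Hypothetical row of the file entry task table; column names and status codes are assumptions. */
record EntryTask(String taskId, String status, String fileDirectory) {}

final class PendingTaskReader {
    static final String PENDING = "PENDING"; // assumed code for the "pending" directory task status

    /** Keeps only pending tasks and returns them as an in-memory array of entry task information. */
    static EntryTask[] pendingTasks(List<EntryTask> taskTable) {
        List<EntryTask> pending = new ArrayList<>();
        for (EntryTask task : taskTable) {
            if (PENDING.equals(task.status())) {
                pending.add(task);
            }
        }
        return pending.toArray(new EntryTask[0]);
    }
}
```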
[0215] Furthermore, embodiments of the present invention also propose a storage medium storing a file parsing program, which, when executed by a processor, implements the steps of the file parsing method described above.
[0216] Since this storage medium adopts all the technical solutions of all the above embodiments, it has at least all the beneficial effects brought about by the technical solutions of the above embodiments, which will not be repeated here.
[0217] It should be understood that the above are merely illustrative examples and do not constitute any limitation on the technical solutions of the present invention. In specific applications, those skilled in the art can make settings as needed, and the present invention does not impose any restrictions on this.
[0218] It should be noted that the workflow described above is merely illustrative and does not limit the scope of protection of this invention. In practical applications, those skilled in the art can select some or all of the workflow to achieve the purpose of this embodiment according to actual needs, and no restrictions are imposed here.
[0219] In addition, for technical details not described in detail in this embodiment, please refer to the file parsing method provided in any embodiment of the present invention, which will not be repeated here.
[0220] Furthermore, it should be noted that in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Unless otherwise specified, an element defined by the phrase "comprising one…" does not exclude the presence of other identical elements in the process, method, article, or system that includes that element.
[0221] The sequence numbers of the above embodiments of the present invention are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.
[0222] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as read-only memory (ROM) / RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods described in the various embodiments of the present invention.
[0223] The above are merely preferred embodiments of the present invention and do not limit the scope of the patent. Any equivalent structural or procedural transformations made based on the description and drawings of the present invention, or direct or indirect applications in other related technical fields, are similarly included within the scope of patent protection of the present invention.
Claims
1. A method of parsing a file, characterized by, The method for parsing the file includes: The current file is read line by line using a character stream reading function, and the current line of data in the current file is read into memory. Determine the current tag value of the current line of data in the current file; The data of the current row is parsed and stored in the database based on the current tag value; The step of parsing and storing the current row data into the database based on the current tag value also includes: When the current tag value is not within the hierarchical comparison set, detect whether there are parallel titles in the current row of data; Retrieve the field values of the current row of data; When there are no parallel titles, the title and the field value are respectively placed into the field name list and field value list in the data entry instance to complete the data entry of the current row.
2. The method for parsing a file according to claim 1, characterized in that the step of parsing and storing the current line of data in the database based on the current tag value includes: when the current tag value is in the hierarchical comparison set, querying the tag level of the current line of data; and parsing the current line of data and storing it in the database based on the tag level.
3. The method for parsing a file according to claim 2, characterized in that the step of parsing and storing the current line of data in the database based on the tag level includes: when the tag level is level 1, parsing the current line of data to determine whether the title read from it is a configured comment title; when the title read is a configured comment title, obtaining the field values of the current line of data; placing the title and the field values into the field name list and the field value list, respectively, of the database instance; and backing up the first-level values among the field values to the first-level title backup set, and recording the parsed results in the result set parameters of each level of the current title wrapper.
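Claims 2 and 3 can be pictured as a dispatch on tag level followed by full handling of level-1 titles. The sketch below is illustrative only: it assumes the hierarchical comparison set is a mapping from tag name to level, that configured comment titles are a plain set, and that the title wrapper keeps its per-level result set parameters in a dictionary keyed by level. TitleWrapper, handle_hierarchical and the tag names in the usage lines are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TitleWrapper:
    """Hypothetical container for the structures named in claim 3."""
    field_names: list = field(default_factory=list)
    field_values: list = field(default_factory=list)
    level1_backup: dict = field(default_factory=dict)  # first-level title backup set
    layers: dict = field(default_factory=dict)         # level -> parsed results


def handle_hierarchical(title: str, value: str, tag_levels: dict,
                        comment_titles: set, wrapper: TitleWrapper) -> str:
    """Claims 2-3 sketch: dispatch on tag level, fully handling level 1 only."""
    level = tag_levels.get(title)  # query the tag level (claim 2)
    if level is None:
        return "flat"     # not in the hierarchical comparison set: claim 1 path
    if level != 1:
        return "stack"    # deeper levels: inbound order stack path (claims 4-6)
    if title not in comment_titles:
        return "skipped"  # level-1 title that is not a configured comment title
    wrapper.field_names.append(title)
    wrapper.field_values.append(value)
    wrapper.level1_backup[title] = value  # back up the first-level value
    wrapper.layers.setdefault(1, []).append((title, value))
    return "entered"


# Usage with hypothetical tag names and levels.
levels = {"productinfo": 1, "subproductinfo": 2, "feeinfo": 3}
w = TitleWrapper()
print(handle_hierarchical("productinfo", "P001", levels, {"productinfo"}, w))  # entered
print(handle_hierarchical("feeinfo", "10", levels, {"productinfo"}, w))        # stack
```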
4. The method for parsing a file according to claim 2, wherein the step of parsing and storing the current line of data in the database based on the tag level includes: when the tag level is not level 1, comparing the current tag value with the top element of the inbound order stack; when the current tag value differs from the top element of the stack, determining the title type corresponding to the current tag value; when the title type is a peer title that reappears in parallel, determining the number of times the title of the current line of data has been entered into the database; when that number exceeds a preset number, adding the title to the hierarchical result set; and when that number equals the preset number, obtaining the first title not yet included in the hierarchical result set and adding both the title and that first title to the hierarchical result set, to complete the database entry of the current line of data.
5. The method for parsing a file according to claim 4, wherein, after determining the title type corresponding to the current tag value when the current tag value differs from the top element of the stack, the method further includes: when the title type is "update parent title", retrieving the parent data of the update-parent-title tag; and placing the parent data into the result set parameters of each level of the current title wrapper.
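The title-type branches of claims 4 and 5 can be sketched as follows, under several labeled assumptions: "update parent" titles are configured in advance, a title that has already been entered counts as a peer title reappearing in parallel, each occurrence is recorded as a (title, occurrence) pair, and "the first title not yet included" is read as the earliest occurrence of the same title. ParseState, title_type and handle_non_level1 are hypothetical names; this is one possible reading of the translated claims, not the patented implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ParseState:
    """Hypothetical parser state; all field names are illustrative."""
    stack: list = field(default_factory=list)               # inbound order stack
    entry_counts: dict = field(default_factory=dict)        # title -> times entered
    hierarchy_results: list = field(default_factory=list)   # (title, occurrence)
    update_parent_titles: set = field(default_factory=set)  # assumed configuration
    parent_data: dict = field(default_factory=dict)         # title -> parent data
    layers: dict = field(default_factory=dict)              # level -> result params
    preset: int = 1                                         # preset number


def title_type(title: str, state: ParseState) -> str:
    # Assumption: update-parent titles are configured up front, and a title
    # that has already been entered is a peer title reappearing in parallel.
    if title in state.update_parent_titles:
        return "update-parent"
    return "peer" if state.entry_counts.get(title, 0) > 0 else "new"


def handle_non_level1(title: str, state: ParseState) -> str:
    """Claims 4-5 sketch for a title whose tag level is not level 1."""
    if state.stack and state.stack[-1] == title:
        return "matches-top"  # left to the stack-unwinding step of claim 6
    kind = title_type(title, state)
    count = state.entry_counts.get(title, 0)
    if kind == "peer":
        if count > state.preset:
            state.hierarchy_results.append((title, count))
        elif count == state.preset:
            # Assumption: the "first title" is the earliest occurrence of this
            # title that is not yet in the hierarchical result set.
            first = next((title, i) for i in range(count)
                         if (title, i) not in state.hierarchy_results)
            state.hierarchy_results.extend([first, (title, count)])
    elif kind == "update-parent":
        parent = state.parent_data.get(title)
        for results in state.layers.values():
            results.append(parent)  # copy the parent data into every level
    state.entry_counts[title] = count + 1
    return kind


# Usage: the same sibling title appears three times under one parent.
state = ParseState(stack=["productinfo"])
for t in ["feeinfo", "feeinfo", "feeinfo"]:
    handle_non_level1(t, state)
print(state.hierarchy_results)  # [('feeinfo', 0), ('feeinfo', 1), ('feeinfo', 2)]
```

With the preset number set to 1, the second occurrence triggers the "equals" branch and retroactively records the first occurrence, so every repeated sibling ends up in the hierarchical result set.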
6. The method for parsing a file according to claim 4, characterized in that, when the tag level is not level 1, after comparing the current tag value with the top element of the inbound order stack, the method further includes: when the current tag value is the same as the top element of the stack, obtaining the current top element of the stack; popping the current top element from the stack and taking the popped element as the current top element; checking whether the inbound order stack is empty; when the inbound order stack is empty, pushing the current tag value onto the stack; and when the stack is not empty, updating the current top element to the element now at the top of the stack and returning to the step of popping the current top element.
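A literal reading of claim 6 pops the inbound order stack until it is empty once the current tag value matches the top element, and then pushes the current tag value. The sketch below follows that wording; the function name unwind_on_match and the list-based stack are hypothetical, and the translated claim may well intend something narrower.

```python
from __future__ import annotations


def unwind_on_match(tag: str, stack: list) -> list:
    """Literal reading of claim 6; returns the popped elements, top first."""
    popped = []
    if not stack or stack[-1] != tag:
        return popped  # tag differs from the top element: claims 4-5 apply
    while True:
        popped.append(stack.pop())  # pop the current top element
        if not stack:
            stack.append(tag)       # stack empty: push the current tag value
            return popped
        # stack not empty: the next element becomes the current top; repeat


# Usage with hypothetical tag names.
stack = ["list", "productinfo", "feeinfo"]
print(unwind_on_match("feeinfo", stack), stack)
# ['feeinfo', 'productinfo', 'list'] ['feeinfo']
```

A conventional tag matcher would pop only the matching element and leave its ancestors on the stack; the loop above instead follows the translated wording, which keeps popping until the stack is empty before pushing.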
7. The method for parsing a file according to claim 1, wherein, after obtaining the field values of the current line of data, the method further includes: when the title list contains multiple fields, deduplicating the fields in the field name list by deleting duplicate fields; and combining the field values and the titles into a single-level result set.
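The deduplication and combination step of claim 7 can be sketched in a few lines. The sketch assumes that when a field name repeats, the value of its first occurrence is the one kept; the claim does not say which value survives. The function name dedupe_and_combine is hypothetical.

```python
from __future__ import annotations


def dedupe_and_combine(field_names: list, field_values: list) -> dict:
    """Claim 7 sketch: delete duplicate field names, then pair names with values.

    Assumption: when a field name repeats, the value of its first occurrence
    is kept; dicts preserve insertion order, so field order is unchanged.
    """
    result = {}
    for name, value in zip(field_names, field_values):
        result.setdefault(name, value)  # ignored if the name is already present
    return result


print(dedupe_and_combine(["custid", "custname", "custid"], ["42", "Acme", "43"]))
# {'custid': '42', 'custname': 'Acme'}
```

The function also handles a single-field list unchanged, so the "multiple fields" condition in the claim need not be checked separately in this sketch.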
8. The method for parsing a file according to any one of claims 1 to 7, wherein, before reading the current line of the current file into memory using the character stream reading function, the method further includes: reading each task in the file import task table to obtain the directory task status; designating the tasks whose status is pending as tasks to be processed; reading, according to the file directory field, the files to be parsed in each task to be processed, and encapsulating the files read to obtain an in-memory array of file-import task information to be parsed and stored in the database; and using this in-memory array as the current file.
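The task-loading step of claim 8 can be sketched with an in-memory SQLite table standing in for the file import task table. The table name file_import_task, its columns, the status value "pending", the ImportTask record, and the choice to pick up "*.xml" files from each directory are all hypothetical. The file lister is passed in as a parameter so the sketch can be exercised without touching the file system.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable

PENDING = "pending"  # hypothetical status value for a task awaiting processing


@dataclass
class ImportTask:
    """One element of the in-memory array of file-import task information."""
    task_id: int
    file_path: Path


def xml_files(directory: str) -> Iterable[Path]:
    # Default lister (assumed): the XML files directly inside the directory.
    return sorted(Path(directory).glob("*.xml"))


def load_pending_tasks(conn: sqlite3.Connection,
                       list_files: Callable[[str], Iterable[Path]] = xml_files) -> list:
    """Claim 8 sketch over a hypothetical file_import_task table."""
    tasks = []
    rows = conn.execute(
        "SELECT task_id, file_dir, status FROM file_import_task ORDER BY task_id")
    for task_id, file_dir, status in rows:
        if status != PENDING:
            continue  # only pending tasks become tasks to be processed
        for path in list_files(file_dir):  # read files via the directory field
            tasks.append(ImportTask(task_id, path))
    return tasks


# Usage with an in-memory table and a stub lister in place of a real directory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file_import_task (task_id INTEGER, file_dir TEXT, status TEXT)")
conn.executemany("INSERT INTO file_import_task VALUES (?, ?, ?)",
                 [(1, "/data/in/a", "pending"), (2, "/data/in/b", "done")])
tasks = load_pending_tasks(conn, list_files=lambda d: [Path(d) / "order.xml"])
print([(t.task_id, t.file_path.name) for t in tasks])  # [(1, 'order.xml')]
```

Each ImportTask's file would then serve as the current file for the line-by-line reading of claim 1.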
9. An apparatus for parsing a file, configured to perform the method for parsing a file according to any one of claims 1 to 8, the apparatus comprising: a reading module, configured to read the current file line by line using a character stream reading function and to read the current line of data of the current file into memory; a determination module, configured to determine the current tag value of the current line of data; and a parsing module, configured to parse the current line of data and store it in the database based on the current tag value; the parsing module being further configured to: when the current tag value is not in the hierarchical comparison set, detect whether the current line of data contains parallel titles; obtain the field values of the current line of data; and, when there are no parallel titles, place the title and the field values into the field name list and the field value list, respectively, of the database instance, to complete the database entry of the current line of data.
10. A device for parsing a file, comprising: a memory, a processor, and a file-parsing program stored in the memory and executable on the processor, the program being configured to implement the method for parsing a file according to any one of claims 1 to 8.
11. A storage medium, characterized in that the storage medium stores a file-parsing program which, when executed by a processor, implements the method for parsing a file according to any one of claims 1 to 8.