Information processing systems, information processing methods, and programs
An information processing system automates the classification and storage of administrative documents by estimating procedure categories, addressing inefficiencies in manual handling and reducing errors, thus enhancing organizational efficiency.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- SOVA CO LTD
- Filing Date
- 2026-02-11
- Publication Date
- 2026-06-26
AI Technical Summary
Existing systems for organizing documents related to administrative procedures, such as taxation, social insurance, and registration, require manual classification and distribution, leading to decreased work efficiency and increased labor due to the need for knowledge and manual handling.
An information processing system utilizing one or more processors to acquire data files, estimate the administrative procedure category based on text or image data, and automatically store the files in appropriate folders using rule-based and AI-driven classification.
The system significantly reduces the effort required for manual file organization, minimizes storage errors, and enhances efficiency by automating the classification and storage process, thereby reducing search time and improving the overall organization of documents.
Figure 0007880669000001_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to an information processing system, an information processing method, and a program.
Background Art
[0002] Conventionally, accounting firms and similar businesses have digitized voucher documents and application forms received from customers and stored them in appropriate folders. For example, Patent Document 1 describes a technique in which an attachment information file for batch attachment and all voucher files to be attached are prepared in advance in a single folder, separate from the journal entry file, so that the vouchers can be associated with the journal entries either at the same time as or after the journal entries are received.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] However, with the above-mentioned technology, work efficiency may decrease when files must be organized into a specific folder in advance, or when the contents of the files must be checked and the files distributed manually. In particular, accurately classifying documents and storing them in hierarchical folders for each administrative procedure category, which covers a wide range of areas such as taxation, social insurance, and registration, requires both expertise and labor.
[0005] In view of the above-mentioned problems, an object of this disclosure is to provide an information processing system, an information processing method, and a program that can improve the efficiency of organizing documents related to administrative procedures.
Means for Solving the Problems
[0006] An information processing system according to one aspect of this disclosure comprises one or more processors and memory. The one or more processors are capable of performing the following steps: in the acquisition step, they acquire a data file from a client terminal operated by the client; in the estimation step, they estimate the administrative procedure category corresponding to the acquired data file based on the characteristics of the text data or image data contained in the acquired data file; and in the sorting step, they store the data file in a storage folder determined according to the administrative procedure category.
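The three steps of paragraph [0006] can be pictured as a small pipeline. The following is a minimal Python sketch, not the disclosed implementation: `DataFile`, `process_upload`, and the three injected callables are hypothetical names, and the file is assumed to arrive with its text already extracted, so that the acquisition step is simply the call itself.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DataFile:
    """Hypothetical stand-in for a document file acquired from a client terminal."""
    client_id: str
    name: str
    text: str  # text already extracted from the file (extraction is out of scope here)


def process_upload(
    data_file: DataFile,
    estimate: Callable[[DataFile], str],
    folder_for: Callable[[str, str], str],
    store: Callable[[DataFile, str], None],
) -> str:
    """Run the estimation and sorting steps on a file that has just been acquired."""
    # Estimation step: map the file's contents to an administrative procedure category.
    category = estimate(data_file)
    # Sorting step: resolve the storage folder for that category, then store the file there.
    folder = folder_for(data_file.client_id, category)
    store(data_file, folder)
    return folder
```

Injecting the estimator, folder lookup, and storage backend as callables keeps the pipeline independent of whether estimation is rule-based, AI-based, or both, and of where the folders physically live.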
[0007] An information processing method according to one aspect of this disclosure includes each of the above steps performed by an information processing system.
[0008] A program according to one aspect of this disclosure is a program that causes a computer having one or more processors and memory to perform each of the steps described above.
Effects of the Invention
[0009] This disclosure provides an information processing system, information processing method, and program that can improve the efficiency of organizing documents related to administrative procedures.
Brief Description of the Drawings
[0010]
[Figure 1] This figure shows an example of the overall configuration of an information processing system.
[Figure 2] This figure shows an example of the hardware configuration of an information processing device.
[Figure 3] This figure shows an example of the hardware configuration of the client's terminal and the staff member's terminal.
[Figure 4] This is a block diagram showing the functional configuration of the control unit of an information processing device.
[Figure 5] This flowchart shows an example of the pre-registration process.
[Figure 6] This figure shows an example of the data structure of the client master.
[Figure 7] This figure shows an example of the data structure of a distribution rule table.
[Figure 8] This is a flowchart showing an example of the document sorting process.
[Figure 9] This figure shows an example of a document upload screen.
[Figure 10] This is a flowchart illustrating an example of the process for updating client information.
[Figure 11] This figure shows an example of the administration screen and notification display.
Modes for Carrying Out the Invention
[0011] Embodiments of the present disclosure will be described in detail below with reference to the drawings. In each drawing, the same or corresponding elements are denoted by the same reference numerals, and redundant explanations will be omitted where necessary for clarity.
1. Hardware Configuration
[0012] This section describes the hardware configuration of the information processing system 1 according to this embodiment. Figure 1 is a diagram showing an example of the overall configuration of the information processing system. In the following description, "processor" is synonymous with "circuitry" composed of a central processing unit, etc.
1.1 Information Processing System 1
[0013] Information processing system 1 is a computer system used in accounting firms and similar businesses to automatically classify documents submitted by clients (customers) according to procedure category and to store them. Information processing system 1 comprises an information processing device 2, a client terminal 3, and a staff terminal 4, which are connected to each other via a network NW such as the Internet or a LAN (Local Area Network).
[Procedure Category]
[0014] The procedure category is, specifically, an administrative procedure category. The administrative procedure category is classification information that indicates the type of administrative procedure to which a document file relates, and it is used to determine the storage folder in which that document file should be stored. The administrative procedure category includes at least one of taxation, social insurance, and registration; that is, it covers at least one of tax procedures, social insurance procedures, and registration procedures.
[Administrative Procedure]
(Tax Procedure)
[0015] Tax procedures are procedures with the tax office, the regional taxation bureau, or the National Tax Agency. Examples include declarations, payments, refund claims, notifications, applications, submission of the documents attached to these, corrections, withdrawals, and inquiries concerning corporate tax, income tax, consumption tax, withholding tax, inheritance tax, gift tax, and other national taxes. Specifically, tax procedures include, for example, submission of articles of incorporation, notification of the establishment of a salary payment office, application for approval of blue form tax returns, notification of election to be a consumption tax taxable business, application for approval of the special provision for withholding tax payment deadlines, submission of official documents, submission of year-end adjustment documents, and submission of various return forms (final tax returns, amended tax returns, requests for correction, etc.).
(Social Insurance Procedure)
[0016] Social insurance procedures are procedures with the pension office (Japan Pension Service), the Japan Health Insurance Association, a health insurance society, the Labor Standards Inspection Office, the Public Employment Security Office, or other administrative agencies. Examples include notifications of acquisition or loss of insured status, calculation basis notifications, notifications of changes in monthly remuneration, notifications of changes in dependents, and issuance of separation notices, relating to health insurance, employees' pension insurance, employment insurance, and workers' accident compensation insurance.
(Registration Procedure)
[0017] Registration procedures are procedures performed at the Legal Affairs Bureau and include at least one of real estate registration procedures and commercial/corporate registration procedures. Real estate registration procedures include, for example, registration of a transfer of ownership, registration of the establishment of a mortgage, and registration of the cancellation of a mortgage. Commercial/corporate registration procedures include, for example, registration of incorporation, registration of a change of officers, registration of the relocation of the head office, registration of a change of trade name, and registration of a capital increase. In this specification, "procedure" includes applications, notifications, reports, declarations, submission of supporting documents, corrections, withdrawals, and responses to inquiries related thereto. Furthermore, administrative procedures may take any form: electronic application, paper submission, in-person submission, or submission by mail.
1.2 Information Processing Device 2
[0018] Information processing device 2 is a computer, such as a server, that receives data, analyzes its contents, and sorts it into folders. Information processing device 2 receives data uploaded from the client terminal 3, analyzes its contents using an artificial intelligence model (hereinafter, AI model 6), and stores it in the appropriate folder. Information processing device 2 may consist of a single server device, of multiple physical servers, or of a group of virtual servers in the cloud.
[0019] Figure 2 shows an example of the hardware configuration of an information processing device. The information processing device 2 comprises a control unit 21, a storage unit 22, and a communication unit 23, and these components are electrically connected within the information processing device 2 via a communication bus 26.
[0020] The control unit 21 processes and controls the overall operation of the information processing device 2. The control unit 21 is a processor such as a central processing unit (CPU). The control unit 21 realizes the various functions of the information processing device 2 by reading predetermined programs stored in the storage unit 22. That is, information processing by the software stored in the storage unit 22 is concretely realized by the control unit 21, which is an example of hardware, and can thereby be executed as the various functional units included in the control unit 21. The functional configuration is described in more detail in the next section. Note that the control unit 21 is not limited to a single unit; multiple control units 21 may be provided, one for each function, or a combination of these arrangements may be used.
[0021] The storage unit 22 is a storage device that includes a non-transitory computer-readable medium or other physical storage medium, and stores the various types of information used by the information processing device 2. The storage unit 22 can be implemented, for example, as a storage device such as a solid-state drive (SSD) that stores the programs executed by the control unit 21, or as a memory such as random access memory (RAM) that temporarily stores information needed during program execution (arguments, arrays, etc.). The storage unit 22 stores the various programs related to the information processing device 2 executed by the control unit 21, various data such as the client master T1 and the sorting rule table T2 described later, variables, and the like.
[0022] The communication unit 23 is a communication interface for connecting to the network NW.
1.3 Client Terminal 3
[0023] Client terminal 3 is a computer terminal operated by a client (customer) who outsources work to an accounting firm or similar organization. Client terminal 3 can be a smartphone, tablet, personal computer, or other communication terminal capable of input/output. Client terminal 3 is used to transmit document files held by the client, such as receipts, invoices, or administrative procedure documents, to information processing device 2. Document files can be text files, PDF files, or image data.
[0024] Figure 3(A) shows an example of the hardware configuration of the client terminal 3. The client terminal 3 comprises a control unit 31, a storage unit 32, a communication unit 33, an input unit 34, and a display unit 35, and these components are connected via a communication bus 36. The control unit 31 is a processor such as a CPU. The hardware configuration of the control unit 31, storage unit 32, and communication unit 33 is substantially the same as that of the control unit 21, storage unit 22, and communication unit 23 in the information processing device 2 described above, so a detailed explanation is omitted.
[0025] The input unit 34 may be included in the casing of the client terminal 3, or it may be an external component. For example, the input unit 34 may be integrated with the display unit 35 to function as a touch panel. Of course, a switch button, mouse, QWERTY keyboard, etc., may be used instead of a touch panel. In other words, the input unit 34 receives operation input from the user. This input is transmitted as a command signal to the control unit 31 via the communication bus 36, and the control unit 31 can perform predetermined controls and calculations as needed.
[0026] The display unit 35 may be, for example, included in the casing of the client terminal 3, or it may be an external component. The display unit 35 displays a graphical user interface (GUI) screen that can be operated by the user. Preferably, the display unit 35 uses display devices such as a CRT display, liquid crystal display, organic EL display, and plasma display, depending on the type of client terminal 3.
1.4 Staff Terminal 4
[0027] Staff terminal 4 is a computer terminal operated by a staff member of an accounting firm or similar organization (for example, an accountant, tax accountant, or office worker). Figure 3(B) shows an example of the hardware configuration of the staff terminal 4. The staff terminal 4 comprises a control unit 41, a storage unit 42, a communication unit 43, an input unit 44, and a display unit 45, which are connected via a communication bus 46. The hardware configuration of the control unit 41, storage unit 42, communication unit 43, input unit 44, and display unit 45 is substantially the same as that of the control unit 31, storage unit 32, communication unit 33, input unit 34, and display unit 35 of the client terminal 3 described above, so a detailed explanation is omitted. The staff terminal 4 is used to check documents sorted by the information processing device 2, set sorting rules, manage client information, and so on.
2. Functional Configuration
[0028] This section describes the functional configuration of Embodiment 1, in particular that of the information processing device 2. The information processing method described later is executed by the information processing device 2.
[0029] Figure 4 is a block diagram showing the functional configuration of the control unit 21 of the information processing device 2. By executing a program, the control unit 21 implements the functions of a registration unit 210, an acquisition unit 211, an estimation unit 212, a sorting unit 213, a duplicate determination unit 214, a log generation unit 215, and an output control unit 216 (a display control unit 217 and a data output control unit 218).
[0030] As a registration step, the registration unit 210 registers sorting rules received from the staff terminal 4 operated by an accounting staff member in the sorting rule table T2, and registers basic information about each client (client information) in the client master T1. The registration unit 210 also updates the client information in the client master T1 when the accounting staff member approves an update proposal in the client information update step described later.
[0031] As an acquisition step, the acquisition unit 211 acquires document files related to procedures from the client terminal 3 or the staff terminal 4. These document files may be, for example, scanned PDF data, image data captured with a smartphone, or text data. The acquisition unit 211 may also acquire files attached to emails in cooperation with a mail server, or it may monitor external cloud storage and automatically acquire newly saved files.
[0032] The estimation unit 212, as an estimation step, estimates the procedure category corresponding to the acquired data file based on the characteristics of the text data contained in the acquired data file or the image data of the data file. Here, "estimation" includes at least one of the following: uniquely identifying from the input data, determining the classification based on statistical confidence, and selecting a specific category from multiple candidates.
[0033] The estimation unit 212 estimates the procedure category based on rule-based determination, determination using the AI model 6, or a combination of these determination results. AI model 6 may be a natural language processing model such as an LLM (Large Language Model) or a classifier using an LLM API. For example, AI model 6 is an LLM that takes text in a data file as input and outputs a procedural classification. The estimation unit 212 may acquire the text data contained in the data file by performing OCR (Optical Character Recognition) processing prior to inputting it to AI model 6.
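When AI model 6 is an LLM that takes the (possibly OCR-extracted) text as input and outputs a procedure category, one common pattern is to constrain the answer to a fixed label set and reject anything else. The sketch below assumes a generic `llm_complete` callable (prompt in, text out) standing in for whatever LLM API is used; the prompt wording and the three labels are illustrative assumptions, not taken from the disclosure.

```python
from typing import Callable, Optional, Sequence

CATEGORIES = ("Tax", "Social insurance", "Registration")


def build_prompt(text: str, categories: Sequence[str] = CATEGORIES) -> str:
    """Ask the model to answer with exactly one category label."""
    options = ", ".join(categories)
    return (
        "Classify the administrative procedure that this document relates to.\n"
        f"Answer with exactly one of: {options}.\n\n"
        f"Document text:\n{text}"
    )


def classify_with_llm(
    text: str,
    llm_complete: Callable[[str], str],  # hypothetical: any prompt -> completion function
    categories: Sequence[str] = CATEGORIES,
) -> Optional[str]:
    """Return the category the model names, or None if the reply is not a known label."""
    reply = llm_complete(build_prompt(text, categories)).strip()
    return reply if reply in categories else None
```

Returning `None` for unrecognized replies lets the caller fall back to the rule-based result or flag the file for review instead of storing it under an invented category.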
[0034] For example, AI Model 6 is a pre-trained image classifier, document classifier, or multimodal model that has been trained using document files and procedural classifications as training data. As an example, AI Model 6 is an image classifier that includes a CNN (Convolutional Neural Network) and takes document layout image data as input to output a procedural classification.
[0035] AI model 6 may be a common model used for all clients, or an individual model may be used for each client (for example, a model fine-tuned for that client). Alternatively, AI model 6 may be a combination of models specialized for each procedure category, such as a tax model or a corporate registration model, and the estimation unit 212 may switch between these specialized models depending on the application. AI model 6 may run in the cloud, on premises, or in a combination of both.
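The choice among a common model, per-client models, and category-specialized models can be expressed as a lookup with fallbacks. This is a minimal sketch; the `ModelRegistry` class and in particular the precedence order (client-specific first, then category specialist, then common model) are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional

Model = Callable[[str], str]  # takes document text, returns a category label


class ModelRegistry:
    """Chooses which AI model to run for a given client and (optional) category."""

    def __init__(self, common: Model) -> None:
        self.common = common                       # shared by all clients
        self.per_client: Dict[str, Model] = {}     # e.g. models fine-tuned per client
        self.per_category: Dict[str, Model] = {}   # e.g. a tax model, a corporate registration model

    def select(self, client_id: str, category: Optional[str] = None) -> Model:
        # Assumed precedence: client-specific model, then category specialist, then common model.
        if client_id in self.per_client:
            return self.per_client[client_id]
        if category is not None and category in self.per_category:
            return self.per_category[category]
        return self.common
```

Because each model is just a callable, the same registry works whether a model runs locally or behind a cloud endpoint.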
[0036] As a sorting step, the sorting unit 213 determines the destination folder for the data file based on the estimated procedure category and the sorting rules predefined in the sorting rule table T2, and stores the data file in that folder. The destination folder may be located in the storage within the information processing device 2, or it may be located on external cloud storage or a file server accessible by the information processing device 2.
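For the case where the destination folder lives on a local or mounted file system, storing the file amounts to creating the folder hierarchy and copying the file into it. The sketch below uses only `pathlib` and `shutil`; the function name `store_in_folder` and the idea of a separate storage root are assumptions, and cloud storage would need its own client library instead.

```python
import shutil
from pathlib import Path


def store_in_folder(src: Path, storage_root: Path, folder_path: str) -> Path:
    """Copy src into storage_root/folder_path, creating the folder hierarchy if needed."""
    # folder_path is a rule-derived path such as "/C001/Tax/Invoice/Notification/".
    parts = [p for p in folder_path.split("/") if p]  # drop empty segments from leading/trailing "/"
    if ".." in parts:
        raise ValueError("folder path must not climb above the storage root")
    dest_dir = storage_root.joinpath(*parts)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copy2 also preserves timestamps where the platform allows
    return dest
```

Stripping the leading slash before joining matters: joining an absolute path would discard `storage_root` entirely.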
[0037] As a duplicate determination step, the duplicate determination unit 214 determines, when a data file is stored, whether a file with the same name already exists. The duplicate determination unit 214 may check for duplicates not only within the determined folder itself but also across folders that have a sibling or parent-child relationship with it in the folder hierarchy.
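A same-name check that optionally spans parent, child, and sibling folders can be written over a list of existing file paths. In this sketch, `find_duplicates` is a hypothetical name, "related" is taken to mean the target's direct parent, direct children, and direct siblings only, and matching is by file name alone (no content hashing).

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def find_duplicates(name: str, target_folder: str, existing_files: Iterable[str],
                    span_related: bool = True) -> List[str]:
    """List existing files named `name` in the target folder or, optionally, in related folders."""
    target = PurePosixPath(target_folder)  # a trailing "/" is normalized away
    hits = []
    for path in map(PurePosixPath, existing_files):
        if path.name != name:
            continue
        folder = path.parent
        same = folder == target
        parent_or_child = folder == target.parent or folder.parent == target
        sibling = folder != target and folder.parent == target.parent
        if same or (span_related and (parent_or_child or sibling)):
            hits.append(str(path))
    return hits
```

For a target of `/C001/Tax/Invoice/`, an `a.pdf` under `/C001/Tax/BlueReturn/` (a sibling) or `/C001/Tax/` (the parent) is reported, while one under `/C001/SocialInsurance/Onboarding/` is not.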
[0038] As a log generation step, the log generation unit 215 generates and stores log information indicating the basis on which the administrative procedure category was estimated. The log information may further include at least one of a processing history covering the receipt, category estimation, and storage of each document, and error information. The log may be output to a database table, a CSV or JSON file, or an external audit log service.
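When the log is written as JSON, each processing event can become one JSON line. The field names below (`estimated_category`, `basis`, `stored_in`, `error`) are assumptions chosen for illustration; `basis` is a free-form dictionary that could hold, for example, the matched rule ID, matched keywords, AI model ID, and confidence.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def make_log_entry(file_name: str, category: Optional[str], basis: Dict[str, Any],
                   folder: Optional[str] = None, error: Optional[str] = None,
                   now: Optional[datetime] = None) -> str:
    """Serialize one log record as a JSON line."""
    record = {
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
        "file": file_name,
        "estimated_category": category,
        "basis": basis,          # why this category was chosen
        "stored_in": folder,     # None if the file was not stored
        "error": error,          # None if processing succeeded
    }
    return json.dumps(record, ensure_ascii=False)  # keep Japanese text readable in the log
```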
[0039] The output control unit 216 implements the functions of the display control unit 217 and the data output control unit 218 as an output step.
[0040] The display control unit 217 controls the display, on the staff terminal 4 or the client terminal 3, of the document file sorting result screen, the client information update proposal screen, or a notification pop-up. Note that "the display control unit 217 displays information on a terminal" may mean transmitting screen data showing that information, or transmitting the data necessary for the terminal to generate such a screen. Displaying information is one form of outputting information.
[0041] The data output control unit 218, as a data output control step, outputs or links the updated database contents or organized folder structure to an external system.
3. Operation of Information Processing Device 2
[0042] This section explains the flow of information processing performed by the information processing device 2 with reference to the drawings. The information processing mainly consists of the pre-registration process and the document sorting process.
3.1 Pre-registration Process
[0043] First, the rule registration process that is performed with information processing system 1 before the system is put into use is outlined. Figure 5 is a sequence diagram showing an example of the pre-registration process.
[0044] The staff terminal 4 transmits a request to register client information (S10). Client information may include, for example, the company name, representative name, invoice registration status, and capital. The registration unit 210 of the control unit 21 of the information processing device 2 registers the received client information in the client master T1 (S11).
[0045] The staff terminal 4 also transmits a request to register a sorting rule (S12). A sorting rule includes, for example, a rule ID, a procedure category, judgment logic, and a storage folder path. The registration unit 210 registers the sorting rule in the sorting rule table T2 (S13).
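Steps S11 and S13 are ordinary inserts into two tables. As a minimal sketch, assuming a relational store, the client master T1 and sorting rule table T2 could be laid out as below using Python's built-in `sqlite3`; the table and column names are hypothetical, and storing judgment keywords as comma-separated text is a simplification (it breaks if a keyword itself contains a comma).

```python
import sqlite3
from typing import Sequence

SCHEMA = """
CREATE TABLE client_master (
    client_id           TEXT PRIMARY KEY,  -- T11
    company_name        TEXT NOT NULL,     -- T12
    representative_name TEXT,              -- T13
    invoice_registered  INTEGER,           -- T14 (0 = No, 1 = Yes)
    blue_return         INTEGER,           -- T15 (0 = No, 1 = Yes)
    capital             INTEGER,           -- T16
    address             TEXT               -- T17
);
CREATE TABLE sorting_rule (
    rule_id       TEXT PRIMARY KEY,  -- T21
    category      TEXT NOT NULL,     -- T22
    content       TEXT,              -- T23
    document_type TEXT,              -- T24
    keywords      TEXT,              -- T25, comma-separated in this sketch
    model_id      TEXT,              -- T26
    folder_path   TEXT NOT NULL      -- T27
);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def register_client(conn: sqlite3.Connection, client_id: str, company_name: str,
                    representative_name: str, invoice_registered: bool,
                    blue_return: bool, capital: int, address: str) -> None:
    """S11: store client information received from the staff terminal in client master T1."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO client_master VALUES (?, ?, ?, ?, ?, ?, ?)",
            (client_id, company_name, representative_name,
             int(invoice_registered), int(blue_return), capital, address),
        )


def register_rule(conn: sqlite3.Connection, rule_id: str, category: str, content: str,
                  document_type: str, keywords: Sequence[str], model_id: str,
                  folder_path: str) -> None:
    """S13: store a sorting rule in sorting rule table T2."""
    with conn:
        conn.execute(
            "INSERT INTO sorting_rule VALUES (?, ?, ?, ?, ?, ?, ?)",
            (rule_id, category, content, document_type, ",".join(keywords),
             model_id, folder_path),
        )
```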
3.2 Database Structure
[Client Master T1]
[0046] Figure 6 shows an example of the data structure of the client master T1. The client master T1 shown in Figure 6 is a table, keyed by the client ID (T11), for managing client information, that is, the basic attributes of each client that are referenced in procedures such as accounting, tax, registration, and social insurance. The client master T1 is stored in the storage unit 22. As an example, the client master T1 includes the following items as client information: client ID, company name, representative name, invoice registration status, blue return filing status, capital, and address.
[0047] The client ID (T11) is an identifier that uniquely identifies the client. The company name (T12) is the name of the client, for example, its corporate name or trade name. The representative name (T13) is the name of the client's representative or of the person named on applications for administrative procedures, etc. The invoice registration status (T14) indicates whether the client is registered under the invoice system and takes values such as "Yes" or "No"; when it is "Yes", this item may also include the invoice registration number. The blue return filing status (T15) indicates whether the client files blue returns and takes values such as "Yes" or "No". The capital (T16) indicates the amount of the client's capital. The address (T17) indicates the client's location (e.g., the head office location or address).
[0048] Each record (row) in the client master T1 is the set of client information for one client and can be referenced by the client ID (T11). For example, in the pre-registration process, in which an accounting staff member registers client information from the staff terminal 4, the registration unit 210 of the information processing device 2 stores the company name (T12), representative name (T13), invoice registration status (T14), blue return filing status (T15), capital (T16), address (T17), and so on, linked to the client ID (T11).
[0049] Furthermore, in the client information update process described later, the information processing device 2 detects changes in client information to be updated by comparing the information extracted from the document file (e.g., capital, address, representative name, invoice registration status, etc.) with the client information stored in the client master T1 (T12-T17). For example, if the capital extracted from the document file differs from the capital in the client master T1 (T16), the information processing device 2 may present update candidates to the accounting staff.
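Detecting update candidates reduces to a field-by-field comparison between a client master record and the values extracted from a document. The sketch below models one row of client master T1 as a dataclass; the Python field names, the yen unit for capital, and the rule that fields absent from the extraction are ignored are all assumptions.

```python
from dataclasses import dataclass, fields
from typing import Dict, Tuple


@dataclass
class ClientInfo:
    """One record of client master T1."""
    client_id: str            # T11
    company_name: str         # T12
    representative_name: str  # T13
    invoice_registered: bool  # T14
    blue_return: bool         # T15
    capital: int              # T16, assumed to be in yen
    address: str              # T17


def update_candidates(stored: ClientInfo, extracted: Dict[str, object]) -> Dict[str, Tuple[object, object]]:
    """Return {field: (current value, extracted value)} for every extracted field that differs."""
    diffs = {}
    for f in fields(stored):
        if f.name == "client_id" or f.name not in extracted:
            continue  # the key is never updated; fields not found in the document are ignored
        current = getattr(stored, f.name)
        if extracted[f.name] != current:
            diffs[f.name] = (current, extracted[f.name])
    return diffs
```

The result is only a proposal; consistent with paragraph [0030], the master is changed only after a staff member approves it.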
[0050] The types and formats of items stored in the client master T1 are not limited to the example shown in this diagram. For example, they may also include attributes such as telephone number, email address, corporate number, date of establishment, fiscal year, fiscal year end, tax office code, social insurance applicable business establishment number, number of employees, registered head office address, assigned tax accountant, and assigned staff. This can improve the classification of document files (estimation of procedural categories), the accuracy of suggesting update candidates, or the ease of handling audits.
[Sorting Rule Table T2]
[0051] Figure 7 shows an example of the data structure of the sorting rule table T2. The sorting rule table T2 stores the sorting rules that the sorting unit 213 refers to, according to the procedure category estimated by the estimation unit 212 for a document file provided by a client. The sorting rule table T2 includes items such as a rule ID (T21), procedure category, judgment logic, and storage folder path (T27). The procedure category may have a hierarchical structure of major, medium, and minor classifications according to the attributes of the procedure or of the documents related to it; in this figure, these correspond to the procedure category (T22), procedure content (T23), and document type (T24), respectively. The judgment logic includes items such as a judgment keyword (T25) and an AI model ID (T26).
[0052] The Rule ID (T21) is an identifier that uniquely identifies each sorting rule.
[0053] The procedure category (major classification) (T22) indicates the top level of the procedure category hierarchy. In the example shown in this figure, the procedure category (T22) includes tax, social insurance, registration, and other government offices. The procedure content (medium classification) (T23) indicates procedures that further subdivide the major classifications. In this example, the medium classifications under tax are invoice-related matters, blue return-related matters, and corporate tax returns; those under social insurance are onboarding procedures and calculation basis notifications; the one under registration is change of officers; and the one under other government offices is license and permit applications.
[0054] The document type (minor classification) (T24) indicates the specific type of document related to the above-mentioned major classification of the procedural category. In the example in this diagram, the document type (minor classification) (T24) includes, as an example, a qualified invoice issuer notification, an invoice, an application for approval of blue return filing, a copy of the final tax return, an employment insurance insured person qualification acquisition notification, a calculation basis notification, a certified copy of the company register, and a business license.
[0055] The determination keyword (T25) is a string or group of strings referenced to identify the contents of the document file. The estimation unit 212 determines, for example, whether the determination keyword (T25) is included in the text extracted by OCR, and if it is included, it may adopt the corresponding classification candidate (T22-T24). In the example in this figure, examples of determination keywords (T25) include "Notification of Registration as a Qualified Invoice Issuer," "Invoice," "Blue Return, Approval Application," "Corporate Tax, Final Tax Return," "Employment Insurance, Qualification Acquisition," "Calculation Basis, Standard Monthly Remuneration," "Certificate of All Registered Matters, Officer," and "Business License, Public Health Center."
[0056] The AI model ID (T26) is an identifier that specifies the model to be used when the procedure category is estimated with AI model 6. In the example in this figure, a different AI model 6 is assigned to each document type (minor classification) (T24). However, AI model 6 may instead differ for each procedure category (major classification) (T22), for each procedure content (medium classification) (T23), or for a combination of these.
[0057] The storage folder path (T27) indicates the location where document files matching the rule are stored. The storage folder path (T27) defines a folder hierarchy with a first level corresponding to the procedure category, a second level corresponding to the procedure content (a subdivision of the procedure category), and a third level corresponding to the more detailed document type, for example "/{Customer}/Tax/Invoice/Invoice/". Here, {Customer} is a variable part that identifies the client and can be specified by, for example, the client ID (T11) or the company name (T12).
[0058] Each record in this diagram corresponds to one sorting rule. For example, rule ID "R001" targets document files related to "Taxation," "Invoice-related," and "Qualified Invoice Issuer Notification," includes "Qualified Invoice Issuer Registration Notification" as the judgment keyword, references "M_TAX_INV_01" as the AI model ID, and specifies "/{Customer}/Taxation/Invoice/Notification/" as the storage folder path.
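A row of the sorting rule table T2, and the expansion of its `{Customer}` placeholder, can be sketched as follows using the R001 record described above. The `SortingRule` class, its Python field names, and the decision to reject identifiers containing "/" are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SortingRule:
    """One row of sorting rule table T2."""
    rule_id: str               # T21
    category: str              # T22, major classification
    content: str               # T23, medium classification
    document_type: str         # T24, minor classification
    keywords: Tuple[str, ...]  # T25, judgment keywords
    model_id: str              # T26, AI model ID
    folder_template: str       # T27, storage folder path with a {Customer} placeholder


R001 = SortingRule(
    rule_id="R001",
    category="Taxation",
    content="Invoice-related",
    document_type="Qualified Invoice Issuer Notification",
    keywords=("Qualified Invoice Issuer Registration Notification",),
    model_id="M_TAX_INV_01",
    folder_template="/{Customer}/Taxation/Invoice/Notification/",
)


def resolve_folder(rule: SortingRule, customer: str) -> str:
    """Substitute the client identifier (e.g. client ID T11) into the path template."""
    if "/" in customer:
        # A slash in the identifier would silently add an extra folder level.
        raise ValueError("customer identifier must not contain '/'")
    return rule.folder_template.replace("{Customer}", customer)
```

`str.replace` is used rather than `str.format` so that any other braces a template might contain are left untouched.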
[0059] In the example shown in Figure 7, the storage locations are defined using a similar rule structure not only for tax matters but also for social insurance, registration, and other government offices. This allows for classification and storage based on a unified rule system, even when a client submits document files to a single upload window, across the business domains of accounting firms, etc. (e.g., tax, social insurance, registration, or licensing).
[0060] Note that the types of items and value formats in the sorting rule table T2 are not limited to the example shown in this figure. For example, the judgment keyword (T25) is not limited to a sequence of words, but may include a regular expression, destination domain, document number, or conditional expression for layout features. Also, the storage folder path (T27) may be a cloud storage path, a storage path within the information processing device 2, or a URI (Uniform Resource Identifier) pointing to them. Furthermore, the sorting rule table T2 may have attributes such as rule priority, application conditions, confidence threshold, or action in case of duplication (e.g., skip storage, save under a different name, or notify).
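The "action in case of duplication" attribute mentioned above (skip storage, save under a different name, or notify) can be modeled as an enumeration applied after the duplicate check. In this sketch the renaming scheme (`name (n).ext`) and the choice that NOTIFY holds the file unstored while raising a notification are assumptions.

```python
from enum import Enum
from pathlib import PurePosixPath
from typing import Optional, Set, Tuple


class DuplicateAction(Enum):
    SKIP = "skip"      # do not store the new file
    RENAME = "rename"  # store it under a different name
    NOTIFY = "notify"  # hold the file and notify a staff member (assumed behavior)


def apply_duplicate_action(name: str, existing: Set[str],
                           action: DuplicateAction) -> Tuple[Optional[str], bool]:
    """Return (name to store under, or None if not stored; whether to notify)."""
    if name not in existing:
        return name, False
    if action is DuplicateAction.SKIP:
        return None, False
    if action is DuplicateAction.NOTIFY:
        return None, True
    # RENAME: insert " (n)" before the extension, using the first unused n.
    p = PurePosixPath(name)
    n = 1
    while f"{p.stem} ({n}){p.suffix}" in existing:
        n += 1
    return f"{p.stem} ({n}){p.suffix}", False
```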
[0061] 3.3 Overview of the document sorting process Next, we will explain the data sorting process performed by the information processing device 2. Figure 8 is a flowchart showing an example of the data sorting process.
[0062] First, when the client terminal 3 uploads the document file, the acquisition unit 211 of the information processing device 2 receives it (S20).
[0063] Next, the estimation unit 212 performs text extraction or image analysis on the acquired data file (S21). For example, if the data file is a PDF with embedded text, the estimation unit 212 extracts text from the PDF. If the data file is scanned image data, the estimation unit 212 may extract text using OCR. Also, if the data file is primarily image-based, the estimation unit 212 may perform image classification, layout analysis, or extraction of areas specific to the document. This allows the estimation unit 212 to obtain image features that represent the appearance of the document. Furthermore, the estimation unit 212 may extract metadata from the data file and use it for subsequent estimation processing. For example, the metadata may include the file name, creation date and time, sender address, number of pages, or resolution.
[0064] Next, the estimation unit 212 performs rule-based classification estimation (S22). For example, the estimation unit 212 may compare the determination keywords (T25) stored in the sorting rule table T2 with the text or analysis results obtained in S21 to estimate the procedure classification (major classification) (T22), procedure content (medium classification) (T23), and document type (minor classification) (T24). The determination is not limited to simple exact matching; it may also take into account regular expressions, thesauri, normalization of spelling variations, keyword weighting, or the location where a keyword occurs (e.g., header, title, or note).
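The rule-based estimation above can be sketched as location-weighted keyword matching after light normalization. This is a minimal illustration, assuming that determination keywords are stored as case-insensitive regular expressions, that the document text has already been split into named sections, and that the weights in `LOCATION_WEIGHTS` are hypothetical values; Unicode NFKC normalization stands in for "spelling variation absorption".

```python
import re
import unicodedata
from typing import Optional

# Hypothetical weights: a keyword in the title is stronger evidence than one in a note.
LOCATION_WEIGHTS = {"title": 3.0, "header": 2.0, "body": 1.0, "note": 0.5}


def normalize(text: str) -> str:
    """Absorb simple notation variations: full-width characters and irregular whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def score_rule(sections: dict[str, str], patterns: list[str]) -> float:
    """Sum the location weight of every (section, pattern) pair that matches."""
    score = 0.0
    for location, text in sections.items():
        normalized = normalize(text)
        for pattern in patterns:
            # Each determination keyword is treated as a case-insensitive regular expression.
            if re.search(pattern, normalized, flags=re.IGNORECASE):
                score += LOCATION_WEIGHTS.get(location, 1.0)
    return score


def estimate_by_rules(sections: dict[str, str], rules: dict[str, list[str]]) -> Optional[str]:
    """Return the ID of the single best-scoring rule, or None on no match or a tie."""
    ranked = sorted(
        ((score_rule(sections, patterns), rule_id) for rule_id, patterns in rules.items()),
        reverse=True,
    )
    ranked = [(score, rule_id) for score, rule_id in ranked if score > 0]
    if not ranked or (len(ranked) > 1 and ranked[0][0] == ranked[1][0]):
        return None  # nothing matched, or several rules are equally likely
    return ranked[0][1]
```

Returning None on a tie mirrors the text's treatment of "multiple candidates" as a case the rules alone cannot settle.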
[0065] Next, the estimation unit 212 performs classification estimation using the AI model 6 (S23). For example, the estimation unit 212 selects the AI model 6 based on the AI model ID (T26) stored in the sorting rule table T2 and estimates the procedure classification by inputting information that includes the text or image features obtained in S21. At this time, the estimation unit 212 may estimate the procedure classification (major category), the procedure content (medium category), and the document type (minor category).
[0066] Next, the duplicate determination unit 214 performs a duplicate determination between the file whose classification has been estimated and an existing file (S24). The duplicate determination will be described in detail later.
[0067] Next, the sorting unit 213 determines the storage folder for the document file (S25). For example, the sorting unit 213 identifies, in the sorting rule table T2, the storage folder path (T27) that matches the estimation results of S22 and S23 (e.g., the estimated major, medium, and minor classifications) and determines the storage folder from it. The actual storage folder may be a folder in the accounting firm's cloud storage, a storage area in the information processing device 2, or both. The storage folder path (T27) may take a form such as "{Customer} / Tax / Invoice / Notification", where a variable part such as {Customer} may be specified by the client ID (T11) in the client master T1.
[0068] In the estimation steps in S22-S23, if the target document file does not match the sorting rules, if the confidence level of the estimation is low, or if there are multiple candidates for the storage destination, the "Undetermined" folder may be used as the storage destination. That is, if the sorting unit 213 cannot determine the storage destination folder, it stores the document file in the "Undetermined" folder. This helps to suppress folder storage errors due to misclassification.
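The fallback to the "Undetermined" folder can be sketched as follows. The sketch assumes that S22 and S23 yield (folder path, confidence) candidates; the candidate format, the folder name constant, and the 0.8 threshold are hypothetical choices for illustration.

```python
UNDETERMINED_FOLDER = "Undetermined"  # hypothetical folder name


def determine_folder(candidates: list[tuple[str, float]], min_confidence: float = 0.8) -> str:
    """Pick the storage folder from (folder path, confidence) candidates produced in S22-S23.

    Falls back to the "Undetermined" folder when no rule matched, when confidence
    is low, or when several distinct folders remain plausible.
    """
    confident = {path for path, confidence in candidates if confidence >= min_confidence}
    if len(confident) != 1:
        return UNDETERMINED_FOLDER
    return next(iter(confident))
```

Collecting paths into a set means two candidates that point to the same folder do not count as a conflict.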
[0069] Finally, the sorting unit 213 stores the target data file in the storage location, and the log generation unit 215 outputs a log (S26). Specifically, the sorting unit 213 stores the data file in the storage folder determined in S25, and the log generation unit 215 outputs the storage result as a log. The information recorded in the log includes whether storage succeeded or failed, the result of duplicate detection, the applicable rule ID, the estimated classification, the ID of the AI model used, the storage path, and a timestamp. The log generation unit 215 basically writes the log to a log table in the database, but the output destination may also be a CSV or JSON file, an audit log service, or an external SIEM (Security Information and Event Management) system.
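A log entry carrying the fields listed above could be serialized as one JSON line, which suits both a file-based log and forwarding to an audit or SIEM service. This is a sketch only; the field names and the JSON layout are assumptions, not a format defined in the disclosure.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def make_log_record(
    stored: bool,
    duplicate: Optional[str],
    rule_id: Optional[str],
    classification: tuple[str, str, str],
    ai_model_id: Optional[str],
    storage_path: str,
    now: Optional[datetime] = None,
) -> str:
    """Serialize one sorting result as a JSON line (field names are hypothetical)."""
    now = now or datetime.now(timezone.utc)
    record = {
        "status": "success" if stored else "failure",
        "duplicate": duplicate,  # e.g. None, "identical", "substantially identical"
        "rule_id": rule_id,
        "classification": {
            "major": classification[0],
            "medium": classification[1],
            "minor": classification[2],
        },
        "ai_model_id": ai_model_id,
        "storage_path": storage_path,
        "timestamp": now.isoformat(),
    }
    # ensure_ascii=False keeps non-ASCII document names readable in the log.
    return json.dumps(record, ensure_ascii=False)
```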
[0070] The output control unit 216 may send a completion notification or a results screen to the accounting staff terminal 4 or the requester terminal 3. If the classification estimation result is undeterminable, the output control unit 216 may send an alert to the staff terminal 4 or the requester terminal 3. Alternatively, the output control unit 216 may display an error message on the system screen.
[0071] The sorting process is now complete.
[0072] [Effects] In this way, the information processing device 2 analyzes the contents of uploaded document files based on sorting rules, identifies the procedural category, and automatically sorts the document files into the appropriate folders. This significantly reduces the effort required for the requester or accountant to manually organize files, and also reduces the risk of storage errors, sorting inconsistencies, or loss due to lack of knowledge. Therefore, the information processing device 2 can improve the efficiency of organizing documents related to administrative procedures. As a result, the time spent searching for document files can be reduced.
[0073] Because the procedure classification classifies documents in units of administrative procedures, documents become easier to organize per administrative procedure, which can reduce the time accounting staff spend searching or sorting.
[0074] The information processing device 2 recognizes not only the administrative procedure category corresponding to the document file, but also subdivided information such as the procedure content obtained by further subdividing that procedure category, or the document type associated with that procedure content, and generates structured recognition results (e.g., major category / medium category / minor category). Based on the structured recognition results, the information processing device 2 determines the appropriate folder location in the folder hierarchy structure and automatically sorts and stores the document file. As a result, even if the client does not have a full understanding of the document types, the files can be organized according to the folder hierarchy used by accounting firms, etc. (e.g., procedure category, procedure content, and document type), which can help reduce mis-storage, search burden, and delays in subsequent processes (e.g., verification, matching, or processing requests).
[0075] The structure of these sorting destinations and sorting rules can be freely registered or modified by the accountant. The rules for assigning files to folders can therefore be customized with firm-specific or client-specific rules, allowing classification tailored to the firm's operations and client characteristics and thus enhancing practical usability.
[0076] Furthermore, the information processing device 2 detects duplication and performs processing based on the detection results. This allows the information processing device 2 to suppress duplicate submission, duplicate processing, or storage bloat of documents.
[0077] Furthermore, the information processing device 2 generates log information that shows the basis for the estimated administrative procedure classification. This allows the accountant or client to easily check the results of sorting the document files. It can also be used by the accountant as explanatory material for audit responses.
[0078] In the example above, duplicate detection was performed before storing the data files, but duplicate detection may also be performed after storing the data files. In this case, the duplicate detection unit 214 may delete duplicate data files if it detects duplicates.
[0079] 3.4 Details of the document sorting process [Step to obtain data files (S20)] In S20, when the client terminal 3 uploads a document file, the acquisition unit 211 of the information processing device 2 receives it. Here, the document file can be, for example, a PDF, image data, text data such as office documents, email body, attachment, or compressed file. The document file may be acquired triggered by the client terminal 3's upload operation, email forwarding, or registration of a shared link.
[0080] Figure 9 shows an example of the document upload screen 70. The document upload screen 70 displays, for example, a heading "Document Upload" along with a message prompting the user to upload documents.
[0081] In the center of the document upload screen 70, a document file input area 701 is displayed. The requester can upload document files (e.g., receipts, invoices, contracts, etc.) by dragging and dropping them into this input area 701. Document files can also be uploaded by selecting them in the file selection dialog.
[0082] The information processing device 2, triggered by the upload operation performed by the requester, retrieves the document file from the requester's terminal 3 (S20) and transitions the processing to the next step.
[0083] [Estimation steps (S21 to S23)] In S21, the estimation unit 212 extracts text from the acquired data file, and in S22 and S23, the estimation unit 212 performs classification estimation. These steps are performed upon acquisition of the data file from the client terminal 3, but they may instead be performed, in response to an operation on the staff terminal 4, on a data file that the client terminal 3 uploaded and that is temporarily stored in the storage unit 22. In the classification estimation, the estimation unit 212 may estimate the procedure classification based on both the rule-based determination result and the determination result of the AI model 6.
[0084] For example, the estimation unit 212 may give the rule-based determination result priority over the determination result of the AI model 6. In that case, if the estimation unit 212 can make a rule-based determination, it may adopt that result as the procedure category. If it cannot make a rule-based determination, it may make a determination using the AI model 6. A rule-based determination cannot be made when, for example, no category can be estimated by the rules or multiple categories are estimated. If the estimation unit 212 can make a determination using the AI model 6, it may adopt the AI model 6's result as the procedure category. If it cannot make a determination using the AI model 6 either, it may set the procedure category to "undeterminable". A determination using the AI model 6 cannot be made when, for example, the confidence of the result is below a predetermined value or multiple categories are estimated.
[0085] Alternatively, the estimation unit 212 may prioritize the determination result from the AI model 6 over the determination result from the rule-based method.
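The priority logic of the two preceding paragraphs can be sketched as a short cascade. In this sketch, each method is wrapped in a zero-argument callable so that the second method runs only when the first fails; the function names, the score dictionary standing in for the AI model 6's output, and the 0.7 confidence threshold are hypothetical.

```python
from typing import Callable, Optional

UNDETERMINABLE = "undeterminable"


def rule_result(candidates: list[str]) -> Optional[str]:
    """A rule-based determination succeeds only when exactly one category matched."""
    return candidates[0] if len(candidates) == 1 else None


def ai_result(scores: dict[str, float], min_confidence: float = 0.7) -> Optional[str]:
    """An AI determination succeeds only when exactly one category clears the threshold."""
    confident = [category for category, p in scores.items() if p >= min_confidence]
    return confident[0] if len(confident) == 1 else None


def decide(
    rule_step: Callable[[], Optional[str]],
    ai_step: Callable[[], Optional[str]],
    prefer_rule: bool = True,
) -> str:
    """Try the preferred method first; call the other one only if the first fails."""
    steps = (rule_step, ai_step) if prefer_rule else (ai_step, rule_step)
    for step in steps:
        result = step()
        if result is not None:
            return result
    return UNDETERMINABLE
```

For example, `decide(lambda: rule_result(["Taxation"]), lambda: ai_result({"Taxation": 0.9}))` returns `"Taxation"` without invoking the AI step; passing `prefer_rule=False` reproduces the alternative ordering of paragraph [0085].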
[0086] Here, in the rule-based classification estimation step (S22), the estimation unit 212 may estimate the procedure classification based on at least one of the destination information and the listed items included in the document file. Destination information indicates the destination of the administrative procedure, and listed items are the items used in that administrative procedure. For example, suppose the document file is a text-embedded PDF from which the estimation unit 212 extracts a string containing "Director of Tax Office A" or "National Tax Agency" as destination information and "Qualified Invoice Issuer," "Invoice Registration Number," or "Registration Date" as listed items. In this case, the estimation unit 212 may estimate that the document file belongs to the administrative procedure classification (procedure classification, procedure content, document type) "Tax / Invoice Related / Notification."
[0087] The estimation unit 212 may use both the destination information and the listed items for estimation, or only one of them. For example, even if destination information cannot be extracted, as when the destination field is missing from the image, the estimation unit 212 may estimate the procedure category, procedure content, or document type from the listed items alone. Conversely, even if there are few listed items, the estimation unit 212 may estimate the procedure category, procedure content, or document type from the destination information.
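Combining destination information and listed items, while tolerating either being absent, might look like the sketch below. The hint tables are hypothetical examples (a real system would more plausibly derive them from the sorting rule table T2), and the policy of rejecting conflicting evidence is an assumption made for illustration.

```python
from collections import Counter
from typing import Optional

# Hypothetical hint tables; in practice they could be derived from the sorting rule table T2.
DESTINATION_HINTS = {
    "tax office": "Tax",
    "national tax agency": "Tax",
    "pension office": "Social insurance",
    "legal affairs bureau": "Registration",
}
ITEM_HINTS = {
    "qualified invoice issuer": ("Tax", "Invoice-related", "Notification"),
    "invoice registration number": ("Tax", "Invoice-related", "Notification"),
    "registration date": ("Tax", "Invoice-related", "Notification"),
}

Classification = tuple[str, Optional[str], Optional[str]]


def estimate(text: str) -> Optional[Classification]:
    """Estimate (category, content, document type) from destination info and listed items."""
    lowered = text.lower()
    destinations = {cat for hint, cat in DESTINATION_HINTS.items() if hint in lowered}
    item_votes = Counter(c for hint, c in ITEM_HINTS.items() if hint in lowered)
    if item_votes:
        best, _ = item_votes.most_common(1)[0]
        # Listed items alone are enough; a destination, if found, must agree with them.
        if not destinations or best[0] in destinations:
            return best
        return None  # destination and listed items point to different categories
    if len(destinations) == 1:
        # Destination alone: only the major category can be estimated.
        return (next(iter(destinations)), None, None)
    return None
```

The destination-only branch returns just the major category, reflecting that an agency name usually narrows the procedure category but not the specific document type.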
[0088] Furthermore, in the classification estimation step (S23) using the AI model 6, the estimation unit 212 may use the AI model 6 to perform semantic estimation based on destination information, listed items, or combinations thereof. For example, the AI model 6 may estimate the procedure classification, procedure content, or document type by considering variations in the notation of the extracted string, synonyms, and context (e.g., the relationship between "registration notice," "qualified invoice," and "taxable business operator"). Also, even if destination information is missing, estimation may be made based on the semantic relationships of the listed items, and conversely, even if there are few listed items, estimation may be made based on the meaning of the destination information (e.g., the name of the administrative agency).
[0089] Thus, the estimation unit 212 can use semantic estimation by the AI model 6 in addition to rule-based estimation based on destination information or listed items. As a result, even for document files containing notation variations, synonyms, or missing destination fields, the accuracy of estimating the administrative procedure category improves, and automatic sorting into the appropriate folder can be achieved more reliably. Consequently, storage errors and the search burden can be reduced in some cases.
[0090] [Duplicate detection step (S24)] In S24, the duplicate detection unit 214 checks whether the file whose classification has been estimated duplicates an existing file. Here, "duplicate" basically refers to an exact match with a file that already exists in the same storage location (or within a predetermined search range). An exact match need not be determined by file name alone; it may also be determined by checking whether the hash value of the target data file matches the hash value of a previously saved file. For example, the hash function may be MD5 (Message-Digest Algorithm 5) or SHA-256 (Secure Hash Algorithm, 256-bit). The duplicate detection unit 214 may also make the determination based on file content. For example, the duplicate detection unit 214 may treat substantially identical files as duplicates based on the similarity of their content; specifically, it may determine that files are substantially identical if their content similarity is greater than or equal to a predetermined value. Content similarity may be estimated from string differences, the distance between image features (e.g., cosine similarity), or text similarity. For example, the duplicate detection unit 214 may treat as substantially identical files that differ only by minor corrections to the entered items or by scanning conditions (e.g., the same invoice scanned again). The duplicate detection unit 214 may also treat documents with the same layout whose entered items have been modified as substantially identical.
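The two-stage check described above (exact match by hash, then substantial identity by content similarity) can be sketched with Python's standard `hashlib` and `difflib` modules. The sketch assumes the text of each file has already been extracted, uses SHA-256 for the hash stage and `difflib.SequenceMatcher.ratio()` as one possible text-similarity measure, and sets a hypothetical threshold of 0.95.

```python
import hashlib
from difflib import SequenceMatcher
from typing import Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def classify_duplicate(
    new_bytes: bytes,
    new_text: str,
    existing: list[tuple[bytes, str]],
    threshold: float = 0.95,
) -> Optional[str]:
    """Compare a new file against (bytes, extracted text) pairs in the search range.

    Returns "identical" on a hash match, "substantially identical" when the text
    similarity reaches the threshold, and None otherwise.
    """
    new_hash = sha256_hex(new_bytes)
    if any(sha256_hex(old_bytes) == new_hash for old_bytes, _ in existing):
        return "identical"
    for _, old_text in existing:
        if SequenceMatcher(None, new_text, old_text).ratio() >= threshold:
            return "substantially identical"
    return None
```

In a production setting the hashes of stored files would typically be computed once at storage time rather than on every comparison, as this sketch does for brevity.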
[0091] Furthermore, the duplicate detection unit 214 may perform duplicate checks across folders. For example, duplicate checks may be performed by cross-searching between folders in the same series (parent-child relationship), between folders at the same level (sibling relationship), between candidate destination folders, or on a per-request basis.
[0092] In other words, the duplicate detection unit 214 detects duplicate files as identical or substantially identical files, targeting data files stored in or scheduled to be stored in the determined storage destination folder or in a folder that meets predetermined conditions for the storage destination folder.
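The cross-folder search range can be sketched with `pathlib.PurePosixPath`, treating storage folder paths as POSIX-style paths. The scope names "same", "lineage" (parent-child relationship), and "siblings" are hypothetical labels for the relationships listed in [0091].

```python
from pathlib import PurePosixPath


def in_search_range(folder: str, destination: str, scope: str) -> bool:
    """Decide whether an existing folder falls within the duplicate-check range."""
    f, d = PurePosixPath(folder), PurePosixPath(destination)
    if scope == "same":
        return f == d
    if scope == "lineage":  # same series: parent-child relationship in either direction
        return f == d or f in d.parents or d in f.parents
    if scope == "siblings":  # same level under a common parent
        return f.parent == d.parent
    raise ValueError(f"unknown scope: {scope}")


def search_targets(destination: str, existing_files: list[str], scope: str) -> list[str]:
    """Filter existing file paths down to those whose folder is in the search range."""
    return [
        path for path in existing_files
        if in_search_range(str(PurePosixPath(path).parent), destination, scope)
    ]
```

Note that `PurePosixPath` ignores a trailing slash, so "/C0001/Taxation/Invoice/" and "/C0001/Taxation/Invoice" compare equal.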
[0093] If a duplicate is detected, the duplicate detection unit 214 may notify (alert) the accountant. The duplicate detection unit 214 may also, for example, add a tag indicating the duplication to the file name, save the file under a different name with the date and time appended, skip storage, store a shortcut that references the duplicate file, save the file under a different name with a version suffix (for example, "v2" at the end), or add version information to the metadata.
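Several of the naming-based actions above can be sketched as small pure functions. The "[DUPLICATE]" tag, the `_v2` suffix placed before the file extension, the timestamp format, and the action names are hypothetical choices; shortcut creation and metadata versioning depend on the storage backend and are left out.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional


def tagged_name(name: str, tag: str = "[DUPLICATE]") -> str:
    return f"{tag}{name}"


def timestamped_name(name: str, now: datetime) -> str:
    p = PurePosixPath(name)
    return f"{p.stem}_{now:%Y%m%d%H%M%S}{p.suffix}"


def versioned_name(name: str, existing: set[str]) -> str:
    """Append _v2, _v3, ... before the extension until the name is free."""
    if name not in existing:
        return name
    p = PurePosixPath(name)
    version = 2
    while f"{p.stem}_v{version}{p.suffix}" in existing:
        version += 1
    return f"{p.stem}_v{version}{p.suffix}"


def handle_duplicate(action: str, name: str, existing: set[str], now: datetime) -> Optional[str]:
    """Return the file name to store under, or None to skip storage."""
    if action == "skip":
        return None
    if action == "tag":
        return tagged_name(name)
    if action == "timestamp":
        return timestamped_name(name, now)
    if action == "version":
        return versioned_name(name, existing)
    raise ValueError(f"unknown duplicate action: {action}")
```

The chosen action could come from the "action in case of duplication" attribute that paragraph [0060] mentions as an optional column of the sorting rule table T2.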
[0094] 3.5 Processing to update client information Here, in parallel with or after the document sorting process, the registration unit 210 of the information processing device 2 may update the client information based on the contents of the document file as a client information update step.
[0095] Specifically, as a client information update step, the registration unit 210 detects changes in client information to be updated by comparing the client information stored in the client master T1 with the information extracted from the document file. The registration unit 210 then presents the accounting staff responsible for accounting processing with candidates for client information to be updated, or updates the client information registered in the client master.
[0096] Figure 10 is a flowchart showing an example of the client information update process. Each process shown in this figure is executed by the registration unit 210 of the control unit 21 of the information processing device 2 to generate, from the sorted document files, update candidates for the client information registered in the client master T1 and to present them to the accounting staff.
[0097] When the registration unit 210 starts the process of updating client information, it first extracts information on predetermined target items from the sorted document files (S30). Here, the target items are at least one of the items included in the client master T1. For example, the target items are company name, capital, representative name, address, whether or not an invoice is registered, or whether or not a blue return is filed. The extraction may be performed, for example, by keyword extraction from the text obtained from the document files, label detection of standard items, regular expressions, or information extraction using AI model 6.
[0098] Next, the registration unit 210 compares the extracted information (extracted information) with the registered information of the target items in the client master T1 (S31). For example, the registration unit 210 identifies the target client by client ID (T11) and matches the registered values in the client master T1 corresponding to that client (e.g., capital (T16) and address (T17)) with the values extracted in S30, item by item. The matching is not limited to exact match determination, but may also be performed after normalization processing that absorbs variations in notation, abbreviated address notation, or digit separators in numbers.
[0099] Next, the registration unit 210 determines whether the information on the target items differs (S32). That is, based on the item-by-item comparison results, the registration unit 210 determines whether the information registered in the client master T1 differs from the extracted information. If there are no differences (NO in S32), the registration unit 210 may end the process without presenting update candidates based on the data file.
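The comparison of S31 and S32, with normalization applied before matching, can be sketched as below. Assumptions: item keys such as "capital" and "address" are hypothetical, numeric items are integer amounts whose non-digit characters (separators, currency marks, units) can simply be dropped, and NFKC normalization plus hyphen and whitespace unification stand in for the notation-variation handling described in [0098]. Abbreviated address forms would need a dedicated address normalizer, which this sketch does not attempt.

```python
import re
import unicodedata
from typing import Optional

NUMERIC_ITEMS = {"capital"}  # hypothetical item keys holding integer amounts


def normalize_value(item: str, value: str) -> str:
    text = unicodedata.normalize("NFKC", value)  # full-width digits and letters to ASCII
    if item in NUMERIC_ITEMS:
        return re.sub(r"\D", "", text)  # drop digit separators, currency marks, units
    text = re.sub(r"[\u2010-\u2015\u2212]", "-", text)  # unify hyphen and dash variants
    return re.sub(r"\s+", " ", text).strip().casefold()


def find_update_candidates(
    registered: dict[str, str], extracted: dict[str, str]
) -> list[tuple[str, Optional[str], str]]:
    """Return (item, current value, detected value) for every item that differs."""
    candidates = []
    for item, new_value in extracted.items():
        old_value = registered.get(item)
        if old_value is None or normalize_value(item, old_value) != normalize_value(item, new_value):
            candidates.append((item, old_value, new_value))
    return candidates
```

An empty result corresponds to NO in S32; a non-empty list supplies the current and detected values that the notification window 713 displays.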
[0100] On the other hand, if a difference exists (YES in S32), the registration unit 210 presents update candidates to the accounting staff (S33).
[0101] Figure 11 shows an example of the management screen 71 and a notification display. On the left side of the management screen 71 is a side menu 711 that displays the folder hierarchy. In the side menu 711, for example, "All Documents" is at the top level, and procedural categories (major classifications) such as "Tax," "Social Insurance," "Registration," and "Undeterminable" may be displayed beneath it. Subcategories (medium classifications) such as "Invoice-related" and "Blue Return-related" may be displayed under these major classifications. Accountants can switch which folder's documents they view by selecting a folder in the side menu 711.
[0102] In the upper right corner of the management screen 71, there is a file list area 712 that displays a list of document files stored in the selected folder. The file list area 712 displays, for example, the file name, upload date, and processing status (e.g., awaiting confirmation or processed).
[0103] A notification window 713 that notifies the user of update candidates is displayed in the center right of the management screen 71. The registration unit 210 may display in the notification window 713 the file name of the document that was detected as an update candidate, the name of the item whose change was detected (for example, the amount of capital), the current value registered in the client master T1, and the detected value extracted from the document file. Furthermore, the registration unit 210 may accept an action from the accountant to update the client master T1 or to postpone (ignore) the update.
[0104] Furthermore, the notification window 713 displays action buttons (e.g., "Update" or "Ignore") that accept input from the accountant. If the accountant selects "Update," the registration unit 210 may update the corresponding item in the client master T1 (e.g., capital (T16)) with the detected value. On the other hand, if the accountant selects "Ignore," the registration unit 210 may not perform the update based on the difference, but instead close the notification or record it as a state that can be reviewed later.
[0105] Thus, the management screen 71 allows document files to be viewed by folder hierarchy and update candidates for the client master T1 to be displayed on the same screen. As a result, accounting staff may be able to decide quickly whether to update client information while reviewing the document files on which the update is based.
[0106] The registration unit 210 may automatically update the corresponding items in the client master T1 upon detecting a change, without requiring confirmation and approval from the accountant. The registration unit 210 may also automatically update the client master T1 if the certainty of the change meets a predetermined condition (e.g., above a predetermined threshold) or if it is a certain type of highly reliable public document (e.g., a certified copy of the registration). If an automatic update is performed, the registration unit 210 may subsequently notify the accountant of the change.
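The automatic-update condition can be sketched as a simple policy function. The threshold value, the set of trusted document types, and the returned status strings are hypothetical; the sketch only shows how "certainty above a threshold or a highly reliable public document" could gate an update without review.

```python
# Hypothetical set of document types trusted enough to update the master without review.
HIGH_TRUST_DOCUMENT_TYPES = {"certified copy of registration"}


def apply_update_candidate(
    master: dict[str, str],
    item: str,
    new_value: str,
    certainty: float,
    document_type: str,
    threshold: float = 0.95,
) -> str:
    """Auto-update when the evidence is strong; otherwise leave it for the accountant."""
    if document_type in HIGH_TRUST_DOCUMENT_TYPES or certainty >= threshold:
        master[item] = new_value
        return "updated"  # the accountant is notified after the fact
    return "pending"  # shown in the notification window for Update / Ignore
```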
[0107] As a result, the registration unit 210 can provide updated client information candidates based on the classified document files, thereby maintaining the freshness of the client master T1 information and contributing to improved accuracy in subsequent sorting processes or client information management operations.
[0108] While the present disclosure has been described above with reference to embodiments, it is not limited thereto. Various modifications to the structure and details of the present disclosure are possible, as can be understood by those skilled in the art within the scope of the invention.
[0109] For example, in the embodiment described above, the client master T1 and the sorting rule table T2 are stored in the storage unit 22, but they may be stored in an external device accessible by the information processing device 2.
[0110] Furthermore, in the classification estimation step, the estimation unit 212 may combine physical or analog (manual) means with automatic determination by the AI model 6. For example, for documents for which the AI model 6's determination has low confidence, the estimation unit 212 may send the image data to the terminal of a human operator in a secure environment, accept the operator's visual classification, and use the results as training data to retrain the AI model 6, thereby establishing a continuous improvement cycle.
[0111] Furthermore, although the above-described embodiment described a case where the information processing device 2 is a single device, some or all of the components of the information processing device 2 may be realized by multiple information processing devices or circuits. In this case, the multiple information processing devices or circuits may be centrally located or distributed. For example, the information processing devices or circuits may be realized in a form in which each is connected via a communication network, such as a client-server system or a cloud computing system. In addition, the functions of the information processing device 2 may be provided in SaaS (Software as a Service) format.
[0112] Some or all of the above embodiments may also be described as in the following notes, but are not limited to them.
(Note 1) An information processing system comprising one or more processors and memory, wherein the one or more processors are capable of executing: an acquisition step of acquiring a document file from a client terminal operated by a client; an estimation step of estimating the administrative procedure category corresponding to the acquired document file based on characteristics of text data or image data contained in the acquired document file; and a sorting step of storing the document file in a destination folder determined according to the administrative procedure category.
(Note 2) The information processing system according to Note 1, wherein the administrative procedure categories include at least one of taxation, social insurance, and registration.
(Note 3) The information processing system according to Note 1 or 2, wherein, in the estimation step, the one or more processors estimate the administrative procedure category from the document file acquired from the client using an artificial intelligence model trained with document files and administrative procedure categories as training data.
(Note 4) The information processing system according to any one of Notes 1 to 3, wherein, in the estimation step, the one or more processors estimate the administrative procedure category based on at least one of the destination information and the items included in the document file.
(Note 5) The information processing system according to any one of Notes 1 to 4, wherein the destination folder is a folder in a folder hierarchy that includes a first level corresponding to the administrative procedure category and a second level corresponding to the procedure content subdivided from the administrative procedure category, and, in the estimation step, the one or more processors further estimate the procedure content.
(Note 6) The information processing system according to any one of Notes 1 to 5, wherein the one or more processors are further capable of executing a registration step of registering sorting rules acquired from a terminal operated by accounting staff, and, in the sorting step, the destination folder is determined by applying the sorting rules to the administrative procedure category estimated in the estimation step.
(Note 7) The information processing system according to any one of Notes 1 to 6, wherein, if the destination folder cannot be determined in the sorting step, the one or more processors store the data file in an "Undetermined" folder.
(Note 8) The information processing system according to any one of Notes 1 to 7, wherein the one or more processors are further capable of executing a duplicate determination step of detecting, as duplicate data, identical or substantially identical data files among data files stored or scheduled to be stored in the destination folder or in a folder that satisfies predetermined conditions relative to the destination folder.
(Note 9) The information processing system according to any one of Notes 1 to 8, wherein the one or more processors are further capable of executing a log generation step of generating log information indicating the basis for estimating the administrative procedure category.
(Note 10) The information processing system according to any one of Notes 1 to 9, wherein the one or more processors are further capable of executing a client information update step of detecting changes to client information to be updated by comparing the client information stored in a client master with information extracted from the document file, and either presenting candidates for the client information to be updated to accounting staff responsible for accounting processing or updating the client information registered in the client master.
(Note 11) An information processing method comprising each step executed by the information processing system according to any one of Notes 1 to 10.
(Note 12) A program for causing a computer having one or more processors and memory to execute each step described in any one of Notes 1 to 10.
[Explanation of Symbols]
[0113]
1 Information processing system
2 Information processing device
3 Client terminal
4 Staff terminal
6 Artificial intelligence model (AI model)
21 Control unit
22 Storage unit
23 Communication unit
26 Communication bus
31 Control unit
32 Storage unit
33 Communication unit
34 Input unit
35 Display unit
36 Communication bus
41 Control unit
42 Storage unit
43 Communication unit
44 Input unit
45 Display unit
46 Communication bus
70 Document upload screen
71 Management screen
210 Registration unit
211 Acquisition unit
212 Estimation unit
213 Sorting (distribution) unit
214 Duplicate determination unit
215 Log generation unit
216 Output control unit
217 Display control unit
218 Data output control unit
NW Network
T1 Client master
T2 Sorting (distribution) rule table
Claims
1. An information processing system comprising one or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more processors execute the one or more programs to perform: an acquisition step of acquiring a document file from a client terminal operated by a client; an estimation step of estimating an administrative procedure category based on at least one of destination information and items included in the acquired document file; and a sorting step of storing the document file in a destination folder determined according to the administrative procedure category.
2. An information processing system comprising one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more processors execute the one or more programs to perform: an acquisition step of acquiring a document file from a client terminal operated by a client; an estimation step of estimating an administrative procedure category corresponding to the acquired document file based on features of text data or image data contained in the acquired document file; a sorting step of storing the document file in a destination folder determined according to the administrative procedure category; and a client information update step of detecting changes to client information to be updated by comparing the client information stored in a client master with information extracted from the document file, and then either presenting candidates for the client information to be updated to an accounting staff member responsible for accounting processing or updating the client information registered in the client master.
3. The information processing system according to claim 1 or 2, wherein the administrative procedure categories include at least one of taxation, social insurance, and registration.
4. The information processing system according to claim 1 or 2, wherein, in the estimation step, the one or more processors estimate the administrative procedure category of a document file acquired from the client by using an artificial intelligence model trained on document files and administrative procedure categories as training data.
5. The information processing system according to claim 1 or 2, wherein the destination folder is a folder in a folder hierarchy that includes a first level corresponding to the administrative procedure category and a second level corresponding to procedure content into which the administrative procedure category is subdivided, and, in the estimation step, the one or more processors further estimate the procedure content.
6. The information processing system according to claim 1 or 2, wherein the one or more processors can further execute a registration step of registering sorting rules acquired from a terminal operated by the accounting staff, and, in the sorting step, determine the destination folder by applying the sorting rules to the administrative procedure category estimated in the estimation step.
7. The information processing system according to claim 1 or 2, wherein, if the destination folder cannot be determined in the sorting step, the one or more processors store the data file in an "Undetermined" folder.
8. The information processing system according to claim 1 or 2, wherein the one or more processors further execute a duplicate detection step of targeting data files that are stored, or are scheduled to be stored, in the destination folder or in a folder satisfying predetermined conditions with respect to the destination folder, and detecting identical or substantially identical data files as duplicate data files.
9. The information processing system according to claim 1 or 2, wherein the one or more processors further execute a log generation step of generating log information indicating the basis on which the administrative procedure category was estimated.
10. An information processing method comprising each step performed by one or more processors according to claim 1 or 2.
11. A program for causing a computer, which has one or more processors and memory, to perform each of the steps described in claim 1 or 2.
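The acquisition, estimation, and sorting steps of claim 1, together with the two-level folder hierarchy of claim 5, the registered sorting rules of claim 6, the "Undetermined" fallback of claim 7, and the log of claim 9, can be illustrated with a minimal Python sketch. It assumes that text has already been extracted from the document file (for example by OCR) and uses a hypothetical keyword table in place of the AI model of claim 4. The names `KEYWORD_RULES`, `estimate`, `destination`, and `store`, and the shape of `sorting_rules` (a category mapped to a first-level folder name), are illustrative assumptions rather than the patented implementation.

```python
from __future__ import annotations

import shutil
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical keyword rules: (category, procedure content) -> trigger phrases.
# They stand in for the estimation step and are not taken from the patent.
KEYWORD_RULES: dict[tuple[str, str], list[str]] = {
    ("Taxation", "Withholding tax"): ["withholding tax", "payment slip"],
    ("Social insurance", "Employee enrollment"): ["health insurance", "enrollment"],
    ("Registration", "Change of officers"): ["change of officers", "commercial registry"],
}
UNDETERMINED = "Undetermined"


@dataclass
class Estimate:
    category: str | None  # first folder level (claim 5)
    content: str | None   # second folder level (claim 5)
    basis: list[str] = field(default_factory=list)  # matched phrases, logged (claim 9)


def estimate(text: str) -> Estimate:
    """Return the (category, content) pair with the most matching phrases."""
    lowered = text.lower()
    best: tuple[str, str] | None = None
    best_hits: list[str] = []
    for key, phrases in KEYWORD_RULES.items():
        hits = [p for p in phrases if p in lowered]
        if len(hits) > len(best_hits):
            best, best_hits = key, hits
    if best is None:
        return Estimate(None, None)
    return Estimate(best[0], best[1], best_hits)


def destination(est: Estimate, root: Path, sorting_rules: dict[str, str]) -> Path:
    """Map an estimate to a folder, falling back to "Undetermined" (claim 7)."""
    if est.category is None or est.content is None:
        return root / UNDETERMINED
    # A registered sorting rule (claim 6) may rename the first-level folder.
    top = sorting_rules.get(est.category, est.category)
    return root / top / est.content


def store(src: Path, root: Path, sorting_rules: dict[str, str]) -> tuple[Path, str]:
    """Move a text-extracted document into its folder and return a log line."""
    est = estimate(src.read_text(encoding="utf-8"))
    folder = destination(est, root, sorting_rules)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / src.name
    shutil.move(str(src), str(target))
    log = f"{src.name} -> {target}: category={est.category}, content={est.content}, basis={est.basis}"
    return target, log
```

For example, a file whose text reads "Withholding tax payment slip for March" would be estimated as Taxation / Withholding tax and moved under `root/Taxation/Withholding tax/`, or under `root/01_Tax/Withholding tax/` if a sorting rule `{"Taxation": "01_Tax"}` had been registered. Recording the matched phrases in `basis` is one simple way to keep the grounds for each estimate.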
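The duplicate detection step of claim 8 distinguishes identical from substantially identical data files but does not say how substantial identity is judged. The following Python sketch assumes one possible approach: byte-identical files are detected with a SHA-256 digest, and near-duplicates (for example a re-scan whose text differs only in line breaks or capitalization) with `difflib.SequenceMatcher` on whitespace- and case-normalized text. The 0.95 threshold, the helper names, and the restriction to the destination folder alone (rather than other folders meeting predetermined conditions) are assumptions.

```python
from __future__ import annotations

import difflib
import hashlib
from pathlib import Path


def _normalize(data: bytes) -> str:
    # Decode leniently and collapse whitespace/case so that copies differing only
    # in line breaks or capitalization compare as equal text.
    return " ".join(data.decode("utf-8", errors="ignore").split()).lower()


def classify_pair(a: bytes, b: bytes, threshold: float = 0.95) -> str | None:
    """Return "identical", "similar (ratio)", or None for two file contents."""
    # In practice the digests of stored files would be computed once and cached.
    if hashlib.sha256(a).digest() == hashlib.sha256(b).digest():
        return "identical"
    ratio = difflib.SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()
    return f"similar ({ratio:.2f})" if ratio >= threshold else None


def find_duplicates(new_file: Path, folder: Path, threshold: float = 0.95) -> list[tuple[Path, str]]:
    """Compare a file that is about to be stored with the files already in `folder`."""
    if not folder.is_dir():
        return []  # nothing stored yet, so nothing can be a duplicate
    new_bytes = new_file.read_bytes()
    matches = []
    for candidate in sorted(p for p in folder.iterdir() if p.is_file()):
        if candidate.resolve() == new_file.resolve():
            continue  # a file is not its own duplicate
        reason = classify_pair(new_bytes, candidate.read_bytes(), threshold)
        if reason is not None:
            matches.append((candidate, reason))
    return matches
```

Reporting the reason alongside each match would let the accounting staff decide whether a "similar" file is a true duplicate before anything is discarded.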
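The client information update step of claim 2 compares the client master with information extracted from a document file and then either presents candidates to the accounting staff or updates the master. The Python sketch below assumes that field extraction has already been performed and represents one client master record as a plain dictionary; the field names, the `Change` type, and the `auto_apply` switch between the two outcomes are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    name: str       # field name in the client master, e.g. "address"
    current: str    # value registered in the client master
    extracted: str  # value read from the uploaded document file


def detect_changes(master: dict[str, str], extracted: dict[str, str]) -> list[Change]:
    """List fields whose extracted value differs from the client master record."""
    changes = []
    for name, new_value in extracted.items():
        old_value = master.get(name)
        # Only fields the master already tracks are compared; blank values are ignored.
        if old_value is None or not new_value.strip():
            continue
        if new_value.strip() != old_value.strip():
            changes.append(Change(name, old_value, new_value.strip()))
    return changes


def update_client(
    master: dict[str, str], extracted: dict[str, str], auto_apply: bool = False
) -> tuple[dict[str, str], list[Change]]:
    """Return change candidates for staff review, or apply them to a copy of the record."""
    changes = detect_changes(master, extracted)
    if not auto_apply:
        return master, changes  # master untouched; staff reviews `changes`
    updated = dict(master)
    for change in changes:
        updated[change.name] = change.extracted
    return updated, changes
```

Applying changes to a copy rather than in place keeps the original record available, which makes it easy to show the staff member a before/after comparison.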