Marine observation data entry auditing method, device and equipment and readable storage medium
An automated review method assisted by a multimodal large model and a knowledge graph addresses the low efficiency of manual data entry and review, enabling efficient and accurate review of marine observation data and improving data fusion analysis and value mining.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- Hanjiang National Laboratory (汉江国家实验室)
- Filing Date
- 2026-04-27
- Publication Date
- 2026-06-30
AI Technical Summary
In existing technologies, manual entry and review of paper-based marine observation data is inefficient, makes it difficult to detect differences in data format and entry errors, and affects the fusion analysis and value mining of historical marine observation data.
Two independently entered copies of the same paper text are parsed with preset data parsing rules and normalized to find inconsistent fields. A multimodal large model then reads the corresponding content in the paper text to distinguish format differences from numerical differences, and a knowledge graph is used for logical verification before an inconsistency review report is generated.
The entire process of marine observation data entry has been automated, which has improved review efficiency, reduced manual workload, improved data accuracy and reliability, and shortened the review cycle.
Figure CN122309500A_ABST (abstract drawing; image not reproduced)
Description
Technical Field
[0001] This application relates to the field of marine data processing technology, and in particular to a marine observation data entry and review method, device, equipment, and readable storage medium.
Background Technology
[0002] Marine data processing refers to the collection, organization, quality control, and value extraction of marine observation, survey, and historical data.
[0003] Currently, handwritten marine observation data on paper is typically entered into electronic form by hand, and the entered data is then reviewed by hand. This process is inefficient and makes it difficult to detect problems such as data format differences and data entry errors, which seriously restricts the fusion analysis and value mining of historical marine observation data.
Summary of the Invention
[0004] This application provides a marine observation data entry and review method, device, equipment, and readable storage medium. It aims to solve the technical problem that the current practice of manually entering handwritten paper marine observation data into electronic form and then manually reviewing the entered data is inefficient and makes it difficult to detect problems such as data format differences and data entry errors, which seriously restricts the fusion analysis and value mining of historical marine observation data.
[0005] In a first aspect, embodiments of this application provide a marine observation data entry and review method, the method comprising: selecting two data entries, where each data entry is obtained by a different data entry person electronically entering the same paper text content; parsing each data entry using preset data parsing rules to obtain each field and its corresponding field content; for each field, if the absolute value of the difference between the normalized values of the field content in the two data entries is greater than the threshold corresponding to the field, treating the field as an inconsistency item; and, for each inconsistency item, identifying the field content of the inconsistency item in the paper text content using a multimodal large model, and comparing the identified field content with the field content of each data entry to determine whether the inconsistency item is a format difference or a numerical difference.
[0006] Optionally, after the step of identifying, for each inconsistency item, the field content of the inconsistency item in the paper text content using a multimodal large model and comparing the identified field content with the field content of each data entry to determine whether the inconsistency item is a format difference or a numerical difference, the method further comprises: if the inconsistency item is a format difference, converting the non-standard-format field content in the two data entries to the standard format.
[0007] Optionally, after the step of identifying, for each inconsistency item, the field content of the inconsistency item in the paper text content using a multimodal large model and comparing the identified field content with the field content of each data entry to determine whether the inconsistency item is a format difference or a numerical difference, the method further comprises: if the inconsistency item is a numerical difference, querying a preset knowledge graph based on the field of the inconsistency item to obtain the benchmark range corresponding to the field; and, if the content of the field in either data entry falls outside the benchmark range, issuing an alert.
[0008] Optionally, after the step of identifying, for each inconsistency item, the field content of the inconsistency item in the paper text content using a multimodal large model and comparing the identified field content with the field content of each data entry to determine whether the inconsistency item is a format difference or a numerical difference, the method further comprises: for each inconsistency item, obtaining a credibility score for each data entry by multiplying the historical entry accuracy of its data entry person, that person's continuous workload, and the confidence level of the field content identified by the multimodal large model by their corresponding weights and summing the results.
[0009] Optionally, after the step of obtaining, for each inconsistency item, the credibility score of each data entry by multiplying the historical entry accuracy of its data entry person, that person's continuous workload, and the confidence level of the field content identified by the multimodal large model by their corresponding weights and summing the results, the method further comprises: for each inconsistency item, generating an inconsistency review report by combining the paper text content, the credibility score of each data entry, and the identification result of the multimodal large model.
[0010] Optionally, after the step of generating, for each inconsistency item, the inconsistency review report by combining the paper text content, the credibility score of each data entry, and the identification result of the multimodal large model, the method further comprises: obtaining the data that results from review and judgment of the inconsistency review report; updating the historical entry accuracy of the data entry personnel based on that data; updating the preset knowledge graph based on the data confirmed as correct; constructing training samples from the data confirmed as correct; and, if the number of training samples is greater than a preset number, using the training samples to incrementally train the multimodal large model.
[0011] Optionally, the fields include core fields, key fields, and auxiliary fields, and the thresholds corresponding to the core fields, key fields, and auxiliary fields are different.
[0012] In a second aspect, embodiments of this application provide a marine observation data entry and review device, comprising: a selection module, used to select two data entries, where each data entry is obtained by a different data entry person electronically entering the same paper text content; a parsing module, used to parse each data entry using preset data parsing rules to obtain each field and its corresponding field content; a comparison module, used to treat a field as an inconsistency item if the absolute value of the difference between the normalized values of the field content in the two data entries is greater than the threshold corresponding to the field; and a determination module, used to identify, for each inconsistency item, the field content of the inconsistency item in the paper text content using a multimodal large model, and to compare the identified field content with the field content of each data entry to determine whether the inconsistency item is a format difference or a numerical difference.
[0013] In a third aspect, embodiments of this application provide marine observation data entry and review equipment, which includes a processor, a memory, and a marine observation data entry and review program stored in the memory and executable by the processor. When executed by the processor, the program implements the steps of the marine observation data entry and review method described above.
[0014] In a fourth aspect, embodiments of this application provide a readable storage medium storing a marine observation data entry and review program which, when executed by a processor, implements the steps of the marine observation data entry and review method described above.
[0015] The beneficial effects of the technical solutions provided in this application include: In this embodiment, two data entries are selected, each obtained by a different data entry person electronically entering the same paper text content. Each data entry is parsed using preset data parsing rules to obtain each field and its corresponding field content. For each field, if the absolute value of the difference between the normalized values of the field content in the two data entries is greater than the threshold corresponding to the field, the field is treated as an inconsistency item. For each inconsistency item, a multimodal large model identifies the field content of the inconsistency item in the paper text content, and the identified field content is compared with the field content of each data entry to determine whether the inconsistency item is a format difference or a numerical difference. For example, if data entry personnel A and B both enter the same paper text (which records marine observation data), two data entries result. The two entries are first parsed using the preset data parsing rules, and the content of each field is then normalized. If the normalized values of a field differ by more than the field's threshold, the content entered by A and B for that field is inconsistent. The multimodal large model then reads that field in the paper text, and its reading is compared with each entry to decide whether the inconsistency is a format difference or a numerical difference. The review of entered data is thus automated: inconsistencies are not only found but also classified as format differences or numerical differences, which significantly improves the efficiency of the review process.
Attached Figure Description
[0016] Figure 1 is a flowchart of an embodiment of the marine observation data entry and review method of this application; Figure 2 is a schematic diagram of the field importance classification rule base of an embodiment of the marine observation data entry and review method of this application; Figure 3 is a schematic diagram of the marine spatiotemporal knowledge graph of an embodiment of the marine observation data entry and review method of this application; Figure 4 is a schematic diagram of the functional modules of an embodiment of the marine observation data entry and review device of this application; Figure 5 is a schematic diagram of the hardware structure of the marine observation data entry and review equipment involved in the embodiments of this application.
Detailed Implementation
[0017] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, and not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present application.
[0018] To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application will be described in further detail below with reference to the accompanying drawings.
[0019] In a first aspect, embodiments of this application provide a marine observation data entry and review method.
[0020] In one embodiment, referring to Figure 1, which is a flowchart of an embodiment of the marine observation data entry and review method of this application, the method includes:
Step S10: Select two data entries, where each data entry is obtained by a different data entry person electronically entering the same paper text content.
[0021] In this embodiment, the two data entries are obtained as follows: a scanned image of one paper-based marine observation text (e.g., a seabed sediment data record sheet from a certain voyage) is assigned to at least two different data entry personnel (e.g., data entry personnel A and B), who enter it electronically and independently of each other. Having two people independently enter the same source text follows the principle of redundancy verification: because the operators work independently, the random errors or habitual mistakes of one individual are unlikely to be repeated by the other. This provides a basis for comparison from the outset, avoids errors going undetected because there is only a single entry, and reduces the burden on subsequent review.
[0022] Step S20: Use preset data parsing rules to parse each piece of entered data to obtain each field and its corresponding field content.
[0023] In this embodiment, the preset data parsing rules are stored in a pre-established marine handwritten data parsing rule base, which was trained using no fewer than 10,000 handwritten samples from different eras. The parsing process includes: identifying field boundaries in the input text, extracting field names and their corresponding original content, and initially marking units, date formats, etc. For example, '2023.01.01' and '2023 / 01 / 01' are marked as different variations of the same date format. The purpose of structured parsing through preset rules is to transform unstructured natural language text into standardized field objects that can be processed by computers, eliminate surface format differences caused by different input habits, provide a unified data foundation for subsequent numerical comparison, improve the automation level of data processing, and make machine comparison possible.
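The parsing step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that an entered record is a block of text with one "name: content" pair per line; the pattern names `FIELD_LINE` and `DATE_VARIANTS` and the function `parse_entry` are hypothetical and do not come from this application, whose rule base is far larger than two patterns.

```python
import re

# Hypothetical rule base: one regex per date variant. These two patterns
# are illustrative assumptions, not the rules used in the application.
DATE_VARIANTS = {
    "dot": re.compile(r"^\d{4}\.\d{2}\.\d{2}$"),
    "slash": re.compile(r"^\d{4}\s*/\s*\d{2}\s*/\s*\d{2}$"),
}

# Assumed layout: one "name: content" pair per line.
FIELD_LINE = re.compile(r"^\s*(?P<name>[^:]+?)\s*:\s*(?P<content>.*?)\s*$")


def parse_entry(text: str) -> dict:
    """Split an entered record into fields and tag known date variants."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_LINE.match(line)
        if match is None:
            continue  # no field boundary found on this line
        content = match.group("content")
        variant = next(
            (name for name, rx in DATE_VARIANTS.items() if rx.match(content)),
            None,
        )
        fields[match.group("name")] = {
            "content": content,
            "date_variant": variant,
        }
    return fields


entry_a = parse_entry("date: 2023.01.01\nsalinity: 34.5 PSU")
entry_b = parse_entry("date: 2023 / 01 / 01\nsalinity: 34.5 PSU")
```

Here both entries yield a `date` field, tagged as the "dot" and "slash" variants respectively, so later steps can treat them as two spellings of the same date format.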
[0024] Step S30: For each field, if the absolute value of the difference between the normalized values of the field content of two entered data is greater than the threshold corresponding to the field, then the field is regarded as an inconsistency item.
[0025] In this embodiment, field content normalization refers to converting field content expressed in different units or formats into a unified standard value, for example converting latitude and longitude from degrees, minutes, and seconds into decimal degrees. Referring to Figure 2, which is a schematic diagram of the field importance classification rule base of an embodiment of the marine observation data entry and review method of this application, the thresholds are set according to a field importance classification rule base. Fields are divided into core fields (such as longitude and latitude), key fields (such as salinity and temperature), and auxiliary fields (such as remarks). When the absolute value of the difference between the normalized values of a field in the two data entries is compared with the corresponding threshold, the threshold for core fields is the smallest, that for key fields is larger, and that for auxiliary fields is the largest. The smaller the threshold, the stricter the consistency requirement. This is because different fields affect data quality to different degrees; even a small error in a core field can lead to significant analytical bias, so a stricter threshold is required. Setting different thresholds for fields of different importance achieves differentiated and precise review: it ensures the accuracy of key data while avoiding over-review of non-key data, balances review efficiency and quality, and prevents the missed detections or false alarms caused by a "one-size-fits-all" approach.
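Step S30 can be sketched in Python as follows. The threshold values, the field-to-tier mapping, and the names `THRESHOLDS`, `FIELD_TIER`, `normalize`, and `find_inconsistencies` are illustrative assumptions; the application only states that the core threshold is the smallest and the auxiliary threshold the largest.

```python
import re

# Assumed thresholds per importance tier (core < key < auxiliary).
THRESHOLDS = {"core": 1e-4, "key": 0.05, "auxiliary": 1.0}
FIELD_TIER = {"latitude": "core", "salinity": "key"}  # assumed mapping

# Degrees, minutes, seconds, e.g. 30°30'00"
DMS = re.compile(r"^(\d+)°(\d+)[′'](\d+(?:\.\d+)?)[″\"]$")


def normalize(content: str) -> float:
    """Convert a DMS coordinate or a plain number to a float."""
    match = DMS.match(content.strip())
    if match:
        degrees, minutes, seconds = (float(g) for g in match.groups())
        return degrees + minutes / 60 + seconds / 3600
    return float(content)


def find_inconsistencies(entry_a: dict, entry_b: dict) -> list:
    """Return the fields whose normalized values differ beyond threshold."""
    inconsistent = []
    for name in entry_a.keys() & entry_b.keys():
        diff = abs(normalize(entry_a[name]) - normalize(entry_b[name]))
        if diff > THRESHOLDS[FIELD_TIER[name]]:
            inconsistent.append(name)
    return sorted(inconsistent)


a = {"latitude": "30°30'00\"", "salinity": "34.50"}
b = {"latitude": "30.5", "salinity": "34.80"}
items = find_inconsistencies(a, b)
```

In this example the two latitude spellings normalize to the same decimal degrees and are not flagged, while the salinity values differ by about 0.3 and exceed the assumed key-field threshold, so `items` contains only `"salinity"`.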
[0026] Step S40: For each inconsistency, the field content of the inconsistency in the paper text is identified by a multimodal large model, and the identified field content is compared with the field content of each entered data to determine whether the inconsistency is a format difference or a numerical difference.
[0027] In this embodiment, the multimodal large model is preferably one fine-tuned on a marine data corpus. The model takes the scanned image of the paper text content and the list of inconsistency items as input. It extracts visual features from the image (such as blurred handwriting and alteration marks) using a convolutional neural network and fuses them with text semantic features for analysis. Because machine vision reads the original document, as a human reviewer would, instead of only comparing the digitized text, the model can distinguish format differences that arise from data entry personnel interpreting the text differently (e.g., 'May' vs. '05') from actual numerical errors (e.g., '5' vs. '6'). This addresses the inability of traditional methods to trace back to the original document. The result can also be used to cross-check the inconsistency determination made in step S30, and the automatic distinction between format differences and numerical differences significantly improves the efficiency of data entry review.
[0028] Many manually entered marine observation values, such as dates, latitude / longitude, and salinity, are numerically identical but formatted inconsistently. Two kinds of format difference occur. The first has no effect on meaning: the entries differ only in character form, punctuation, or spacing (e.g., "12.5℃" vs. "12.5 ℃"). The second is business-equivalent: the expressions differ but both conform to industry conventions and carry exactly the same business meaning (e.g., "12.5℃" vs. "12.5 degrees Celsius", or two equivalent wordings of "northerly wind, force 3"). This step identifies such inconsistencies directly as format differences, so they need no manual review, which significantly improves review efficiency. Converting them to the standard format afterwards further improves the accuracy and usefulness of the entered marine observation data.
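The comparison in step S40 can be sketched in Python under strong simplifying assumptions. No model is called here: the `recognized` argument stands in for the content that the multimodal large model reads from the paper text, and the alias table `UNIT_ALIASES` and the functions `canonical` and `classify` are hypothetical helpers introduced only for this illustration.

```python
import re

# Assumed alias table mapping equivalent unit spellings to one symbol.
UNIT_ALIASES = {"degrees celsius": "℃", "°c": "℃"}


def canonical(content: str) -> str:
    """Lowercase, map unit aliases to one symbol, and drop whitespace."""
    text = content.strip().lower()
    for alias, symbol in UNIT_ALIASES.items():
        text = text.replace(alias, symbol)
    return re.sub(r"\s+", "", text)


def classify(recognized: str, entered_a: str, entered_b: str) -> str:
    """Label an inconsistency item using the content read from the paper.

    `recognized` is a stand-in for the model's reading of the scan.
    """
    target = canonical(recognized)
    if canonical(entered_a) == target and canonical(entered_b) == target:
        return "format difference"
    return "numerical difference"


label_1 = classify("12.5℃", "12.5 ℃", "12.5 degrees Celsius")
label_2 = classify("12.5℃", "12.5℃", "12.6℃")
```

Both entries in the first call reduce to the same canonical string as the recognized content, so the item is labeled a format difference; in the second call one entry carries a different number, so the item is labeled a numerical difference.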
[0029] In this embodiment, if personnel A and B enter the same paper text content, two data entries result. The two entries are first parsed using the preset data parsing rules to obtain each field and its corresponding field content, and the content of each field is then normalized. If the normalized values of a field in the two entries differ by more than the field's threshold, the content entered by A and B for that field is inconsistent. Different thresholds can be set for fields of different importance to achieve differentiated and precise review, ensuring the accuracy of key data while avoiding over-review of non-key data, balancing review efficiency and quality, and preventing the missed detections or false alarms caused by a "one-size-fits-all" approach. The multimodal large model then identifies the inconsistent field content in the paper text content, and the identified content is compared with the field content of each data entry to determine whether the inconsistency is a format difference or a numerical difference, which automates the review process. The multimodal large model is preferably one fine-tuned on a marine data corpus; it takes the scanned image of the paper text content and the list of inconsistency items as input. Because machine vision reads the original document instead of only comparing the digitized text, the model can tell whether a discrepancy arises from data entry personnel interpreting the text differently or from an actual numerical error. This addresses the inability of traditional methods to trace back to the original document, helps cross-check the inconsistency determination, and greatly improves the efficiency of data entry review.
[0030] Compared to the traditional multi-person cross-review scheme, the marine observation data entry and review method in this embodiment can significantly reduce the workload of manual review and shorten the review cycle of marine observation data entry. Compared to the traditional optical character recognition dual-recording review scheme, it can significantly improve the review accuracy of core fields, greatly reduce the misjudgment rate of inconsistencies, and effectively eliminate invalid inconsistencies. In particular, the improvement in recognition accuracy is very significant for historical paper archives that are decades old and damaged.
[0031] It should be noted that this embodiment only uses the selection of two data entries for inconsistency review as an example. For more data entries obtained by multiple data entry personnel digitizing the same paper text content, such as five data entry personnel corresponding to five data entries, the method of this embodiment can be used to arbitrarily select two data entries for inconsistency review. Furthermore, in the difference comparison in step S30, for each field, the mean of the normalized values of the field content of the five data entries can be taken first. Then, the difference between the normalized value of the field content of each data entry and the mean of the normalized values of the field content of the five data entries can be compared to review inconsistencies.
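The mean-based variant for more than two data entries can be sketched as follows. The function name `deviating_entries`, the sample values, and the threshold are assumptions chosen for illustration.

```python
from statistics import fmean


def deviating_entries(values: dict, threshold: float) -> list:
    """Return the entries whose normalized value is far from the mean."""
    mean = fmean(values.values())
    return sorted(
        who for who, value in values.items() if abs(value - mean) > threshold
    )


# Normalized salinity values from five data entry personnel (made-up data).
salinity = {"A": 34.5, "B": 34.5, "C": 34.5, "D": 34.5, "E": 35.5}
flagged = deviating_entries(salinity, threshold=0.5)
```

With these values the mean is 34.7, so entries A to D deviate by about 0.2 and only entry E, which deviates by about 0.8, is flagged for review.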
[0032] Further, in one embodiment, after step S40, the method further includes: if the inconsistency item is a format difference, converting the non-standard-format field content in the two data entries to the standard format.
[0033] In this embodiment, standard format conversion includes, for example, converting dates to 'YYYY-MM-DD HH:MM:SS' format, standardizing latitude and longitude to decimal degrees, and standardizing salinity units to PSU, etc. The conversion process follows the database storage specifications. A unified data storage format is a prerequisite for data fusion and analysis, ensuring the standardization of the data entering the database, facilitating subsequent marine data querying, statistics, and mining, reducing data cleaning costs caused by format inconsistencies, and improving the interoperability of data resources.
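As a minimal sketch of the date conversion, the following Python function tries a short list of assumed input variants and emits the 'YYYY-MM-DD HH:MM:SS' form. The tuple `INPUT_FORMATS` and the function `to_standard_datetime` are hypothetical, and filling the missing time of day with midnight is an assumption of this sketch, not something the application specifies.

```python
from datetime import datetime

# Assumed input variants; a real rule base would list many more.
INPUT_FORMATS = ("%Y.%m.%d", "%Y/%m/%d", "%Y-%m-%d")


def to_standard_datetime(content: str) -> str:
    """Convert a date in a known variant to 'YYYY-MM-DD HH:MM:SS'."""
    compact = content.replace(" ", "")
    for fmt in INPUT_FORMATS:
        try:
            parsed = datetime.strptime(compact, fmt)
        except ValueError:
            continue  # try the next variant
        # Dates without a time of day default to 00:00:00 (assumption).
        return parsed.strftime("%Y-%m-%d %H:%M:%S")
    raise ValueError(f"unrecognized date format: {content!r}")


standard = to_standard_datetime("2023 / 01 / 01")
```

Unrecognized input raises an error instead of being guessed at, so that such content is left for manual handling.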
[0034] Furthermore, in one embodiment, after step S40, the method further includes: if the inconsistency item is a numerical difference, querying the preset knowledge graph based on the field of the inconsistency item to obtain the benchmark range corresponding to the field; and, if the content of the field in either data entry falls outside the benchmark range, issuing an alert.
[0035] In this embodiment, referring to Figure 3, which is a schematic diagram of the marine spatiotemporal knowledge graph of an embodiment of the marine observation data entry and review method of this application, the preset knowledge graph is a marine spatiotemporal knowledge graph that integrates cruise data, marine hydrological data, and seasonal climate data. Specifically, when the benchmark range is queried for an inconsistency item, the typical value and reasonable fluctuation range under the same conditions are retrieved from the knowledge graph according to the cruise, sea area, and season of the inconsistency item. If an entered value falls outside this range (e.g., abnormally low salinity in the South China Sea during summer), an alert is triggered. Logical verification based on the spatiotemporal correlations in domain knowledge compensates for the shortcomings of simple numerical comparison. It can uncover hidden errors that conform to the entry specifications but violate marine scientific principles, improving the validity and usability of the data and preventing erroneous data from contaminating the database.
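The benchmark-range check can be sketched in Python with a plain dictionary standing in for the knowledge graph. The lookup key, the function `check_against_benchmark`, and above all the numeric range are placeholders invented for this illustration; they are not oceanographic reference values and do not come from this application.

```python
# Stand-in for the knowledge graph: (sea area, season, field) -> range.
# The range below is a made-up placeholder, not a real reference value.
BENCHMARKS = {
    ("South China Sea", "summer", "salinity"): (32.0, 35.0),
}


def check_against_benchmark(sea_area, season, field, values):
    """Return one alert message per entered value outside the range."""
    low, high = BENCHMARKS[(sea_area, season, field)]
    return [
        f"{who}: {field}={value} outside [{low}, {high}]"
        for who, value in values.items()
        if not low <= value <= high
    ]


alerts = check_against_benchmark(
    "South China Sea", "summer", "salinity", {"A": 33.8, "B": 23.8}
)
```

Here the value entered by B falls below the placeholder range and produces an alert, while the value entered by A does not.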
[0036] Furthermore, in one embodiment, after step S40, the method further includes: for each inconsistency item, obtaining a credibility score for each data entry by multiplying the historical entry accuracy of its data entry person, that person's continuous workload, and the confidence level of the field content identified by the multimodal large model by their corresponding weights and summing the results.
[0037] In this embodiment, the credibility score is calculated using the following formula: S = α × P + β × (1 - W) + γ × C, where S is the credibility score, P is the historical entry accuracy of the data entry person, W is the normalized value of continuous workload (fatigue factor), and C is the confidence level of the field content identified by the multimodal large model. Because W enters the formula as (1 - W), a heavier continuous workload lowers the score. The weights α, β, and γ are adjusted according to the field level, with a higher weight on historical accuracy for core fields. For example, for core fields, α = 0.6, β = 0.3, and γ = 0.1; for key fields, α = 0.5, β = 0.3, and γ = 0.2; and for auxiliary fields, α = 0.4, β = 0.3, and γ = 0.3. Combining the data entry person's historical performance, current fatigue, and the model's confidence yields a multi-dimensional trust assessment that quantifies the reliability of each data entry. This gives subsequent manual review a basis for prioritization, so that review resources can be concentrated on high-risk data.
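The formula can be worked through in a few lines of Python. The weight table reproduces the example weights given above; the function name `credibility_score` and the input values for the two data entries are assumptions made up for this worked example.

```python
# Example weights (alpha, beta, gamma) per field level, as listed above.
WEIGHTS = {
    "core": (0.6, 0.3, 0.1),
    "key": (0.5, 0.3, 0.2),
    "auxiliary": (0.4, 0.3, 0.3),
}


def credibility_score(level, accuracy, workload, model_confidence):
    """S = alpha * P + beta * (1 - W) + gamma * C."""
    alpha, beta, gamma = WEIGHTS[level]
    return alpha * accuracy + beta * (1 - workload) + gamma * model_confidence


# Made-up inputs for two entries of the same core field.
score_a = credibility_score("core", accuracy=0.98, workload=0.2,
                            model_confidence=0.9)
score_b = credibility_score("core", accuracy=0.90, workload=0.7,
                            model_confidence=0.4)
```

With these inputs, entry A scores 0.6 × 0.98 + 0.3 × 0.8 + 0.1 × 0.9 = 0.918 and entry B scores 0.6 × 0.90 + 0.3 × 0.3 + 0.1 × 0.4 = 0.67, so entry B would be the higher-risk entry for this inconsistency item.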
[0038] Further, in one embodiment, after the credibility score of each data entry has been obtained for each inconsistency item, the method further includes: for each inconsistency item, generating an inconsistency review report by combining the paper text content, the credibility score of each data entry, and the identification result of the multimodal large model.
[0039] In this embodiment, the inconsistency review report can be divided into a detailed version and a simplified version. The detailed version includes original source information, the credibility score calculation process, and highlighted differences, and is suitable for junior reviewers; the simplified version includes a core summary and the final credibility score, and is suitable for senior experts. The content of the inconsistency review report may also include the query and verification results of the marine spatiotemporal knowledge graph. By providing matching information density according to the experience level of different reviewers, the human-machine collaboration process is optimized. Junior personnel can quickly execute based on the detailed report, and experts can quickly make decisions based on the simplified report, thereby improving the overall work efficiency of the review team and reducing labor costs.
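The two report versions can be sketched as one function with a flag. The dictionary keys and the function `build_report` are hypothetical; the application names the kinds of information each version contains but not a data layout.

```python
def build_report(item: dict, detailed: bool) -> dict:
    """Build a simplified or detailed inconsistency review report."""
    # Both versions carry the summary and the final credibility scores.
    report = {"field": item["field"], "scores": item["scores"]}
    if detailed:
        # The detailed version adds source information, the inputs used
        # to compute the scores, and the differing entered values.
        report["source_text"] = item["source_text"]
        report["score_inputs"] = item["score_inputs"]
        report["entered"] = item["entered"]
    return report


item = {
    "field": "salinity",
    "scores": {"A": 0.918, "B": 0.67},
    "source_text": "scan region for the salinity field",
    "score_inputs": {"A": (0.98, 0.2, 0.9), "B": (0.90, 0.7, 0.4)},
    "entered": {"A": "34.5", "B": "34.8"},
}
simplified = build_report(item, detailed=False)
full = build_report(item, detailed=True)
```

A senior expert would receive `simplified`, while a junior reviewer would receive `full` with the supporting material attached.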
[0040] Furthermore, in one embodiment, after the inconsistency review report has been generated for each inconsistency item, the method further includes: obtaining the data that results from review and judgment of the inconsistency review report; updating the historical entry accuracy of the data entry personnel based on that data; updating the preset knowledge graph based on the data confirmed as correct; constructing training samples from the data confirmed as correct; and, if the number of training samples is greater than a preset number, using the training samples to incrementally train the multimodal large model.
[0041] In this embodiment, after the inconsistency review report has been manually reviewed and judged, the data confirmed as correct can be used as new samples to achieve closed-loop optimization of the system. For example, the multimodal large model is incrementally trained each time 2000 samples have accumulated. At the same time, the historical entry accuracy of the data entry personnel and the benchmark ranges in the knowledge graph are updated. This feedback loop between data production and model optimization gives the system the ability to learn continuously. As the system is used, the model's ability to recognize specific handwriting styles and ocean data patterns improves, and the error rate of data entry personnel decreases because of the feedback they receive, so system performance keeps improving over time.
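The feedback loop can be sketched as a small Python class. The class `FeedbackLoop`, its attribute names, and the counter that stands in for an actual incremental training run are assumptions; the knowledge graph update is omitted for brevity. The trigger uses "greater than a preset number", following the wording of paragraph [0040].

```python
class FeedbackLoop:
    """Track reviewed results and decide when to retrain (sketch only)."""

    def __init__(self, preset_count: int = 2000):
        self.preset_count = preset_count
        self.samples = []        # correct data awaiting training
        self.stats = {}          # person -> [correct entries, total entries]
        self.training_runs = 0   # placeholder for real incremental training

    def record(self, person: str, entered: str, correct_value: str) -> None:
        correct, total = self.stats.get(person, [0, 0])
        self.stats[person] = [correct + (entered == correct_value), total + 1]
        self.samples.append(correct_value)
        if len(self.samples) > self.preset_count:
            self.training_runs += 1  # a real system would train here
            self.samples.clear()

    def accuracy(self, person: str) -> float:
        correct, total = self.stats[person]
        return correct / total


loop = FeedbackLoop(preset_count=2)  # small count for demonstration
loop.record("A", "34.5", "34.5")
loop.record("A", "34.6", "34.5")
loop.record("B", "34.5", "34.5")
```

After the third record the sample count exceeds the preset number, one training run is counted, and the sample buffer is emptied; person A's historical accuracy is 0.5 at that point.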
[0042] Furthermore, in one embodiment, the field includes a core field, a key field, and an auxiliary field, with different thresholds corresponding to the core field, the key field, and the auxiliary field.
[0043] In this embodiment, the fields are divided into core fields (such as longitude and latitude), key fields (such as salinity and temperature), and auxiliary fields (such as remarks). When the absolute value of the difference between the normalized values of a field in the two data entries is compared with the corresponding threshold, the threshold for core fields is the smallest, that for key fields is larger, and that for auxiliary fields is the largest. The smaller the threshold, the stricter the consistency requirement. Setting a stricter (smaller) threshold for core fields allows inconsistencies in core field content to be detected promptly. This hierarchical threshold control ensures high accuracy for core data while reducing the time spent reviewing auxiliary fields, which optimizes the allocation of review resources and meets the differing precision requirements of marine data processing.
[0044] In a second aspect, embodiments of this application also provide a marine observation data entry and review device.
[0045] In one embodiment, referring to Figure 4, which is a schematic diagram of the functional modules of an embodiment of the marine observation data entry and review device of this application, the device includes: a selection module 10, used to select two data entries, where each data entry is obtained by a different data entry person electronically entering the same paper text content; a parsing module 20, used to parse each data entry using preset data parsing rules to obtain each field and its corresponding field content; a comparison module 30, used to treat a field as an inconsistency item if the absolute value of the difference between the normalized values of the field content in the two data entries is greater than the threshold corresponding to the field; and a determination module 40, used to identify, for each inconsistency item, the field content of the inconsistency item in the paper text content using a multimodal large model, and to compare the identified field content with the field content of each data entry to determine whether the inconsistency item is a format difference or a numerical difference.
[0046] Furthermore, in one embodiment, the marine observation data entry and review device further includes a standard format conversion module, used for: if the inconsistency is a format difference, converting the field content in non-standard format in the two entered data entries to the standard format.
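As a hedged illustration of such a conversion, the sketch below assumes that the standard format for a coordinate field is decimal degrees with four decimal places and that the non-standard format is degrees and minutes with an optional hemisphere letter. Both formats, the pattern, and the function name are assumptions for illustration; the application does not specify them.

```python
import re

# Assumed non-standard format: degrees, minutes, optional hemisphere, e.g. 120°30′E.
DEG_MIN_PATTERN = re.compile(r"^\s*(\d+)\s*°\s*(\d+(?:\.\d+)?)\s*['′]\s*([NSEW])?\s*$")


def to_standard_degrees(text):
    """Convert degree-minute text or plain decimal text to a 4-decimal degree string."""
    match = DEG_MIN_PATTERN.match(text)
    if match:
        degrees, minutes, hemisphere = match.groups()
        value = int(degrees) + float(minutes) / 60.0
        if hemisphere in ("S", "W"):
            value = -value  # southern and western hemispheres are negative
    else:
        value = float(text)  # already decimal; raises ValueError if unparseable
    return f"{value:.4f}"


# Two entries of the same longitude written in different formats.
standard_a = to_standard_degrees("120°30′E")
standard_b = to_standard_degrees("120.5")
```

After conversion both entries carry the same standard-format string, so the format difference no longer appears as an inconsistency.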
[0047] Furthermore, in one embodiment, the marine observation data entry and review device also includes a numerical difference early warning module, used for: if the inconsistency is a numerical difference, querying the preset knowledge graph based on the field of the inconsistency to obtain the benchmark range corresponding to the field; and if the field content of either entered data entry falls outside the benchmark range, issuing an alert.
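The benchmark range check can be sketched as below. The knowledge graph is reduced here to a plain dictionary lookup, and the ranges, field names, and function name are hypothetical placeholders rather than values taken from this application.

```python
# Stand-in for the preset knowledge graph: field -> (lower bound, upper bound).
# The ranges below are assumed for illustration only.
BENCHMARK_RANGES = {
    "salinity": (0.0, 42.0),
    "temperature": (-2.0, 40.0),
}


def entries_outside_benchmark(field_name, values_by_entry):
    """Return the ids of entries whose value for the field falls outside the range."""
    low, high = BENCHMARK_RANGES[field_name]
    return [
        entry_id
        for entry_id, value in values_by_entry.items()
        if not (low <= value <= high)
    ]


# A numerical difference on salinity: one entry likely dropped a decimal point.
alerts = entries_outside_benchmark("salinity", {"entry_a": 34.5, "entry_b": 345.0})
```

In a fuller implementation the lookup would query the knowledge graph itself, and a non-empty result would trigger the alert.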
[0048] Furthermore, in one embodiment, the marine observation data entry and review device also includes a credibility score calculation module, used for: for each inconsistency, obtaining the credibility score of each entered data entry by multiplying the historical data entry accuracy, the continuous workload, and the confidence level of the field content identified by the multimodal large model for that data entry by their respective weights and summing the results.
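A minimal sketch of the weighted sum follows. The weights, the eight-hour cap, and the mapping of continuous workload to a 0 to 1 factor (longer continuous work gives a lower factor) are assumptions made for illustration; the application states only that the three quantities are weighted and combined.

```python
# Hypothetical weights; they are assumed to sum to 1.
WEIGHTS = {"accuracy": 0.5, "workload": 0.2, "confidence": 0.3}
MAX_CONTINUOUS_HOURS = 8.0  # assumed cap used to scale the workload term


def credibility_score(historical_accuracy, continuous_hours, model_confidence):
    """Weighted sum of accuracy, a workload factor, and model confidence (all in 0..1)."""
    # Assumption: more continuous work lowers credibility, floored at zero.
    workload_factor = max(0.0, 1.0 - continuous_hours / MAX_CONTINUOUS_HOURS)
    return (
        WEIGHTS["accuracy"] * historical_accuracy
        + WEIGHTS["workload"] * workload_factor
        + WEIGHTS["confidence"] * model_confidence
    )


score_a = credibility_score(0.98, 2.0, 0.9)
score_b = credibility_score(0.90, 8.0, 0.4)
```

Under these assumptions the entry with the higher score is the one a reviewer would be more inclined to trust for that inconsistency.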
[0049] Furthermore, in one embodiment, the marine observation data entry and review device further includes a review report generation module, used for: for each inconsistency, generating an inconsistency review report by combining the paper text content, the credibility score of each entered data entry, and the identification result of the multimodal large model.
[0050] Furthermore, in one embodiment, the marine observation data entry and review device further includes an update module, used for: obtaining the data resulting from review and judgment of the inconsistency review report; updating the historical data entry accuracy of the data entry personnel based on the data after review and judgment; updating the preset knowledge graph based on the correct data after review and judgment; constructing training samples based on the correct data after review and judgment; and, if the number of training samples is greater than the preset number, using the training samples to incrementally train the multimodal large model.
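The bookkeeping side of the update module might look like the sketch below. The class, its method names, and the preset number of 2 are hypothetical; the knowledge graph update and the incremental training itself are not shown, and the sketch only returns the batch that would be passed to training.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateModule:
    """Illustrative bookkeeping for the update step; all names are assumed."""
    preset_count: int = 2                        # assumed training trigger
    totals: dict = field(default_factory=dict)   # person -> reviewed fields
    correct: dict = field(default_factory=dict)  # person -> correctly entered fields
    samples: list = field(default_factory=list)  # (image reference, verified value)

    def update_accuracy(self, person, entered, verified):
        """Update and return the person's historical entry accuracy."""
        self.totals[person] = self.totals.get(person, 0) + 1
        if entered == verified:
            self.correct[person] = self.correct.get(person, 0) + 1
        return self.correct.get(person, 0) / self.totals[person]

    def add_sample(self, image_ref, verified):
        """Queue a sample; return the batch once the count exceeds preset_count."""
        self.samples.append((image_ref, verified))
        if len(self.samples) > self.preset_count:
            batch, self.samples = self.samples, []
            return batch  # the caller would run incremental training on this batch
        return None


module = UpdateModule(preset_count=2)
accuracy_a = module.update_accuracy("person_a", "34.5", "34.5")
accuracy_b = module.update_accuracy("person_b", "345", "34.5")
first = module.add_sample("page1.png", "34.5")
second = module.add_sample("page2.png", "12.1")
third = module.add_sample("page3.png", "8.0")
```

Training is triggered only when the queue holds more samples than the preset number, which matches the "greater than" condition in the text.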
[0051] Furthermore, in one embodiment, the field includes a core field, a key field, and an auxiliary field, with different thresholds corresponding to the core field, the key field, and the auxiliary field.
[0052] The functions of each module in the above-mentioned marine observation data entry and review device correspond to the steps in the above-mentioned marine observation data entry and review method embodiment, and their functions and implementation processes will not be described in detail here.
[0053] Thirdly, embodiments of this application provide marine observation data entry and review equipment.
[0054] Referring to Figure 5, Figure 5 is a schematic diagram of the hardware structure of the marine observation data entry and review equipment involved in the embodiments of this application. In this embodiment, the marine observation data entry and review equipment may include a processor, a memory, a communication interface, and a communication bus.
[0055] The communication bus can be of any type and is used to interconnect the processor, memory, and communication interface.
[0056] The communication interface includes input / output (I / O) interfaces, physical interfaces, and logical interfaces used for interconnecting internal components of the marine observation data entry and review equipment, as well as interfaces used for interconnecting the marine observation data entry and review equipment with other devices (such as other computing devices or user equipment). Physical interfaces can be Ethernet interfaces, fiber optic interfaces, ATM interfaces, etc.; user equipment can be displays, keyboards, etc.
[0057] Memory can be various types of storage media, such as random access memory (RAM), read-only memory (ROM), non-volatile RAM (NVRAM), flash memory, optical storage, hard disk, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), etc.
[0058] The processor can be a general-purpose processor, which can call the marine observation data entry and review program stored in the memory and execute the marine observation data entry and review method provided in the embodiments of this application. For example, the general-purpose processor can be a central processing unit (CPU). The method executed when the marine observation data entry and review program is called can be referred to the various embodiments of the marine observation data entry and review method of this application, and will not be repeated here.
[0059] Those skilled in the art will understand that the hardware structure shown in Figure 5 does not constitute a limitation of this application, and that the equipment may include more or fewer components than shown, combine certain components, or have a different component arrangement.
[0060] Fourthly, embodiments of this application also provide a readable storage medium.
[0061] The readable storage medium of this application stores a marine observation data entry and review program, wherein when the marine observation data entry and review program is executed by a processor, it implements the steps of the marine observation data entry and review method described above.
[0062] For the method implemented when the marine observation data entry and review program is executed, reference may be made to the various embodiments of the marine observation data entry and review method of this application, which will not be repeated here.
[0063] It should be noted that the sequence numbers of the embodiments in this application are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.
[0064] The terms "comprising" and "having," and any variations thereof, in the specification, claims, and accompanying drawings of this application are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or apparatus that includes a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to such process, method, product, or apparatus. The terms "first," "second," and "third," etc., are used to distinguish different objects, etc., and do not indicate a sequence, nor do they limit "first," "second," and "third" to different types.
[0065] In the description of the embodiments of this application, terms such as "exemplary," "for example," or "for instance" are used to indicate examples, illustrations, or explanations. Any embodiment or design described as "exemplary," "for example," or "for instance" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or designs. Specifically, the use of terms such as "exemplary," "for example," or "for instance" is intended to present the relevant concepts in a concrete manner.
[0066] In the description of the embodiments of this application, unless otherwise stated, " / " means "or". For example, A / B can mean A or B. The "and / or" in the text is merely a description of the relationship between related objects, indicating that there can be three relationships. For example, A and / or B can mean: A exists alone, A and B exist simultaneously, and B exists alone. In addition, in the description of the embodiments of this application, "multiple" means two or more.
[0067] In some processes described in the embodiments of this application, multiple operations or steps are included in a specific order. However, it should be understood that these operations or steps may not be executed in the order they appear in the embodiments of this application, or they may be executed in parallel. The sequence number of the operation is only used to distinguish different operations, and the sequence number itself does not represent any execution order. In addition, these processes may include more or fewer operations, and these operations or steps may be executed sequentially or in parallel, and these operations or steps may be combined.
[0068] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) as described above, and includes several instructions to cause a terminal device to execute the methods described in the various embodiments of this application.
[0069] The above are merely preferred embodiments of this application and do not limit the patent scope of this application. Any equivalent structural or procedural transformations made using the content of this application's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of this application.
Claims
1. A marine observation data entry and verification method, characterized in that, the marine observation data entry and verification method includes: selecting two data entries, where the data entries are obtained by different data entry personnel from the same paper text content through electronic entry; parsing each piece of entered data using preset data parsing rules to obtain each field and its corresponding field content for each piece of entered data; for each field, if the absolute value of the difference between the normalized values of the field content of the two entered data entries is greater than the threshold corresponding to the field, regarding the field as an inconsistency item; and for each inconsistency, identifying the field content of the inconsistency in the paper text content using a multimodal large model, and comparing the identified field content with the field content of each entered data entry to determine whether the inconsistency is a format difference or a numerical difference.
2. The marine observation data entry and verification method as described in claim 1, characterized in that, after identifying, for each inconsistency, the field content of the inconsistency in the paper text content using a multimodal large model, and comparing the identified field content with the field content of each entered data entry to determine whether the inconsistency is a format difference or a numerical difference, the method further includes: if the inconsistency is a format difference, converting the field content in non-standard format in the two entered data entries to the standard format.
3. The marine observation data entry and verification method as described in claim 1, characterized in that, after identifying, for each inconsistency, the field content of the inconsistency in the paper text content using a multimodal large model, and comparing the identified field content with the field content of each entered data entry to determine whether the inconsistency is a format difference or a numerical difference, the method further includes: if the inconsistency is a numerical difference, querying the preset knowledge graph based on the field of the inconsistency to obtain the benchmark range corresponding to the field; and if the field content of either entered data entry falls outside the benchmark range, issuing an alert.
4. The marine observation data entry and verification method as described in claim 1, characterized in that, after identifying, for each inconsistency, the field content of the inconsistency in the paper text content using a multimodal large model, and comparing the identified field content with the field content of each entered data entry to determine whether the inconsistency is a format difference or a numerical difference, the method further includes: for each inconsistency, obtaining the credibility score of each entered data entry by multiplying the historical data entry accuracy, the continuous workload, and the confidence level of the field content identified by the multimodal large model for that data entry by their respective weights and summing the results.
5. The marine observation data entry and verification method as described in claim 4, characterized in that, after obtaining, for each inconsistency, the credibility score of each entered data entry by multiplying the historical data entry accuracy, the continuous workload, and the confidence level of the field content identified by the multimodal large model for that data entry by their respective weights and summing the results, the method further includes: for each inconsistency, generating an inconsistency review report by combining the paper text content, the credibility score of each entered data entry, and the identification result of the multimodal large model.
6. The marine observation data entry and verification method as described in claim 5, characterized in that, after generating, for each inconsistency, an inconsistency review report by combining the paper text content, the credibility score of each entered data entry, and the identification result of the multimodal large model, the method further includes: obtaining the data resulting from review and judgment of the inconsistency review report; updating the historical data entry accuracy of the data entry personnel based on the data after review and judgment; updating the preset knowledge graph based on the correct data after review and judgment; constructing training samples based on the correct data after review and judgment; and, if the number of training samples is greater than the preset number, using the training samples to incrementally train the multimodal large model.
7. The marine observation data entry and verification method as described in claim 1, characterized in that, The fields include core fields, key fields, and auxiliary fields, and the threshold values for the core fields, key fields, and auxiliary fields are different.
8. A marine observation data entry and verification device, characterized in that, the marine observation data entry and verification device includes: a selection module, used to select two data entries, where the data entries are obtained by different data entry personnel from the same paper text content through electronic entry; a parsing module, used to parse each piece of entered data using preset data parsing rules to obtain each field and its corresponding field content for each piece of entered data; a comparison module, used to regard a field as an inconsistency item if the absolute value of the difference between the normalized values of the field content of the two entered data entries is greater than the threshold corresponding to the field; and a determination module, used to identify, for each inconsistency, the field content of the inconsistency in the paper text content using a multimodal large model, and to compare the identified field content with the field content of each entered data entry to determine whether the inconsistency is a format difference or a numerical difference.
9. Marine observation data entry and verification equipment, characterized in that, the marine observation data entry and verification equipment includes a processor, a memory, and a marine observation data entry and verification program stored in the memory and executable by the processor, wherein when the marine observation data entry and verification program is executed by the processor, it implements the steps of the marine observation data entry and verification method as described in any one of claims 1 to 7.
10. A readable storage medium, characterized in that, The readable storage medium stores a marine observation data entry and review program, wherein when the marine observation data entry and review program is executed by a processor, it implements the steps of the marine observation data entry and review method as described in any one of claims 1 to 7.