A highway engineering cost index generation method based on multi-source data fusion
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ZHEJIANG YUANDA ENG CONSULTING CO LTD
- Filing Date
- 2026-05-06
- Publication Date
- 2026-06-19
Figure CN122243597A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of highway engineering cost data processing, and specifically to a method for generating a highway engineering cost index based on multi-source data fusion.
Background Technology
[0002] The highway construction cost index reflects the changes in highway construction costs over a certain period and can provide a reference for investment estimation, budget preparation, contract price adjustment, and cost management. Existing highway construction cost indices typically rely on statistical calculations based on data such as material price information, labor unit prices, machinery shift prices, bill of quantities prices, and historical settlement prices.
[0003] In practical applications, the aforementioned data sources are scattered, and the reference prices from competent authorities, market-collected prices, contract list prices, and settlement prices are not entirely consistent in terms of collection time, applicable regions, pricing units, and corresponding project types. Furthermore, different highway projects vary significantly in route grade, topographical zoning, bridge-to-tunnel ratio, pavement structure type, and material transportation conditions, resulting in varying degrees of impact of the same cost factors on the overall cost in different engineering scenarios.
[0004] Existing methods often calculate indices using fixed weights or single-source prices, or simply average the values taken from multiple sources. This makes it difficult to identify price conflicts for the same cost element in the same region and period, and difficult to reflect cost differences in regions with insufficient samples or in projects with special engineering structures. Consequently, the generated highway engineering cost index is prone to deviating from the actual cost changes for the target region, target route grade, and target construction type.
Summary of the Invention
[0005] To address the shortcomings of existing technologies, this invention provides a method for generating highway engineering cost indices based on multi-source data fusion, thereby solving the technical problems existing in the prior art.
[0006] The above technical objective of the present invention is achieved through the following technical solution: a method for generating a highway engineering cost index based on multi-source data fusion, comprising the following steps:
S1: Obtain multi-source highway cost data and engineering attribute data for the index period to be generated. The multi-source highway cost data includes material price data, labor price data, machinery shift price data, contract list price data, settlement price data, and reference price data issued by the transportation construction authority. The engineering attribute data includes project location, route grade, topographic zoning, construction type, bridge-to-tunnel ratio, pavement structure type, and main material transportation radius. The target index object is defined by the regional unit, route grade, construction type, and index period.
S2: Map the multi-source highway cost data to a preset highway cost element coding system to generate standard observation records. Each standard observation record includes a cost element code, price value, data source, collection time, applicable region, applicable project type, and project attribute identifier.
S3: Group the standard observation records by cost element code, regional unit, and index period to form multiple price conflict groups. Each price conflict group contains the price observation records that share the same cost element code, the same regional unit, and the same index period.
S4: For each price conflict group, generate a source confidence value for each price observation record based on the record's source stability, time fit, spatial fit, engineering fit, historical deviation, and cross-consistency.
S5: Within each price conflict group, identify abnormal observation records based on the price dispersion of the price observation records within the group, and remove them from the group. Then fuse the price values of the remaining price observation records according to their source confidence values to obtain the fused price of the corresponding cost element for the corresponding regional unit and index period.
S6: When the number of remaining price observation records in any price conflict group is lower than the preset minimum sample size, select compensation observation records from a candidate compensation sample set. The candidate compensation sample set is formed from historical observation records of adjacent areas or of projects with similar engineering structures. The sample borrowing reduction coefficient of each compensation observation record is determined according to spatial distance, route grade difference, bridge-to-tunnel ratio difference, and pavement structure type difference. The compensation observation records then participate in generating the fused price of the price conflict group, weighted by the sample borrowing reduction coefficient.
S7: Based on the bill of quantities, settlement data, and engineering attribute data of completed highway projects, screen the set of structurally similar projects that match the target index object, and generate the engineering structure weight vector for the target index object according to the cost proportion of each cost element in that set. The engineering structure weight vector includes an engineering part weight sub-vector and a cost category weight sub-vector. The engineering part weight sub-vector includes the weights of roadbed engineering, pavement engineering, bridge engineering, tunnel engineering, and traffic safety facility engineering. The cost category weight sub-vector includes the weights of labor, material, and machinery.
S8: Based on the fused price of each cost element in the current index period, the fused price of the same cost element in the base period, and the engineering structure weight vector, generate the comprehensive cost index and the sub-item cost indices for highway engineering. The engineering part weight sub-vector is used to generate the roadbed engineering cost index, pavement engineering cost index, bridge engineering cost index, tunnel engineering cost index, and traffic safety facility engineering cost index. The cost category weight sub-vector is used to generate the labor price index, material price index, and machinery shift price index.
S9: Generate the confidence level of the comprehensive highway engineering cost index based on the number of valid samples, the number of data sources, the proportion of abnormal observation records removed, the proportion of compensation observation records involved, and the historical backtesting deviation.
S10: Apply the comprehensive highway engineering cost index generated for a historical index period to the base period cost of completed highway projects to obtain the index back-calculated cost. Compare the index back-calculated cost with the actual settlement price of the completed highway project to obtain the backtesting deviation. Determine the backtesting deviation attribution result according to the largest of the price source deviation contribution value, the engineering structure weight deviation contribution value, and the sample compensation deviation contribution value. Based on the attribution result, correct the source confidence value, the sample borrowing reduction coefficient, or the engineering structure weight vector in the next index period.
[0007] Preferably, in step S1, after the multi-source highway cost data is obtained, it undergoes an entry check before being stored in the database. The entry check includes a field integrity check, a pricing unit check, and a time attribution check. Records lacking any of cost element name, price value, unit of measurement, collection time, applicable region, or data source are marked as incomplete observation records and are excluded from fused price generation for the price conflict groups. Records with inconsistent pricing units are converted to the unified pricing unit of the same cost element code according to a preset unit conversion relationship.
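The entry check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the field names, the status labels, and the `UNIT_PRICE_FACTORS` table are hypothetical, and the time attribution check is omitted for brevity.

```python
# Fields the text requires for a complete observation record (names are assumed).
REQUIRED_FIELDS = ("element_name", "price", "unit", "collected_on", "region", "source")

# Hypothetical conversion table: (from_unit, to_unit) -> multiplier applied to the price.
UNIT_PRICE_FACTORS = {("yuan/kg", "yuan/t"): 1000.0}


def check_record(record: dict, target_unit: str) -> dict:
    """Return a copy of the record tagged with an entry-check status."""
    out = dict(record)
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    if missing:
        # Incomplete records are kept for audit but excluded from price fusion.
        out["status"] = "incomplete"
        out["missing"] = missing
        return out
    if record["unit"] != target_unit:
        factor = UNIT_PRICE_FACTORS.get((record["unit"], target_unit))
        if factor is None:
            # No preset conversion relationship exists for this unit pair.
            out["status"] = "unit_unresolved"
            return out
        out["price"] = record["price"] * factor
        out["unit"] = target_unit
    out["status"] = "ok"
    return out
```

A record priced at 4.5 yuan/kg would be converted to 4500.0 yuan/t under the assumed table, while a record with an empty region would be tagged `incomplete`.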
[0008] Preferably, in step S2, the highway cost element coding system is organized into an engineering part level, a cost category level, and a specification level. The engineering part level includes roadbed engineering, pavement engineering, bridge engineering, tunnel engineering, traffic safety facilities engineering, landscaping and environmental protection engineering, and temporary works. The cost category level includes labor, materials, machinery shifts, transportation, and measures items. The specification level records the material specification, machinery model, or construction process type corresponding to the same cost element. When the multi-source highway cost data is mapped to the highway cost element coding system, the engineering part level is determined first from the cost element name, the cost category level is then determined from the cost category, and the specification level is finally determined from the material specification, machinery model, or construction process type.
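One way to picture the three-level code is the sketch below. The code values, the separator, and the spec normalization are assumptions made for illustration; the text names the levels but does not specify how codes are written. The lookup of the engineering part from the cost element name is left out and the part is passed in directly.

```python
from dataclasses import dataclass

# Hypothetical code tables for the first two levels.
PART_CODES = {
    "roadbed": "01", "pavement": "02", "bridge": "03", "tunnel": "04",
    "traffic_safety": "05", "landscaping_env": "06", "temporary": "07",
}
CATEGORY_CODES = {
    "labor": "L", "material": "M", "machinery_shift": "E",
    "transportation": "T", "measures": "X",
}


@dataclass(frozen=True)
class CostElementCode:
    part: str       # engineering part level
    category: str   # cost category level
    spec: str       # specification level (material spec, machine model, process type)

    def __str__(self) -> str:
        return f"{PART_CODES[self.part]}.{CATEGORY_CODES[self.category]}.{self.spec}"


def map_to_code(part: str, category: str, spec: str) -> CostElementCode:
    """Resolve the levels in the order given in the text: part, category, spec."""
    if part not in PART_CODES:
        raise KeyError(f"unknown engineering part: {part}")
    if category not in CATEGORY_CODES:
        raise KeyError(f"unknown cost category: {category}")
    normalized_spec = spec.strip().upper().replace(" ", "_")
    return CostElementCode(part, category, normalized_spec)
```

Making the code a frozen dataclass lets it serve directly as a grouping key when price conflict groups are formed.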
[0009] Preferably, in step S4, the source stability is determined by the number of index periods, out of three consecutive index periods, in which the same data source forms valid observation records under the target cost element code. The time fit is determined by the time interval between the collection time of the price observation record and the target index period. The spatial fit is determined by the consistency of administrative level, and by the road transport distance, between the applicable region of the price observation record and the target regional unit. The engineering fit is determined by the consistency between the project corresponding to the price observation record and the target index object in terms of route grade, topographic zoning, bridge-to-tunnel ratio, and pavement structure type. The historical deviation is determined by the recorded deviation between the price values from the same data source in historical index periods and the actual settlement prices of the corresponding completed highway projects. The cross-consistency is determined by the degree of deviation between the price value of the price observation record and the median price within the same price conflict group. After the source stability, time fit, spatial fit, engineering fit, historical deviation, and cross-consistency are normalized, the source confidence value is generated according to preset weights.
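The combination of the six normalized factors can be sketched as a weighted average. The text only says that a value is generated "according to a preset weight", so the weighted-average form is an assumption, as is the convention that every score is already normalized to [0, 1] with higher meaning more trustworthy (so a large historical deviation maps to a low score).

```python
FACTORS = (
    "source_stability", "time_fit", "spatial_fit",
    "engineering_fit", "historical_deviation", "cross_consistency",
)


def stability_score(valid_periods: int) -> float:
    """Share of the last three index periods in which the source gave valid records."""
    if not 0 <= valid_periods <= 3:
        raise ValueError("valid_periods must be between 0 and 3")
    return valid_periods / 3


def source_confidence(scores: dict, weights: dict) -> float:
    """Weighted average of normalized factor scores (assumed combination rule)."""
    total_weight = sum(weights[f] for f in FACTORS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    for f in FACTORS:
        if not 0.0 <= scores[f] <= 1.0:
            raise ValueError(f"score for {f} is not normalized to [0, 1]")
    return sum(weights[f] * scores[f] for f in FACTORS) / total_weight
```

Dividing by the weight total keeps the result in [0, 1] even when the preset weights are not normalized in the parameter configuration table.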
[0010] Preferably, in step S5, when abnormal observation records are identified, the first quartile price, the third quartile price, and the interquartile range of the price observation records within the same price conflict group are calculated first, and the first anomaly boundary and the second anomaly boundary are then determined from them. Price observation records whose price values are below the first anomaly boundary or above the second anomaly boundary are marked as candidate abnormal observation records. When the source confidence value of a candidate abnormal observation record is lower than the preset confidence threshold, the record is determined to be an abnormal observation record. When the source confidence value of a candidate abnormal observation record is not lower than the preset confidence threshold, the record is retained, and its fusion weight is multiplied by the preset anomaly reduction coefficient.
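The two-stage anomaly screening above can be sketched with Python's standard `statistics.quantiles`. The boundary form `Q1 - k*IQR` / `Q3 + k*IQR`, the choice of the `inclusive` quantile method, and the record field names are assumptions; the text states only that the boundaries are derived from the quartiles and the interquartile range.

```python
import statistics


def screen_anomalies(records, boundary_coef, confidence_threshold, anomaly_reduction):
    """Split a price conflict group into kept and removed records.

    Each record is a dict with 'price' and 'confidence' keys (assumed names).
    Kept records gain a 'reduction' multiplier for their fusion weight.
    """
    prices = [r["price"] for r in records]
    q1, _, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - boundary_coef * iqr   # first anomaly boundary
    upper = q3 + boundary_coef * iqr   # second anomaly boundary

    kept, removed = [], []
    for r in records:
        is_candidate = r["price"] < lower or r["price"] > upper
        if is_candidate and r["confidence"] < confidence_threshold:
            removed.append(r)          # low-confidence outlier: abnormal record
        else:
            # High-confidence outliers stay, but with a reduced fusion weight.
            factor = anomaly_reduction if is_candidate else 1.0
            kept.append({**r, "reduction": factor})
    return kept, removed
```

Retaining a high-confidence outlier at reduced weight, rather than dropping it, matches the text: a trusted source reporting an unusual price may be signaling a real market move.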
[0011] Preferably, in step S5, when the price values of the remaining price observation records are fused, the source confidence value of each remaining price observation record is used as the first weighting factor, and a second weighting factor is determined from the standardized deviation between the price value of the record and the median price within the same price conflict group. The second weighting factor decreases as the standardized deviation increases. The fusion weight of each remaining price observation record is determined from the first weighting factor and the second weighting factor. The price values of the remaining price observation records within the same price conflict group are then weighted by their fusion weights to obtain the fused price.
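A minimal sketch of the fusion step follows. Several choices here are assumptions rather than statements from the text: the standardized deviation is taken as the relative distance from the group median, the second factor uses the decreasing form `1 / (1 + decay * deviation)`, and the two factors are combined by multiplication. The `decay` parameter stands in for the deviation attenuation coefficient mentioned elsewhere in the description.

```python
import statistics


def fuse_prices(records, decay):
    """Weighted fusion of the remaining records in one price conflict group.

    Each record is a dict with 'price' and 'confidence'; an optional
    'reduction' key carries the anomaly reduction multiplier.
    """
    if not records:
        raise ValueError("no records to fuse")
    median = statistics.median(r["price"] for r in records)
    if median <= 0:
        raise ValueError("prices are expected to be positive")

    weighted_sum = 0.0
    weight_total = 0.0
    for r in records:
        deviation = abs(r["price"] - median) / median      # assumed standardization
        second_factor = 1.0 / (1.0 + decay * deviation)    # decreases with deviation
        weight = r["confidence"] * second_factor * r.get("reduction", 1.0)
        weighted_sum += weight * r["price"]
        weight_total += weight
    if weight_total <= 0:
        raise ValueError("all fusion weights are zero")
    return weighted_sum / weight_total
```

For two records priced 100 and 120 with confidence 0.9 and 0.1, both sit equally far from the median, so the second factor cancels and the fused price is about 102.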
[0012] Preferably, in step S6, the candidate compensation sample set is formed as follows. Historical observation records that have the same cost element code as the target price conflict group are screened from the historical observation records to form preliminary compensation samples. For each preliminary compensation sample, the corresponding sample borrowing reduction coefficient is determined from the spatial distance, route grade difference, bridge-to-tunnel ratio difference, and pavement structure type difference between the preliminary compensation sample and the target index object. Preliminary compensation samples whose sample borrowing reduction coefficient is lower than the preset reduction coefficient threshold are excluded, and the remaining preliminary compensation samples form the candidate compensation sample set.
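The screening of compensation samples might look like the sketch below. The text names the four inputs to the sample borrowing reduction coefficient but not the functional form, so the multiplicative form and every parameter (`max_distance_km`, `max_ratio_gap`, `grade_penalty`, `pavement_penalty`) are hypothetical.

```python
def borrowing_coefficient(distance_km, grade_gap, ratio_gap, same_pavement, *,
                          max_distance_km, max_ratio_gap,
                          grade_penalty, pavement_penalty):
    """Coefficient in [0, 1]; 1 means the sample matches the target exactly."""
    spatial = max(0.0, 1.0 - distance_km / max_distance_km)
    ratio = max(0.0, 1.0 - abs(ratio_gap) / max_ratio_gap)
    grade = grade_penalty ** abs(grade_gap)      # one penalty step per grade level
    pavement = 1.0 if same_pavement else pavement_penalty
    return spatial * ratio * grade * pavement


def candidate_set(preliminary, threshold, **params):
    """Keep preliminary samples whose coefficient reaches the preset threshold."""
    selected = []
    for sample in preliminary:
        coef = borrowing_coefficient(
            sample["distance_km"], sample["grade_gap"],
            sample["ratio_gap"], sample["same_pavement"], **params)
        if coef >= threshold:
            selected.append({**sample, "borrowing_coefficient": coef})
    return selected
```

Storing the coefficient on each selected sample lets the fusion step scale the sample's weight by it later without recomputing the differences.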
[0013] Preferably, in step S7, when the set of structurally similar projects is screened, the route grade, topographic zoning, construction type, bridge-to-tunnel ratio, pavement structure type, and main material transportation radius of the target index object are used as screening conditions. A completed highway project is included in the set of structurally similar projects when its route grade, topographic zoning, construction type, and pavement structure type are consistent with those of the target index object, its bridge-to-tunnel ratio differs from that of the target index object by no more than the preset bridge-to-tunnel ratio difference threshold, and its main material transportation radius differs from that of the target index object by no more than the preset transportation radius difference threshold. The main material transportation radius is determined from the road transportation distance between the main material supply location and the center point of the construction area corresponding to the target index object. When the same main material has multiple supply locations, a weighted transportation radius is determined from the supply volume share and the road transportation distance of each supply location, and this weighted transportation radius is used as the main material transportation radius.
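The weighted transportation radius and the similarity screen can be sketched as follows. The field names are assumptions, and the radius is computed as a supply-volume-weighted mean of road distances, which is one direct reading of "according to the supply volume ratio of each supply location and the road transportation distance".

```python
EXACT_MATCH_FIELDS = ("route_grade", "terrain_zone", "construction_type", "pavement_type")


def weighted_transport_radius(supplies):
    """supplies: iterable of (supply_volume, road_distance_km) pairs."""
    supplies = list(supplies)
    total_volume = sum(volume for volume, _ in supplies)
    if total_volume <= 0:
        raise ValueError("total supply volume must be positive")
    return sum(volume * dist for volume, dist in supplies) / total_volume


def is_structurally_similar(project, target, ratio_gap_max, radius_gap_max):
    """Exact match on categorical attributes, threshold match on numeric ones."""
    if any(project[f] != target[f] for f in EXACT_MATCH_FIELDS):
        return False
    ratio_gap = abs(project["bridge_tunnel_ratio"] - target["bridge_tunnel_ratio"])
    radius_gap = abs(project["transport_radius_km"] - target["transport_radius_km"])
    return ratio_gap <= ratio_gap_max and radius_gap <= radius_gap_max
```

For example, with 100 units supplied from 30 km away and 300 units from 50 km away, the weighted radius is (100 × 30 + 300 × 50) / 400 = 45 km.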
[0014] Preferably, in step S7, when the engineering structure weight vector is generated, the engineering part cost shares and the cost category cost shares are calculated for each completed highway project in the set of structurally similar projects. The engineering part cost shares include the cost shares of roadbed engineering, pavement engineering, bridge engineering, tunnel engineering, and traffic safety facilities engineering. The cost category cost shares include the shares of labor costs, material costs, and machinery costs. For each cost share, the median across the projects in the set is taken as the corresponding initial weight. The initial weights of the engineering parts and the initial weights of the cost categories are then normalized separately to obtain the engineering part weight sub-vector and the cost category weight sub-vector.
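The median-then-normalize procedure for one sub-vector can be sketched as below. The input layout (one dict of cost shares per completed project) is an assumption; the same function would be called once for the engineering part shares and once for the cost category shares.

```python
import statistics


def weight_subvector(shares_by_project):
    """Median share per component across projects, renormalized to sum to 1.

    shares_by_project: list of dicts mapping component name -> cost share.
    """
    if not shares_by_project:
        raise ValueError("the structurally similar project set is empty")
    components = list(shares_by_project[0])
    initial = {
        c: statistics.median(project[c] for project in shares_by_project)
        for c in components
    }
    total = sum(initial.values())
    if total <= 0:
        raise ValueError("initial weights must sum to a positive value")
    return {c: weight / total for c, weight in initial.items()}
```

The renormalization is needed because medians taken component by component do not generally sum to 1, even when every project's own shares do.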
[0015] Preferably, in step S10, the price source deviation contribution value, the engineering structure weight deviation contribution value, and the sample compensation deviation contribution value are determined as follows. With the engineering structure weight vector and the compensation observation records held unchanged, the price source deviation contribution value is determined from the change in the fused price attributable to each data source. With the fused price of each cost element and the compensation observation records held unchanged, the engineering structure weight deviation contribution value is determined from the difference in the index back-calculated cost before and after the engineering structure weight vector is changed. With the fused price of each cost element and the engineering structure weight vector held unchanged, the sample compensation deviation contribution value is determined from the difference in the index back-calculated cost before and after the compensation observation records participate. When the price source deviation contribution value is the largest, the source confidence value of the corresponding data source is reduced in the next index period. When the engineering structure weight deviation contribution value is the largest, step S7 is re-executed to update the engineering structure weight vector in the next index period. When the sample compensation deviation contribution value is the largest, the sample borrowing reduction coefficient in the corresponding candidate compensation sample set is reduced, and the preset minimum sample size of the price conflict group for the corresponding cost element is increased in the next index period.
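The attribution rule reduces to picking the largest of three contribution values and mapping it to a corrective action. In the sketch below the key names are hypothetical, and comparing contributions by absolute value is an assumption (the text says only "the maximum value"); how each contribution is computed is outside the sketch.

```python
# Corrective action for the next index period, keyed by attributed cause.
ACTIONS = {
    "price_source": "reduce the source confidence value of the data source",
    "structure_weight": "re-execute S7 to update the engineering structure weight vector",
    "sample_compensation": "reduce the sample borrowing reduction coefficient "
                           "and raise the minimum sample size",
}


def attribute_deviation(contributions):
    """Return (cause, action) for the contribution with the largest magnitude.

    contributions: dict with exactly the keys of ACTIONS mapped to numbers.
    """
    if set(contributions) != set(ACTIONS):
        raise KeyError("contributions must cover exactly: " + ", ".join(ACTIONS))
    cause = max(contributions, key=lambda name: abs(contributions[name]))
    return cause, ACTIONS[cause]
```

Because each contribution is obtained by varying one input while holding the other two fixed, the three values are directly comparable and a single `max` is enough to choose the correction.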
[0016] In summary, the present invention has the following main beneficial effects. This application maps material price data, labor price data, machinery shift price data, contract list price data, settlement price data, and reference price data issued by transportation construction authorities into a unified highway cost element coding system. Price conflict groups are formed by cost element code, regional unit, and index period, so that price records from different sources, on different pricing bases, and collected at different times are processed on a common basis. A source confidence value is generated from source stability, time fit, spatial fit, engineering fit, historical deviation, and cross-consistency, and is combined with the price deviation reduction factor, the anomaly reduction coefficient, and the sample borrowing reduction coefficient to generate the fused price. As a result, when prices from multiple sources conflict, the method neither relies on a single source price nor falls back on simple averaging, and it reduces the impact of abnormal quotations, lagging prices, and poorly matched prices on the highway engineering cost index.
[0017] By including the regional unit, route grade, construction type, and index period in the target index object, and by screening the set of structurally similar projects on topographic zoning, bridge-to-tunnel ratio, pavement structure type, and main material transportation radius, the engineering structure weight vector is matched to the target highway engineering type. The engineering structure weight vector is divided into an engineering part weight sub-vector and a cost category weight sub-vector. The former is used for the comprehensive cost index on the engineering part basis, and the latter for the cost category indices such as labor, materials, and machinery shifts, which avoids double weighting across the two weight levels. Compared with index calculation methods that use fixed weights or uniform weights across an entire region, this application can reflect the differences in cost structure for mountainous expressways, projects with a high proportion of bridges and tunnels, pavement reconstruction projects, and different transportation radius conditions, so that the generated highway engineering cost index more closely reflects the cost variation of specific regions and specific highway engineering types.
[0018] By introducing candidate compensation sample sets for price conflict groups with insufficient samples, and by determining sample borrowing reduction coefficients from spatial distance, route grade difference, bridge-to-tunnel ratio difference, and pavement structure type difference, historical observation records from adjacent regions or similar engineering structures can participate in fused price generation within a limited scope. Continuous index results can therefore be formed even when samples are insufficient for county-level areas, mountainous routes, or special engineering structures. In addition, this application generates an index confidence level from the number of valid samples, the number of data sources, the proportion of abnormal observation records removed, the proportion of compensation observation records participating, and the historical backtesting deviation. It also obtains the backtesting deviation between the index back-calculated cost and the actual settlement price, attributes that deviation to price source deviation, engineering structure weight deviation, or sample compensation deviation, and accordingly corrects the source confidence value, the sample borrowing reduction coefficient, or the engineering structure weight vector in the next index period. The highway engineering cost index generation process is thereby traceable, verifiable, and iteratively correctable.
Attached Figure Description
[0019] Figure 1 is a flowchart of the method of the present invention.
Detailed Implementation
[0020] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0021] Example 1: Referring to Figure 1, a method for generating a highway engineering cost index based on multi-source data fusion includes the following steps:
S1: Obtain multi-source highway cost data and engineering attribute data for the index period to be generated. The multi-source highway cost data includes material price data, labor price data, machinery shift price data, contract list price data, settlement price data, and reference price data issued by the transportation construction authority. The engineering attribute data includes project location, route grade, topographic zoning, construction type, bridge-to-tunnel ratio, pavement structure type, and main material transportation radius. The target index object is defined by the regional unit, route grade, construction type, and index period.
S2: Map the multi-source highway cost data to a preset highway cost element coding system to generate standard observation records. Each standard observation record includes a cost element code, price value, data source, collection time, applicable region, applicable project type, and project attribute identifier.
S3: Group the standard observation records by cost element code, regional unit, and index period to form multiple price conflict groups. Each price conflict group contains the price observation records that share the same cost element code, the same regional unit, and the same index period.
S4: For each price conflict group, generate a source confidence value for each price observation record based on the record's source stability, time fit, spatial fit, engineering fit, historical deviation, and cross-consistency.
S5: Within each price conflict group, identify abnormal observation records based on the price dispersion of the price observation records within the group, and remove them from the group. Then fuse the price values of the remaining price observation records according to their source confidence values to obtain the fused price of the corresponding cost element for the corresponding regional unit and index period.
S6: When the number of remaining price observation records in any price conflict group is lower than the preset minimum sample size, select compensation observation records from a candidate compensation sample set. The candidate compensation sample set is formed from historical observation records of adjacent areas or of projects with similar engineering structures. The sample borrowing reduction coefficient of each compensation observation record is determined according to spatial distance, route grade difference, bridge-to-tunnel ratio difference, and pavement structure type difference. The compensation observation records then participate in generating the fused price of the price conflict group, weighted by the sample borrowing reduction coefficient.
S7: Based on the bill of quantities, settlement data, and engineering attribute data of completed highway projects, screen the set of structurally similar projects that match the target index object, and generate the engineering structure weight vector for the target index object according to the cost proportion of each cost element in that set. The engineering structure weight vector includes an engineering part weight sub-vector and a cost category weight sub-vector. The engineering part weight sub-vector includes the weights of roadbed engineering, pavement engineering, bridge engineering, tunnel engineering, and traffic safety facility engineering. The cost category weight sub-vector includes the weights of labor, material, and machinery.
S8: Based on the fused price of each cost element in the current index period, the fused price of the same cost element in the base period, and the engineering structure weight vector, generate the comprehensive cost index and the sub-item cost indices for highway engineering. The engineering part weight sub-vector is used to generate the roadbed engineering cost index, pavement engineering cost index, bridge engineering cost index, tunnel engineering cost index, and traffic safety facility engineering cost index. The cost category weight sub-vector is used to generate the labor price index, material price index, and machinery shift price index.
S9: Generate the confidence level of the comprehensive highway engineering cost index based on the number of valid samples, the number of data sources, the proportion of abnormal observation records removed, the proportion of compensation observation records involved, and the historical backtesting deviation.
S10: Apply the comprehensive highway engineering cost index generated for a historical index period to the base period cost of completed highway projects to obtain the index back-calculated cost. Compare the index back-calculated cost with the actual settlement price of the completed highway project to obtain the backtesting deviation. Determine the backtesting deviation attribution result according to the largest of the price source deviation contribution value, the engineering structure weight deviation contribution value, and the sample compensation deviation contribution value. Based on the attribution result, correct the source confidence value, the sample borrowing reduction coefficient, or the engineering structure weight vector in the next index period.
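Step S8 does not state an index formula. One common form, shown here purely as an assumption, is a weighted arithmetic mean of price relatives scaled so that the base period equals 100. The sketch also assumes the fused prices have already been aggregated to the same components that the weight sub-vector covers.

```python
def cost_index(current_prices, base_prices, weights):
    """Weighted mean of price relatives, base period = 100 (assumed form).

    All three arguments are dicts keyed by the same component names.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_relatives = 0.0
    for component, weight in weights.items():
        base = base_prices[component]
        if base <= 0:
            raise ValueError(f"base price for {component} must be positive")
        weighted_relatives += weight * current_prices[component] / base
    return 100.0 * weighted_relatives / total_weight
```

With equal weights on two components, a 10% rise in one and no change in the other gives an index of about 105. Passing the engineering part weight sub-vector or the cost category weight sub-vector selects which family of indices is being computed.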
[0022] In this embodiment, the target index object is defined by the regional unit, route grade, construction type, and index period. The regional unit can be a provincial, municipal, or county-level statistical area, or a statistical region divided according to the needs of transportation construction project cost management. The route grade can be expressway, first-class highway, second-class highway, or another highway grade. The construction type is chosen to be consistent with the statistical scope of the index and includes new construction, reconstruction, expansion, or maintenance and renovation. The index period can be monthly, quarterly, or annual. After the target index object is determined, subsequent standard observation record screening, price conflict group formation, source confidence calculation, sample compensation, engineering structure weight generation, index confidence level calculation, and historical backtesting correction are all performed around this target index object.
[0023] In this embodiment, the source confidence weight coefficient, anomaly boundary coefficient, preset confidence threshold, preset anomaly reduction coefficient, preset minimum sample size, preset reduction coefficient threshold, preset bridge-to-tunnel ratio difference threshold, preset transportation radius difference threshold, deviation attenuation coefficient, confidence score weight coefficient, confidence level threshold, and preset deviation threshold are all stored in the parameter configuration table. The parameter configuration table is associated with the target index object. The initial data sources for the parameter configuration table include the bills of quantities and settlement data of historical completed highway projects, historical material price data, historical labor price data, historical machinery shift price data, contract list price data, reference price data issued by the transportation construction authority, and historical backtesting results. For target index objects with existing historical index periods, the parameter configuration table is updated based on the historical backtesting deviation attribution results. For target index objects generating an index for the first time, the parameter configuration table is determined from samples of completed highway projects of the same route grade and construction type within the same regional unit or the next higher-level regional unit.
[0024] The preset anomaly reduction coefficient is greater than zero and less than one. Both the preset confidence threshold and the preset reduction coefficient threshold are within the range of zero to one. The preset minimum sample size is an integer not less than four, which ensures that quartile statistics can be computed for a price conflict group. The preset bridge-to-tunnel ratio difference threshold and the preset transportation radius difference threshold are determined from the historical sample distribution and the backtesting deviation distribution of the structurally similar project set. The preset deviation threshold is determined from the distribution of deviations between the actual settlement prices of historical completed projects and the index back-calculated cost. These parameters are not set by ad hoc manual judgment; they are stored in the parameter configuration table and participate in the calculation of each index period.
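The stated parameter ranges lend themselves to validation at load time. The sketch below covers only the four parameters whose ranges are given explicitly; the class name and field names are hypothetical, and treating the zero-to-one ranges as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexParameters:
    """Subset of the parameter configuration table with explicit range rules."""
    anomaly_reduction: float       # must lie strictly between 0 and 1
    confidence_threshold: float    # within [0, 1] (inclusive bounds assumed)
    reduction_threshold: float     # within [0, 1] (inclusive bounds assumed)
    min_sample_size: int           # integer, at least 4 for quartile statistics

    def __post_init__(self):
        if not 0.0 < self.anomaly_reduction < 1.0:
            raise ValueError("anomaly_reduction must be in (0, 1)")
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be in [0, 1]")
        if not 0.0 <= self.reduction_threshold <= 1.0:
            raise ValueError("reduction_threshold must be in [0, 1]")
        if not isinstance(self.min_sample_size, int) or self.min_sample_size < 4:
            raise ValueError("min_sample_size must be an integer >= 4")
```

Rejecting an out-of-range table before an index period runs keeps a bad configuration from silently skewing every price conflict group in that period.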
[0025] In one specific implementation, the data processing platform acquires multi-source highway cost data and engineering attribute data for the period in which the index is to be generated. The multi-source highway cost data includes material price data, labor price data, machinery shift price data, contract list price data, settlement price data, and reference price data published by the transportation construction authority.
[0026] Material price data can be derived from transportation construction project material price information, market collection records, supplier quotation records, contract price records, and material settlement records of completed projects. Material price data includes prices for cement, steel, asphalt, gravel, sand, diesel fuel, waterproofing materials, expansion joint materials, guardrail materials, and sign and marking materials. Labor price data can be derived from labor unit prices published by cost management departments, contract labor unit prices, and settlement labor unit prices. Machinery shift price data can be derived from quota shift prices, market rental prices, contract shift prices, and settlement shift prices. Contract bill of quantities price data can be derived from the winning bid bill of quantities and the contract bill of quantities. Settlement price data can be derived from settlement documents of completed highway projects. Reference price data can be derived from price information and price index data published by transportation construction authorities or their cost management agencies.
[0027] The engineering attribute data includes project location, route grade, topographic zoning, construction type, bridge-to-tunnel ratio, pavement structure type, and main material transportation radius. Project location is used to determine the spatial relationship between price observation records and target area units. Route grade reflects differences in highway technical standards. Topographic zoning distinguishes between plains, hills, mountains, and other engineering terrain conditions. Construction type distinguishes between new construction, reconstruction / expansion, and maintenance / renovation. Bridge-to-tunnel ratio reflects the proportion of bridge and tunnel engineering in route engineering. Pavement structure type distinguishes between asphalt concrete pavement, cement concrete pavement, and composite pavement. Main material transportation radius reflects the impact of the transportation distance between the material supplier and the construction area on the material's arrival price.
[0028] After the data is acquired, the multi-source highway cost data undergoes input verification, which includes a field integrity check, a pricing unit check, and a time check. The field integrity check determines whether each data entry contains the cost element name, price value, pricing unit, collection time, applicable region, and data source. Records lacking any of these fields are marked as incomplete observation records. Incomplete observation records do not participate in generating the fused price of a price conflict group, but they can be retained in the original data table for later traceability. The pricing unit check determines whether records from different sources under the same cost element code use the same pricing unit. Records with inconsistent pricing units are converted to a unified pricing unit according to a preset unit conversion relationship. The time check determines whether the collection time of a record can be assigned to the target index period. Records that can be assigned neither to the target index period nor to a historical index period are excluded from the current index calculation.
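The three checks described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: the field names, the `UNIT_FACTORS` table, the status labels, and the function `check_record` are all hypothetical, and for brevity the sketch only tests assignment to a single target index period.

```python
from datetime import date

# Hypothetical field names for one raw cost record.
REQUIRED_FIELDS = ("element_name", "price", "unit", "collected_on", "region", "source")

# Hypothetical conversion table: (from_unit, to_unit) -> multiplier applied to the price.
UNIT_FACTORS = {("yuan/kg", "yuan/t"): 1000.0, ("yuan/t", "yuan/t"): 1.0}


def check_record(raw, target_unit, period_start, period_end):
    """Return a copy of `raw` with a 'status' field and, when possible, a converted price."""
    out = dict(raw)
    # Field integrity check: every required field must be present and non-empty.
    if any(raw.get(field) in (None, "") for field in REQUIRED_FIELDS):
        out["status"] = "incomplete"  # kept for traceability, excluded from fusion
        return out
    # Pricing unit check: convert to the unified unit when a conversion is known.
    factor = UNIT_FACTORS.get((raw["unit"], target_unit))
    if factor is None:
        out["status"] = "unit_unknown"  # assumed label for an unconvertible unit
        return out
    out["price"] = raw["price"] * factor
    out["unit"] = target_unit
    # Time check: the collection date must fall inside the target index period.
    in_period = period_start <= raw["collected_on"] <= period_end
    out["status"] = "ok" if in_period else "out_of_period"
    return out
```

Only records whose status is `"ok"` would go on to the coding and grouping steps; the others stay in the raw table.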
[0029] For cases where a comprehensive cost index for highway engineering projects is generated for the first time and no historical index period exists, the data processing platform constructs a simulated historical index period using historical data from existing completed highway projects. Specifically, completed highway projects with the same route level and construction type as the target index project are selected, and their historical multi-source highway cost data, engineering attribute data, bill of quantities, and settlement data are replayed and calculated according to the processing flow of this embodiment to obtain a simulated historical index. Then, the simulated historical index is applied to the base period cost of the corresponding completed highway project to obtain the simulated index back-calculated cost value, which is compared with the actual settlement price to obtain the initial historical backtesting deviation. When there are insufficient qualified completed highway projects within the same regional unit, completed highway projects with the same route level and construction type in the next higher-level regional unit are used for simulated backtesting, and this difference in data source level is included in the index confidence level.
[0030] After input verification, the multi-source highway cost data is mapped to a predefined highway cost element coding system. The coding system is organized into three levels: project component, cost category, and specification.
[0031] The project component level includes subgrade engineering, pavement engineering, bridge engineering, tunnel engineering, traffic safety facilities engineering, landscaping and environmental protection engineering, and temporary works. The cost category level includes labor, materials, machinery shifts, transportation, and measures items. The specification level records the material specification, machinery model, or construction process type corresponding to a cost element.
[0032] During mapping, the project component level is first determined from the cost element name, then the cost category level is determined from the cost type, and finally the specification level is determined from the material specification, machinery model, or construction process type. Records that have different names in the source files but correspond to the same cost element are assigned a unified cost element code through a synonym list and a verified mapping table. For example, if different sources use different names for road petroleum asphalt, the records are mapped to the same cost element code as long as their specifications, uses, and pricing units correspond to the same material specification. Records whose specification level cannot be determined are first marked as awaiting confirmation and are added to a price conflict group only after the specification information has been supplemented or verified.
[0033] Each mapped data point forms a standard observation record. The standard observation record includes the cost element code, price value, data source, collection time, applicable region, applicable project type, and project attribute identifier. The price value is the price under a unified pricing unit. The data source identifies whether the price comes from a reference price from the competent authority, a market-collected price, a contract list price, a settlement price, a supplier's quotation, or other legitimate sources. The collection time determines the index period to which it belongs. The applicable region determines its spatial relationship with the target region unit. The applicable project type determines its matching relationship with the target index object in terms of route level, construction type, and pavement structure type. The project attribute identifier is used to associate the project location, route level, topographic zoning, bridge-to-tunnel ratio, pavement structure type, and main material transportation radius.
[0034] Through the above processing, highway cost data from different sources, with different naming methods, and different pricing units are transformed into standard observation records that can participate in price conflict fusion and index generation. This processing is not simply storing cost documents, but rather providing a unified data foundation for subsequent source credibility calculations, sample compensation, and engineering structure weight generation.
[0035] After generating standard observation records, these records are grouped according to cost element codes, regional units, and index periods, forming multiple price conflict groups. Each price conflict group corresponds to multiple price observation records under the same cost element code, the same regional unit, and the same index period.
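The grouping step can be illustrated with a short sketch. The dictionary keys `element_code`, `region_unit`, and `index_period` are hypothetical names for the three grouping fields of a standard observation record.

```python
from collections import defaultdict


def build_conflict_groups(records):
    """Group standard observation records by (cost element code, regional unit, index period)."""
    groups = defaultdict(list)
    for record in records:
        key = (record["element_code"], record["region_unit"], record["index_period"])
        groups[key].append(record)
    return dict(groups)
```

Each key of the returned dictionary identifies one price conflict group, and the list under it holds every price observed for that element, region, and period regardless of source.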
[0036] A single price conflict group can simultaneously include reference prices issued by the competent authority, market-collected prices, contract list prices, settlement prices, and supplier quotations. These price sources have different formation mechanisms. Reference prices issued by the competent authority have management reference attributes but may be subject to publication lag; market-collected prices can reflect real-time market fluctuations but may be affected by the collection channels and supply scope; contract list prices are influenced by bidding strategies, contract terms, and project scale; settlement prices are close to the actual outcome of the project but are formed later than the transaction date; supplier quotations can reflect supply-side pricing but may include commercial price fluctuations.
[0037] Therefore, when multiple price values for the same cost element appear in the same region and the same period, no single price is directly taken as the only valid price, and the prices are not simply averaged. Instead, a price conflict group is formed first and is then processed using source confidence values, price dispersion relationships, sample borrowing reduction factors, and anomaly reduction factors. Restricting the processing scope to a price conflict group avoids the index distortion that would result from mixing data from different regions, periods, or specifications.
[0038] For each price conflict group, a source confidence value is generated for each price observation record based on its source stability, temporal fit, spatial fit, engineering fit, historical deviation, and cross-consistency.
[0039] Source stability is determined by the number of periods, within three consecutive index periods, in which the same data source produced valid observation records under the target cost element code. Temporal fit is determined by the time interval between the collection time of the price observation record and the target index period. Spatial fit is determined by the consistency of administrative level and the road transport distance between the applicable region of the price observation record and the target regional unit. Engineering fit is determined by the consistency between the project corresponding to the price observation record and the target index object in terms of route grade, topographic zoning, bridge-to-tunnel ratio, and pavement structure type. Historical deviation is determined by the deviation between the price values of the same data source in historical index periods and the actual settlement prices of the corresponding completed highway projects. Cross-consistency is determined by the degree of deviation between the price value of the price observation record and the median price within the same price conflict group.
[0040] This embodiment normalizes the source stability, temporal fit, spatial fit, engineering fit, historical deviation, and cross-consistency. For indicators where larger values indicate greater reliability or better matching, forward normalization is used; for indicators where larger values indicate greater deviation or higher risk, reverse normalization is used. The upper and lower limits used for normalization are derived from the parameter configuration table and determined based on the distribution range of historical observation records, settlement data of historically completed highway projects, and backtesting results. When an indicator exceeds the upper limit in the parameter configuration table, it is processed according to the upper limit; when an indicator is below the lower limit in the parameter configuration table, it is processed according to the lower limit.
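A minimal sketch of the forward and reverse normalization with clamping to the configured limits is given below. Linear min-max scaling is an assumption; the text only states that upper and lower limits are used and that out-of-range values are processed at the limit. The function name `normalize` is hypothetical.

```python
def normalize(value, lower, upper, reverse=False):
    """Clamp `value` to [lower, upper] and scale it to [0, 1].

    reverse=False: larger raw values give larger normalized values (forward).
    reverse=True: larger raw values give smaller normalized values (reverse).
    """
    if upper <= lower:
        raise ValueError("upper limit must exceed lower limit")
    clipped = min(max(value, lower), upper)  # values outside the limits use the limit
    scaled = (clipped - lower) / (upper - lower)
    return 1.0 - scaled if reverse else scaled
```

For example, a historical deviation would be passed with `reverse=True`, so that a smaller deviation yields a larger normalized value.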
[0041] For categorized data in engineering attributes, a difference identifier is used for normalization. When route levels are consistent, the normalized value for route level difference is zero; when route levels are inconsistent, the normalized value for route level difference is determined according to the preset level difference table. When pavement structure types are consistent, the normalized value for pavement structure type difference is zero; when pavement structure types are inconsistent, the normalized value for pavement structure type difference is determined according to the preset structure type difference table. The preset level difference table and the preset structure type difference table are jointly determined by the highway engineering cost management rules and the backtesting results of historical projects, and are stored in the parameter configuration table.
[0042] In one specific implementation, the source confidence value is generated according to the following formula:

$$C_i = \frac{\sum_{k=1}^{6} \alpha_k \, x_{i,k}}{\sum_{k=1}^{6} \alpha_k}$$

where $C_i$ denotes the source confidence value of the $i$-th price observation record; $x_{i,1}$ denotes the normalized source stability value of the $i$-th price observation record; $x_{i,2}$ denotes its normalized temporal fit value; $x_{i,3}$ denotes its normalized spatial fit value; $x_{i,4}$ denotes its normalized engineering fit value; $x_{i,5}$ denotes its normalized historical deviation value; $x_{i,6}$ denotes its normalized cross-consistency value; and $\alpha_k$ denotes the preset weight coefficient of the $k$-th normalized value. Each $\alpha_k$ is a non-negative number, and $\sum_{k=1}^{6} \alpha_k$ is greater than zero.
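The calculation can be sketched as below, assuming the source confidence value is the weighted sum of the six normalized indicators divided by the sum of the weight coefficients. The normalizing denominator is an assumption; it is inferred from the condition that the weights are non-negative with a positive sum. The function name is hypothetical.

```python
def source_confidence(normalized_values, weights):
    """Weighted combination of normalized indicators, scaled by the total weight."""
    if len(normalized_values) != len(weights):
        raise ValueError("one weight is required per normalized indicator")
    if any(w < 0 for w in weights):
        raise ValueError("weight coefficients must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("the sum of the weight coefficients must be greater than zero")
    return sum(w * x for w, x in zip(weights, normalized_values)) / total
```

Because every indicator lies in the range from zero to one, the resulting value also lies in that range, which keeps it comparable with the preset confidence threshold.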
[0043] When normalizing historical deviation, the smaller the historical deviation, the larger the corresponding normalized value. When normalizing cross-consistency, the smaller the deviation from the median price within the same price conflict group, the larger the corresponding normalized value. After completing the historical backtesting deviation attribution, if the attribution result points to price source deviation, the corresponding historical deviation normalized value or source credibility correction factor of the data source will be adjusted in the next index cycle.
[0044] After a price conflict group is formed, it is first determined whether the number of price observation records within the group has reached the preset minimum sample size. If the number of price observation records reaches the preset minimum sample size, anomaly observation identification is performed. If the number of price observation records is lower than the preset minimum sample size, the price conflict group is marked as an insufficient sample price conflict group, and compensation observation record selection is performed.
[0045] After the compensation observation records are included, it is then determined whether the number of price observation records participating in the processing within the price conflict group reaches the preset minimum sample size. If the preset minimum sample size is reached, anomaly detection is performed; if the preset minimum sample size is still not reached, anomaly detection based on interquartile range is not performed, and instead, the fusion weight is determined based on the source confidence value, the sample borrowing reduction factor, and the degree of deviation between the price observation record and the median price within the group. The participation ratio of the compensation observation records corresponding to the price conflict group is recorded and included in the calculation of the index confidence level.
[0046] When performing anomaly detection, the first quartile, third quartile, and interquartile range of price values within the price conflict group are first calculated. Then, the first and second anomaly boundaries are determined based on the interquartile range and the anomaly boundary coefficients in the parameter configuration table. Price observation records with price values lower than the first anomaly boundary or higher than the second anomaly boundary are marked as candidate anomaly observation records. The anomaly boundary coefficients are determined based on the historical price fluctuation distribution of the corresponding cost element, price review rules, and historical backtesting results.
[0047] When the source confidence value of a candidate anomaly observation record is lower than the preset confidence threshold, the candidate anomaly observation record is identified as an anomaly observation record and removed from the price conflict group. When the source confidence value of a candidate anomaly observation record is not lower than the preset confidence threshold, the candidate anomaly observation record is retained, and its fusion weight is multiplied by the preset anomaly reduction coefficient. The preset anomaly reduction coefficient is stored in the parameter configuration table and is determined from the review results and backtesting results of historical anomaly observation records.
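The interquartile screening and the two outcomes for candidate records can be sketched as follows. The boundary form (first quartile minus coefficient times interquartile range, third quartile plus coefficient times interquartile range) and the `"inclusive"` quartile method are assumptions; the text states only that the boundaries are derived from the interquartile range and the anomaly boundary coefficient. Record keys and the function name are hypothetical.

```python
import statistics


def screen_anomalies(records, boundary_coeff, conf_threshold, anomaly_reduction):
    """Remove low-confidence outliers; keep high-confidence outliers with a reduced factor."""
    prices = [r["price"] for r in records]
    q1, _, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    iqr = q3 - q1
    low = q1 - boundary_coeff * iqr    # first anomaly boundary (assumed form)
    high = q3 + boundary_coeff * iqr   # second anomaly boundary (assumed form)
    kept = []
    for r in records:
        is_candidate = r["price"] < low or r["price"] > high
        if is_candidate and r["confidence"] < conf_threshold:
            continue  # identified as an anomaly observation record and removed
        factor = anomaly_reduction if is_candidate else 1.0
        kept.append({**r, "anomaly_factor": factor})
    return kept
```

The returned records carry an `anomaly_factor` of one for ordinary records and the preset anomaly reduction coefficient for retained candidates, ready to be used in the fusion weight.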
[0048] After anomaly processing, the price values of the remaining price observation records are fused. The fusion weight of each remaining price observation record is determined by the source confidence value, price deviation reduction factor, anomaly reduction coefficient, and sample borrowing reduction coefficient. The price deviation reduction factor is determined based on the standardized deviation between the price observation record and the median price of the price conflict group; the greater the standardized deviation, the smaller the price deviation reduction factor. When the interquartile range is zero, the minimum non-zero price difference within the price conflict group or a preset small positive number from the parameter configuration table is used as the standardization benchmark.
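One possible form of the price deviation reduction factor is sketched below. The text fixes only its behavior (it decreases as the standardized deviation grows, with a fallback benchmark when the interquartile range is zero); the specific mapping `1 / (1 + deviation)` is an assumption, as are the function name and the default small positive number. The sketch needs at least two prices, since `statistics.quantiles` cannot compute quartiles from fewer on older Python versions.

```python
import statistics


def deviation_factors(prices, small_positive=1e-6):
    """Return one reduction factor per price; larger standardized deviation gives a smaller factor."""
    median = statistics.median(prices)
    q1, _, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    scale = q3 - q1  # interquartile range as the standardization benchmark
    if scale == 0:
        # Fallback: minimum non-zero price difference in the group, else a small positive number.
        diffs = {abs(a - b) for a in prices for b in prices} - {0.0}
        scale = min(diffs) if diffs else small_positive
    return [1.0 / (1.0 + abs(p - median) / scale) for p in prices]
```

A price equal to the group median receives a factor of one, and the factor shrinks toward zero as the price moves away from the median.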
[0049] The fusion weight of the $i$-th remaining price observation record is determined according to the following formula:

$$w_i = \frac{C_i \, D_i \, A_i \, B_i}{\sum_{j \in \Omega} C_j \, D_j \, A_j \, B_j}$$

where $w_i$ denotes the fusion weight of the $i$-th remaining price observation record; $C_i$ denotes the source confidence value of the $i$-th remaining price observation record; $D_i$ denotes its price deviation reduction factor; $A_i$ denotes its anomaly reduction factor, which takes the value one for price observation records not marked as candidate anomaly observation records and takes the preset anomaly reduction coefficient for retained candidate anomaly observation records; $B_i$ denotes its sample borrowing reduction factor, which takes the value one for non-compensation observation records and is determined according to the sample borrowing reduction rules for compensation observation records; $\Omega$ denotes the set of price observation records that remain after the anomaly observation records have been removed and that participate in the fusion; and $j$ denotes any price observation record in $\Omega$.
[0050] The fused price corresponding to the price conflict group is determined according to the following formula:

$$P_g = \sum_{i \in \Omega} w_i \, p_i$$

where $P_g$ denotes the fused price of the cost element corresponding to price conflict group $g$ within the corresponding regional unit and index period; $\Omega$ denotes the set of remaining price observation records participating in the fusion; $w_i$ denotes the fusion weight of the $i$-th remaining price observation record; and $p_i$ denotes the price value of the $i$-th remaining price observation record.
[0051] In this way, the fused price is neither a simple average nor a fixed price taken from a single source. Instead, it combines source confidence, price dispersion, anomaly reduction, and sample borrowing reduction within the same price conflict group.
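The fusion step can be sketched as follows, assuming each fusion weight is the product of the four factors named in the text, normalized over the remaining records so that the weights sum to one. The record keys and the function name are hypothetical.

```python
def fuse_prices(records):
    """Return (fused_price, weights) for the remaining records of one price conflict group."""
    raw = [
        r["confidence"] * r["deviation_factor"] * r["anomaly_factor"] * r["borrow_factor"]
        for r in records
    ]
    total = sum(raw)
    if total <= 0:
        raise ValueError("at least one record must have a positive raw weight")
    weights = [x / total for x in raw]  # normalized so the weights sum to one
    fused = sum(w * r["price"] for w, r in zip(weights, records))
    return fused, weights
```

With two records priced 100 and 200 whose raw weights are 0.75 and 0.25, the fused price is 125, which sits closer to the more trusted record than the arithmetic mean of 150 would.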
[0052] When the number of remaining price observation records within any price conflict group is lower than the preset minimum sample size, compensation observation records are selected from the candidate compensation sample set. The preset minimum sample size can be determined based on the importance of the cost element, price volatility, and historical backtesting error requirements, and is stored in the parameter configuration table. For important materials and cost elements with large price fluctuations, the preset minimum sample size can be higher than that for general cost elements.
[0053] The formation of the candidate compensation sample set includes the following steps. First, historical observation records with the same cost element codes as the target price conflict group are selected from historical observation records to form preliminary compensation samples. Second, for each preliminary compensation sample, the corresponding sample borrowing reduction coefficient is determined based on the spatial distance, route grade difference, bridge-to-tunnel ratio difference, and pavement structure type difference between the preliminary compensation sample and the target index object. Finally, preliminary compensation samples with sample borrowing reduction coefficients lower than the preset reduction coefficient threshold are excluded, and the remaining preliminary compensation samples are used as the candidate compensation sample set.
[0054] Spatial distance differences are determined based on the road transport distance between the applicable area of the initial compensation sample and the target area unit. Route grade differences are determined based on a preset grade difference table. Bridge-tunnel ratio differences are determined based on the difference in bridge-tunnel ratio between the corresponding project of the initial compensation sample and the target index object. Pavement structure type differences are determined based on a preset structure type difference table. The greater the above differences, the smaller the sample borrowing reduction factor; the smaller the sample borrowing reduction factor, the lower the impact of the compensation observation records on the generation of the fusion price.
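A minimal sketch of the candidate compensation sample screening follows. The text requires only that the sample borrowing reduction factor decrease as each difference grows; the multiplicative form used here, the scaling limits `max_distance_km` and `max_bt_ratio_diff`, and all key and function names are assumptions made for illustration.

```python
def borrow_factor(distance_km, grade_diff, bt_ratio_diff, pavement_diff,
                  max_distance_km=300.0, max_bt_ratio_diff=0.5):
    """Sample borrowing reduction factor in [0, 1].

    grade_diff and pavement_diff are difference-table values in [0, 1], where 0 means identical.
    """
    d = min(distance_km / max_distance_km, 1.0)
    b = min(abs(bt_ratio_diff) / max_bt_ratio_diff, 1.0)
    return (1.0 - d) * (1.0 - grade_diff) * (1.0 - b) * (1.0 - pavement_diff)


def candidate_compensation_set(history, target_code, threshold):
    """Keep historical records with the target code whose factor is not below the threshold."""
    selected = []
    for sample in history:
        if sample["element_code"] != target_code:
            continue
        factor = borrow_factor(sample["distance_km"], sample["grade_diff"],
                               sample["bt_ratio_diff"], sample["pavement_diff"])
        if factor >= threshold:
            selected.append({**sample, "borrow_factor": factor})
    return selected
```

The stored `borrow_factor` is the value that later multiplies the fusion weight of a compensation observation record.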
[0055] When compensation observation records participate in generating the fused price, their fusion weight is further multiplied by the sample borrowing reduction factor, in addition to reflecting the source confidence value and the degree of price deviation. This prevents gaps in the index caused by insufficient samples within the target regional unit, while also preventing data from adjacent regions or from projects with similar engineering structures from being treated as equivalent to data from the target region.
[0056] After generating the fused price, based on the bill of quantities, settlement data and engineering attribute data of the completed highway projects, a set of structurally similar projects matching the target index object is selected, and an engineering structure weight vector is generated according to the cost proportion of each cost element in the set of structurally similar projects.
[0057] When selecting a set of structurally similar projects, the following criteria are used: route grade, topographic zoning, construction type, bridge-to-tunnel ratio, pavement structure type, and main material transportation radius of the target index object. Completed highway projects whose route grade, topographic zoning, construction type, and pavement structure type are consistent with the target index object, whose bridge-to-tunnel ratio differs from that of the target index object by no more than a preset threshold, and whose main material transportation radius differs from that of the target index object by no more than a preset threshold, are included in the set of structurally similar projects.
[0058] When the number of completed highway projects in the set of structurally similar projects is lower than the preset weight sample size, the system expands the screening scope according to a preset relaxation order. The preset relaxation order is as follows: maintaining consistency in route level and construction type, first relaxing the threshold for the difference in the main material transportation radius, then relaxing the threshold for the difference in the bridge-to-tunnel ratio, and finally relaxing the topographic zoning conditions. After each relaxation of the screening conditions, the set of structurally similar projects is re-screened. If the preset weight sample size is still not reached after relaxation, the engineering structure weight vector is generated using the already screened set of structurally similar projects, and the insufficient number of structurally similar projects and the number of relaxations are included in the index confidence level. The route level and construction type remain unchanged during the relaxation process to ensure that the engineering structure weight vector remains basically consistent with the target index object.
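The screening with the preset relaxation order can be sketched as follows. Relaxing a threshold by multiplying it with a fixed `relax` factor is an assumption, as are the key names and the function name; the order of relaxation (transportation radius, then bridge-to-tunnel ratio, then topographic zoning) follows the text, and route grade, construction type, and pavement structure type are never relaxed.

```python
def select_similar(projects, target, min_count, bt_threshold, radius_threshold, relax=2.0):
    """Return (selected_projects, number_of_relaxations_applied)."""

    def matches(p, bt_limit, radius_limit, check_terrain):
        return (
            p["grade"] == target["grade"]
            and p["build_type"] == target["build_type"]
            and p["pavement"] == target["pavement"]
            and (not check_terrain or p["terrain"] == target["terrain"])
            and abs(p["bt_ratio"] - target["bt_ratio"]) <= bt_limit
            and abs(p["radius_km"] - target["radius_km"]) <= radius_limit
        )

    stages = [
        (bt_threshold, radius_threshold, True),                   # original conditions
        (bt_threshold, radius_threshold * relax, True),           # 1: relax transport radius
        (bt_threshold * relax, radius_threshold * relax, True),   # 2: relax bridge-to-tunnel ratio
        (bt_threshold * relax, radius_threshold * relax, False),  # 3: relax topographic zoning
    ]
    for n_relax, (bt_limit, radius_limit, check_terrain) in enumerate(stages):
        selected = [p for p in projects if matches(p, bt_limit, radius_limit, check_terrain)]
        if len(selected) >= min_count:
            break
    # If no stage reaches min_count, the last (most relaxed) selection is returned.
    return selected, n_relax
```

The returned relaxation count, together with a shortfall in the number of selected projects, is the kind of information that would feed into the index confidence level.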
[0059] The transportation radius of major materials is determined based on the road transportation distance from the major material supply location to the center point of the construction area corresponding to the target index object. When there are multiple supply locations for the same major material, the weighted transportation radius is determined based on the supply volume ratio of each supply location and the road transportation distance; when the supply volume ratio cannot be obtained, the ratio of each supply location is determined based on the contract supply volume, settlement purchase volume, or material arrival ledger.
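The weighted transportation radius for a material with several supply locations can be sketched as a supply-volume-weighted mean of road distances. The function name and the tuple layout are hypothetical.

```python
def weighted_radius(supplies):
    """supplies: list of (supply_volume, road_distance_km) pairs for one major material."""
    total_volume = sum(volume for volume, _ in supplies)
    if total_volume <= 0:
        raise ValueError("total supply volume must be positive")
    return sum(volume * distance for volume, distance in supplies) / total_volume
```

For instance, 300 units hauled 40 km and 100 units hauled 80 km give a weighted radius of 50 km.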
[0060] The engineering structure weight vector includes an engineering component weight sub-vector and a cost category weight sub-vector. The engineering component weight sub-vector includes the weights for subgrade engineering, pavement engineering, bridge engineering, tunnel engineering, and traffic safety facilities engineering. The cost category weight sub-vector includes the weights for labor, materials, and machinery.
[0061] When generating the engineering structure weight vector, the cost percentages for each completed highway project in the set of structurally similar projects are calculated separately for both the engineering component cost percentage and the cost percentage for each cost category. The engineering component cost percentage includes the cost percentages for subgrade engineering, pavement engineering, bridge engineering, tunnel engineering, and traffic safety facility engineering. The cost category cost percentages include the cost percentages for labor, materials, and machinery.
[0062] The median of the cost proportions of each engineering component across the set of structurally similar projects is calculated, and the medians are normalized to obtain the engineering component weight sub-vector. Similarly, the median of the cost proportions of each cost category is calculated, and the medians are normalized to obtain the cost category weight sub-vector. Using medians reduces the influence of abnormal cost structures in individual projects on weight generation.
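The median-then-normalize step can be sketched as follows. The input layout (one dictionary of cost proportions per completed project, all sharing the same keys) and the function name are hypothetical.

```python
import statistics


def median_weights(shares_by_project):
    """Take the per-key median across projects, then normalize the medians to sum to one."""
    keys = list(shares_by_project[0].keys())
    medians = {k: statistics.median(p[k] for p in shares_by_project) for k in keys}
    total = sum(medians.values())
    if total <= 0:
        raise ValueError("median proportions must have a positive sum")
    return {k: v / total for k, v in medians.items()}
```

Normalization is needed because per-key medians generally do not sum to one even when each project's own proportions do.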
[0063] The engineering component weight sub-vector is used to generate the cost index for subgrade engineering, pavement engineering, bridge engineering, tunnel engineering, traffic safety facilities engineering, and the comprehensive highway engineering cost index under the engineering component category. The cost category weight sub-vector is used to generate the labor price index, material price index, and machinery shift price index, and participates in the index confidence level and backtesting bias attribution analysis. The engineering component weight sub-vector and the cost category weight sub-vector are not superimposed at the same level, thereby avoiding double-weighting between the weights of labor, materials, and machinery and the weights of subgrade, pavement, bridge, and tunnel.
[0064] For each cost element, an element price index is generated from the fused price of that cost element in the current index period and the fused price of the same cost element in the base period:

$$I_e = \frac{P_e^{(t)}}{P_e^{(0)}}$$

where $I_e$ denotes the element price index of the $e$-th cost element; $P_e^{(t)}$ denotes the fused price of the $e$-th cost element in the current index period; and $P_e^{(0)}$ denotes the base period fused price of the $e$-th cost element. The base period fused price is generated using the same data mapping, price conflict group division, source confidence calculation, and fusion rules as the current index period, so that the calculation method is consistent between the current index period and the base period.
[0065] For a specific part of a project, a project cost index is generated based on the price index of the cost elements belonging to that part and the cost percentage of the corresponding cost elements within that part. The cost percentage of each cost element within a part is determined based on the cost percentage of each cost element within the corresponding part of the set of structurally similar projects, and is then normalized within the same part.
[0066] The comprehensive cost index for highway engineering is generated according to the engineering component weight sub-vector:

$$I^{\mathrm{comp}} = \sum_{j=1}^{J} W_j \, I^{\mathrm{part}}_j$$

where $I^{\mathrm{comp}}$ denotes the comprehensive cost index for highway engineering; $W_j$ denotes the weight of the $j$-th engineering component; $I^{\mathrm{part}}_j$ denotes the project cost index of the $j$-th engineering component; and $J$ denotes the number of engineering components.
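The chain from element price indices to a component index and then to the comprehensive index can be sketched as below. The sketch assumes each index is expressed as a ratio to the base period (1.0 means no change); a scaling by 100 would work the same way. Function names and dictionary keys are hypothetical.

```python
def element_index(current_fused_price, base_fused_price):
    """Element price index as the ratio of current to base period fused price."""
    if base_fused_price <= 0:
        raise ValueError("base period fused price must be positive")
    return current_fused_price / base_fused_price


def weighted_index(indices, weights):
    """Weighted mean of indices; weights are re-normalized over the keys provided."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(indices[k] * weights[k] for k in weights) / total


# Usage: element indices -> pavement component index -> comprehensive index.
pavement_elements = {"asphalt": element_index(4950.0, 4500.0), "gravel": element_index(80.0, 80.0)}
pavement_index = weighted_index(pavement_elements, {"asphalt": 0.6, "gravel": 0.4})
comprehensive = weighted_index({"pavement": pavement_index, "subgrade": 1.0},
                               {"pavement": 0.5, "subgrade": 0.5})
```

The same `weighted_index` helper serves both levels because, in each case, the text describes a weighted combination with weights normalized within the level.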
[0067] The material price index, labor price index, and machinery shift price index are generated according to the cost category weight sub-vector or the proportion of cost elements within the cost category. Specifically, within the same cost category, the material price index, labor price index, or machinery shift price index is generated based on the cost element price index belonging to that cost category and the corresponding cost element's cost proportion.
[0068] This embodiment can output the comprehensive cost index for highway engineering, the cost index for subgrade engineering, the cost index for pavement engineering, the cost index for bridge engineering, the cost index for tunnel engineering, the cost index for traffic safety facilities engineering, the material price index, the labor price index, and the machinery shift price index. For traceability, each of these indices retains its corresponding target index object, index period, number of data sources, number of valid samples, proportion of anomaly observation records removed, and proportion of compensation observation records involved.
[0069] After generating the comprehensive cost index for highway engineering, the confidence level of the comprehensive cost index for highway engineering is generated based on the number of valid samples, the number of data sources, the proportion of abnormal observation records removed, the proportion of compensation observation records involved, and the historical backtesting bias.
[0070] The number of valid samples represents the scale of the valid observation records that participate in generating fused prices under the target index object. The number of data sources represents the diversity of the data sources participating in the index calculation. The proportion of anomaly observation records removed represents the share of observation records judged to be anomalous and removed. The proportion of compensation observation records involved represents the degree to which samples from adjacent regions or from projects with similar engineering structures are borrowed when samples are insufficient. The historical backtesting deviation represents how far the indices generated in historical index periods deviate when used to explain the settlement prices of completed projects. An insufficient number of structurally similar projects, the number of times the screening conditions for structurally similar projects are relaxed, and the use of samples from the next higher-level regional unit when the index is generated for the first time are also included in the index confidence level.
[0071] The index confidence level is determined based on the normalized results of the above indicators. A higher number of valid samples and data sources results in a higher index confidence level; conversely, a higher proportion of abnormal observation records removed, a higher proportion of compensating observation records included, and a higher historical backtesting bias results in a lower index confidence level. The confidence score weighting coefficients and confidence level thresholds are derived from the parameter configuration table and determined based on historical index release results, backtesting error distribution, and cost management requirements.
[0072] Based on the relationship between the confidence score and the preset confidence level threshold, the index confidence level is divided into high confidence level, medium confidence level, and low confidence level. For index results with low confidence level, the system retains the index calculation result and marks the low confidence level in the output result, while triggering the data supplementation, sample compensation rule adjustment, or engineering structure weight reconstruction process for the next index cycle.
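The confidence grading of paragraphs [0069] to [0072] can be sketched as follows. This is a minimal illustration, assuming every indicator has already been normalized to the range [0, 1]. The weighting coefficients, the two thresholds, and all names are hypothetical placeholders for the values held in the parameter configuration table.

```python
from dataclasses import dataclass


@dataclass
class ConfidenceInputs:
    """All fields are assumed to be pre-normalized to the range [0, 1]."""
    valid_samples: float       # higher is better
    data_sources: float        # higher is better
    removed_ratio: float       # higher is worse
    compensation_ratio: float  # higher is worse
    backtest_bias: float       # higher is worse


# Hypothetical weighting coefficients and thresholds (placeholders for the
# values that would come from the parameter configuration table).
WEIGHTS = {
    "valid_samples": 0.25,
    "data_sources": 0.15,
    "removed_ratio": 0.20,
    "compensation_ratio": 0.15,
    "backtest_bias": 0.25,
}
HIGH_THRESHOLD = 0.75
MEDIUM_THRESHOLD = 0.50


def confidence_score(x: ConfidenceInputs) -> float:
    """Weighted score: 'more is better' indicators enter directly,
    'more is worse' indicators enter as (1 - value)."""
    return (
        WEIGHTS["valid_samples"] * x.valid_samples
        + WEIGHTS["data_sources"] * x.data_sources
        + WEIGHTS["removed_ratio"] * (1.0 - x.removed_ratio)
        + WEIGHTS["compensation_ratio"] * (1.0 - x.compensation_ratio)
        + WEIGHTS["backtest_bias"] * (1.0 - x.backtest_bias)
    )


def confidence_level(score: float) -> str:
    """Map a score to the three confidence levels."""
    if score >= HIGH_THRESHOLD:
        return "high"
    if score >= MEDIUM_THRESHOLD:
        return "medium"
    return "low"
```

An index graded "low" would still be published with its marker, and would trigger data supplementation or parameter adjustment for the next index cycle.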
[0073] To ensure the correctability of the highway engineering cost index generation process, this embodiment applies the comprehensive highway engineering cost index generated in the historical index cycle to the base period cost of completed highway projects to obtain the index back-calculated construction value. The index back-calculated construction value is then compared with the actual settlement price of the completed highway project to obtain the backtesting deviation.
[0074] For the $i$-th completed highway project, the index back-calculated construction value is determined using the following formula: $\hat{V}_i = C_i^{0} \cdot I_t$; where $\hat{V}_i$ indicates the index back-calculated construction value of the $i$-th completed highway project; $C_i^{0}$ indicates the base period construction cost of the $i$-th completed highway project; and $I_t$ indicates the comprehensive cost index for highway engineering generated in historical index cycle $t$. The backtesting deviation is determined according to the following formula: $D_i = \frac{|\hat{V}_i - S_i|}{S_i}$; where $D_i$ indicates the backtesting deviation of the $i$-th completed highway project; $\hat{V}_i$ indicates the index back-calculated construction value; and $S_i$ indicates the actual settlement price of the $i$-th completed highway project.
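The backtesting step of paragraphs [0073] and [0074] can be illustrated with a short sketch. It assumes the index is expressed as a ratio to the base period (1.0 means no change) and that the deviation is measured relative to the settlement price. Both are assumptions of this example, as are the function names and the 5% threshold.

```python
def back_calculated_value(base_cost: float, index_ratio: float) -> float:
    """Apply the historical comprehensive index to the base period cost.

    index_ratio is assumed to be relative to the base period (1.0 = base).
    """
    return base_cost * index_ratio


def backtest_deviation(back_value: float, settlement_price: float) -> float:
    """Relative deviation between the back-calculated value and settlement."""
    if settlement_price <= 0:
        raise ValueError("settlement price must be positive")
    return abs(back_value - settlement_price) / settlement_price


def needs_attribution(deviation: float, threshold: float = 0.05) -> bool:
    """True when the deviation exceeds the (hypothetical) preset threshold."""
    return deviation > threshold


# Example: base period cost 1000, historical index 1.08, settlement 1200.
value = back_calculated_value(1000.0, 1.08)
deviation = backtest_deviation(value, 1200.0)
```

In this example the deviation is about 10%, so deviation attribution would be triggered under the assumed threshold.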
[0075] When the backtesting deviation exceeds the preset deviation threshold, the contribution values of price source deviation, engineering structure weight deviation, and sample compensation deviation are further determined. The contribution values of the three types of deviations are determined using a controlled variable approach.
[0076] The price source deviation contribution value reflects the impact of each data source on the deviation of the fused price. While keeping the engineering structure weight vector and compensation observation records unchanged, the price observation records corresponding to one data source are removed, and the fused price and index back-calculated construction value are regenerated. The difference between the index back-calculated construction value before and after the removal characterizes the price source deviation contribution of that data source.
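The leave-one-source-out comparison in paragraph [0076] might look like the sketch below. It is deliberately simplified to a single cost element and uses a plain credibility-weighted mean in place of the full fusion rule; the record layout and function names are hypothetical.

```python
def fuse(records):
    """Credibility-weighted mean of (source, price, credibility) records."""
    total = sum(cred for _, _, cred in records)
    return sum(price * cred for _, price, cred in records) / total


def back_value(records, base_price, base_cost):
    """Back-calculated value for one cost element (a simplification):
    base cost scaled by the ratio of fused price to base period price."""
    return base_cost * fuse(records) / base_price


def price_source_contributions(records, base_price, base_cost):
    """Remove one data source at a time and measure the change in the
    back-calculated value; weights and compensation records stay fixed."""
    full = back_value(records, base_price, base_cost)
    contributions = {}
    for source in {s for s, _, _ in records}:
        kept = [r for r in records if r[0] != source]
        if kept:  # skip a source whose removal would empty the group
            reduced = back_value(kept, base_price, base_cost)
            contributions[source] = abs(full - reduced)
    return contributions


records = [("A", 100.0, 1.0), ("B", 100.0, 1.0), ("C", 130.0, 1.0)]
contrib = price_source_contributions(records, base_price=100.0, base_cost=1000.0)
deviation_source = max(contrib, key=contrib.get)
```

Here source C, which quotes a price well above the others, produces the largest difference and would be the candidate deviation data source.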
[0077] The engineering structure weight deviation contribution value reflects the impact of the engineering structure weight vector on the backtesting deviation. While keeping the fused price and compensation observation records of each cost element unchanged, the engineering structure weight vector used in the historical index cycle is replaced with an engineering structure weight vector newly generated from newly completed projects. The index back-calculated construction value is recalculated, and the difference between the index back-calculated construction value before and after the replacement characterizes the engineering structure weight deviation contribution.
[0078] The sample compensation deviation contribution value reflects the impact of compensation observation records on the backtesting deviation. While keeping the fused price generation rules and the engineering structure weight vector of each cost element unchanged, compensation observation records are removed, or their sample borrowing reduction coefficients are set so that they do not participate in fusion. The index back-calculated construction value is then recalculated, and the difference between the index back-calculated construction value before and after the adjustment characterizes the sample compensation deviation contribution.
[0079] The contribution values of price source deviation, engineering structure weight deviation, and sample compensation deviation are then compared. When the price source deviation contribution value is the largest, the backtesting deviation is attributed mainly to price source deviation, and the deviation data source is further identified. The deviation data source is the data source that produces the largest difference in index back-calculated construction value when the price source deviation contribution value is calculated. If two or more data sources produce the same largest difference, each tied data source's share of the abnormal observation records and its normalized value of historical deviation are calculated, and the data source with the higher share of abnormal observation records and the lower normalized value of historical deviation is identified as the deviation data source. After the deviation data source is identified, its source confidence correction factor for the next index period is reduced, or the normalized value of the historical deviation corresponding to it is reduced.
[0080] When the contribution value of the engineering structure weight deviation is the largest, it is determined that the backtesting deviation is mainly attributed to the engineering structure weight deviation, and the steps of screening the set of structurally similar projects and generating the engineering structure weight vector are repeated to update the engineering structure weight vector in the next index period. When the contribution value of the sample compensation deviation is the largest, it is determined that the backtesting deviation is mainly attributed to the sample compensation deviation, and the sample borrowing reduction factor in the corresponding candidate compensation sample set is reduced, while the preset minimum sample size of the price conflict group corresponding to this cost element in the next index period is increased.
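The attribution and correction rule of paragraphs [0079] and [0080] amounts to picking the largest of three contribution values and adjusting the matching parameter. A minimal sketch follows; the parameter names, the 10% reduction step, and the increment of one sample are hypothetical choices, since the source does not state the adjustment sizes.

```python
def attribute_and_correct(contributions, params, step=0.1):
    """Return the dominant deviation cause and the corrected parameters.

    contributions: dict with keys 'price_source', 'structure_weight',
                   'sample_compensation' (contribution values).
    params: dict of next-period parameters (hypothetical layout).
    step: hypothetical relative reduction applied to a coefficient.
    """
    cause = max(contributions, key=contributions.get)
    updated = dict(params)  # leave the caller's parameters untouched
    if cause == "price_source":
        # Lower the source confidence correction factor of the deviation source.
        updated["source_correction"] = params["source_correction"] * (1.0 - step)
    elif cause == "structure_weight":
        # Flag that similar-project screening and weight generation must rerun.
        updated["rebuild_weights"] = True
    elif cause == "sample_compensation":
        # Borrow less, and require more native samples next period.
        updated["borrow_reduction"] = params["borrow_reduction"] * (1.0 - step)
        updated["min_sample_size"] = params["min_sample_size"] + 1
    return cause, updated


params = {
    "source_correction": 1.0,
    "borrow_reduction": 0.8,
    "min_sample_size": 5,
    "rebuild_weights": False,
}
cause, updated = attribute_and_correct(
    {"price_source": 10.0, "structure_weight": 25.0, "sample_compensation": 40.0},
    params,
)
```

The tie-breaking between data sources described in paragraph [0079] is not covered by this sketch.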
[0081] Through the aforementioned backtesting bias attribution and parameter correction, the source confidence value, sample borrowing reduction factor, or engineering structure weight vector for the next index period are updated based on historical verification results. This approach creates a closed loop in the index generation method, distinguishing it from schemes that only match similar projects in the historical engineering database and output a single project cost prediction value, as well as schemes that calculate material price indices or engineering cost indices based on fixed weights.
Example 2
[0082] This embodiment uses the generation of a quarterly highway construction cost index for new expressway projects in a certain region as an example to illustrate the method of this application. First, the target index object is determined by the regional unit, the route grade (expressway), the construction type (new construction), and the index period (the current quarter). The data processing platform collects the competent authority's reference prices, market-collected prices, contract list prices, settlement prices, labor unit prices, machinery shift prices, and major material prices for the current quarter. It also obtains the bills of quantities, settlement data, bridge-to-tunnel ratios, pavement structure types, and main material transportation radii of completed new expressway projects.
[0083] The system performs field integrity checks, pricing unit checks, and time attribution checks on the collected data. Records lacking a collection time or applicable region are marked as incomplete and excluded from price generation. Material prices quoted in different pricing units are converted to a unified pricing unit. Subsequently, the system maps data from the various sources to standard observation records according to the highway cost element coding system. For example, asphalt material prices, steel prices, diesel prices, paver shift prices, and tunnel excavator shift prices are mapped to different cost element codes.
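The entry checks described above can be sketched as a single function. The field names and the conversion table are hypothetical, the time attribution check is omitted for brevity, and treating an unconvertible unit as incomplete is an assumption of this example rather than something the source states.

```python
REQUIRED_FIELDS = ("element", "price", "unit", "collected_at", "region", "source")

# Hypothetical conversion table: multiply a price quoted per `from_unit`
# by the factor to obtain the price per `to_unit`.
PRICE_FACTORS = {("kg", "t"): 1000.0, ("t", "t"): 1.0}


def entry_check(record, target_unit="t"):
    """Return a unit-normalized copy of the record, or None if incomplete."""
    # Field integrity check: every required field must be present and non-empty.
    if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return None
    # Pricing unit check and conversion to the unified pricing unit.
    factor = PRICE_FACTORS.get((record["unit"], target_unit))
    if factor is None:
        return None  # assumption: unconvertible units are treated as incomplete
    return {**record, "price": record["price"] * factor, "unit": target_unit}


steel = {
    "element": "steel", "price": 4.5, "unit": "kg",
    "collected_at": "Q2", "region": "R1", "source": "market",
}
normalized = entry_check(steel)
rejected = entry_check({**steel, "region": ""})
```

A price of 4.5 per kilogram becomes 4500 per tonne, while the record with an empty region is excluded from price generation.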
[0084] The system forms price conflict groups based on cost element codes, regional units, and the current quarter. For each price conflict group, the system calculates source stability, time fit, spatial fit, project fit, historical deviation, and cross-consistency, and generates a source credibility value. For records where the price is significantly higher than the second anomaly boundary within the group and the source credibility value is lower than a preset credibility threshold, the system identifies them as anomaly observations and removes them. For records where the price is higher than the second anomaly boundary but the source credibility value is not lower than the preset credibility threshold, the system retains the record and applies anomaly reduction coefficients to its fusion weight. Subsequently, the system determines the fusion weights based on the source credibility value, price deviation reduction factor, anomaly reduction coefficient, and sample borrowing reduction coefficient, and generates the fused price for each cost element in the current quarter.
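A sketch of anomaly handling and fusion within one price conflict group is given below. It assumes the common 1.5 × IQR rule for the two anomaly boundaries and a fusion weight equal to the credibility value divided by one plus the standardized deviation from the group median. The multiplier, the weight form, the threshold, and the reduction coefficient are all hypothetical; the source only states that the boundaries derive from the quartiles and that the weight falls as deviation grows.

```python
import statistics


def fuse_group(records, cred_threshold=0.6, anomaly_coef=0.5, k=1.5):
    """Fuse one price conflict group.

    records: list of (price, credibility) pairs; at least two are required.
    """
    prices = [price for price, _ in records]
    q1, _, q3 = statistics.quantiles(prices, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr   # first and second anomaly boundary
    median = statistics.median(prices)
    scale = iqr if iqr > 0 else 1.0          # avoid division by zero

    weighted_sum = 0.0
    weight_total = 0.0
    for price, cred in records:
        candidate = price < low or price > high
        if candidate and cred < cred_threshold:
            continue                          # abnormal record: removed
        # Weight falls as the standardized deviation from the median grows.
        weight = cred / (1.0 + abs(price - median) / scale)
        if candidate:
            weight *= anomaly_coef            # credible outlier: kept, down-weighted
        weighted_sum += weight * price
        weight_total += weight
    return weighted_sum / weight_total


normal = [(98.0, 0.9), (99.0, 0.8), (100.0, 0.9), (100.0, 0.85),
          (101.0, 0.8), (101.0, 0.9), (102.0, 0.85)]
fused_removed = fuse_group(normal + [(200.0, 0.3)])   # low credibility outlier
fused_retained = fuse_group(normal + [(200.0, 0.9)])  # credible outlier
```

The low-credibility price of 200 is dropped entirely, whereas the credible one nudges the fused price upward only slightly.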
[0085] For some tunnel engineering machinery shift prices, the valid observation records within the target regional unit are insufficient in the current quarter. The system therefore filters records with the same cost element codes from the historical observation records and determines the sample borrowing reduction coefficient based on spatial distance, route grade difference, bridge-to-tunnel ratio difference, and pavement structure type difference. Historical observation records whose sample borrowing reduction coefficient is lower than the preset reduction coefficient threshold are excluded, and the remaining historical observation records participate in fused price generation as compensation observation records. If the preset minimum sample size is still not reached after compensation, anomaly identification based on the interquartile range is not performed. Instead, the fusion weights are generated from the source credibility value, the sample borrowing reduction coefficient, and the degree of price deviation, and the price conflict group is recorded in the index confidence level calculation.
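The selection of compensation observation records can be illustrated as follows. The source names the four inputs of the sample borrowing reduction coefficient but not its functional form, so the multiplicative form, the 200 km distance scale, the 0.7 pavement penalty, and the 0.4 threshold below are all hypothetical.

```python
import math


def borrow_reduction(distance_km, grade_gap, bt_ratio_gap, same_pavement,
                     distance_scale_km=200.0):
    """Hypothetical reduction coefficient in (0, 1]; 1.0 means a perfect match."""
    spatial = math.exp(-distance_km / distance_scale_km)  # decays with distance
    grade = 1.0 / (1.0 + grade_gap)                       # route grade difference
    structure = max(0.0, 1.0 - bt_ratio_gap)              # bridge-to-tunnel ratio gap
    pavement = 1.0 if same_pavement else 0.7              # pavement type mismatch
    return spatial * grade * structure * pavement


def select_compensation(candidates, threshold=0.4):
    """Keep historical records whose coefficient reaches the threshold."""
    selected = []
    for c in candidates:
        coef = borrow_reduction(c["distance_km"], c["grade_gap"],
                                c["bt_ratio_gap"], c["same_pavement"])
        if coef >= threshold:
            selected.append((c["id"], coef))
    return selected


candidates = [
    {"id": "near", "distance_km": 50.0, "grade_gap": 0,
     "bt_ratio_gap": 0.1, "same_pavement": True},
    {"id": "far", "distance_km": 600.0, "grade_gap": 1,
     "bt_ratio_gap": 0.3, "same_pavement": False},
]
selected = select_compensation(candidates)
```

Each selected record carries its coefficient forward so that it can scale the record's fusion weight.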
[0086] During the weight generation phase, the system selects projects from completed new highway projects that share the same route grade, topographic zoning, construction type, and pavement structure type as the target index object, and whose differences in bridge-to-tunnel ratio and main material transportation radius meet the threshold requirements, forming a set of structurally similar projects. When the number of completed projects in the set of structurally similar projects is lower than the preset weight sample size, the system, adhering to the principle of maintaining consistency in route grade and construction type, sequentially relaxes the thresholds for main material transportation radius difference, bridge-to-tunnel ratio difference, and topographic zoning conditions, and includes the relaxation frequency in the index confidence level.
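The staged relaxation described above might be organized as in the sketch below. Route grade, construction type, and pavement structure type are always enforced; the tolerances, the doubling used to relax them, and the field names are hypothetical.

```python
def is_similar(project, target, radius_tol, ratio_tol, check_terrain):
    """Screening conditions for one completed project."""
    return (
        project["grade"] == target["grade"]
        and project["build_type"] == target["build_type"]
        and project["pavement"] == target["pavement"]
        and (not check_terrain or project["terrain"] == target["terrain"])
        and abs(project["bt_ratio"] - target["bt_ratio"]) <= ratio_tol
        and abs(project["radius_km"] - target["radius_km"]) <= radius_tol
    )


def screen_similar(projects, target, min_count, radius_tol=20.0, ratio_tol=0.1):
    """Return (similar projects, number of relaxations applied)."""
    stages = [
        (radius_tol, ratio_tol, True),           # strict conditions
        (radius_tol * 2, ratio_tol, True),       # relax transportation radius
        (radius_tol * 2, ratio_tol * 2, True),   # then bridge-to-tunnel ratio
        (radius_tol * 2, ratio_tol * 2, False),  # then topographic zoning
    ]
    for relax_count, (r_tol, b_tol, terrain) in enumerate(stages):
        found = [p for p in projects if is_similar(p, target, r_tol, b_tol, terrain)]
        if len(found) >= min_count:
            break
    return found, relax_count


target = {"grade": "expressway", "build_type": "new", "pavement": "asphalt",
          "terrain": "hilly", "bt_ratio": 0.30, "radius_km": 50.0}
projects = [
    {**target, "radius_km": 55.0},   # passes the strict conditions
    {**target, "radius_km": 80.0},   # passes once the radius is relaxed
    {**target, "grade": "class-1"},  # never passes: route grade differs
]
found, relax_count = screen_similar(projects, target, min_count=2)
```

The returned relaxation count is the quantity that would feed into the index confidence level.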
[0087] The system generates the engineering part weight sub-vector based on the proportions of roadbed engineering costs, pavement engineering costs, bridge engineering costs, tunnel engineering costs, and traffic safety facility engineering costs for each project in the set of structurally similar projects, and generates the cost category weight sub-vector based on the proportions of labor costs, material costs, and machinery costs.
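Following the median-then-normalize rule stated in claim 9, a weight sub-vector can be derived as sketched below. The function name and the sample cost shares are illustrative only.

```python
from statistics import median


def weight_subvector(shares_by_project):
    """Median of each cost share across similar projects, then normalized.

    shares_by_project: list of dicts mapping a part or category to its share.
    """
    keys = shares_by_project[0].keys()
    initial = {k: median(p[k] for p in shares_by_project) for k in keys}
    total = sum(initial.values())
    return {k: v / total for k, v in initial.items()}


# Illustrative cost category shares for three structurally similar projects.
category_shares = [
    {"labor": 0.20, "material": 0.50, "machinery": 0.30},
    {"labor": 0.25, "material": 0.50, "machinery": 0.25},
    {"labor": 0.30, "material": 0.45, "machinery": 0.25},
]
category_weights = weight_subvector(category_shares)
```

The same function applies unchanged to engineering part shares (roadbed, pavement, bridge, tunnel, traffic safety facilities). Using the median rather than the mean limits the influence of a single atypical project.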
[0088] Subsequently, the system generates the comprehensive cost index for highway engineering and the sub-item cost indices for roadbed, pavement, bridge, tunnel, and traffic safety facility engineering based on the current-quarter fused prices, the base period fused prices, and the engineering part weight sub-vector. It also generates the labor price index, material price index, and machinery shift price index based on the cost category weight sub-vector. The system further generates the index confidence level based on the number of valid samples, the number of data sources, the proportion of abnormal observation records removed, the proportion of compensation observation records involved, the historical backtesting bias, any shortfall in structurally similar projects, and the number of relaxations of the screening conditions.
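One common way to combine fused prices and weights into an index is a weighted average of price relatives. The sketch below uses that form as an assumption; the aggregation formula is not given in this passage, and the names and figures are illustrative.

```python
def cost_index(current, base, weights, scale=1.0):
    """Weighted average of price relatives (current / base).

    current, base: fused prices per cost element or category.
    weights: weight sub-vector over the same keys.
    scale: 1.0 yields a ratio to the base period; 100.0 yields a base-100 index.
    """
    total_weight = sum(weights.values())
    relatives = sum(w * current[k] / base[k] for k, w in weights.items())
    return scale * relatives / total_weight


base_prices = {"labor": 200.0, "material": 500.0, "machinery": 300.0}
current_prices = {"labor": 220.0, "material": 500.0, "machinery": 300.0}
weights = {"labor": 0.25, "material": 0.50, "machinery": 0.25}

ratio_index = cost_index(current_prices, base_prices, weights)
published_index = cost_index(current_prices, base_prices, weights, scale=100.0)
```

With labor prices up 10% and a labor weight of 0.25, the index rises by 2.5% relative to the base period.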
[0089] Before generating the index for the next quarter, the system backtests the historical quarterly indices using the settlement prices of completed projects. If the backtesting deviation between the index back-calculated construction value and the actual settlement price for a historical quarterly index exceeds the preset deviation threshold, the system calculates the contribution values of price source deviation, engineering structure weight deviation, and sample compensation deviation. If the price source deviation contribution value is the largest, the data source that produces the largest difference in index back-calculated construction value is identified as the deviation data source, and the source credibility value of this deviation data source in the next quarter is reduced. If the engineering structure weight deviation contribution value is the largest, the set of structurally similar projects is re-screened and the engineering structure weight vector is updated. If the sample compensation deviation contribution value is the largest, the sample borrowing reduction coefficient in the corresponding candidate compensation sample set is reduced, and the preset minimum sample size for the price conflict group corresponding to this cost element is increased.
[0090] In this embodiment, the multi-source highway cost data is not simply aggregated; price conflict groups are used to limit the data fusion scope under the same cost element, the same region, and the same period; source credibility values are used to measure the reliability of each price observation record participating in the fusion; sample borrowing reduction coefficients are used to control the participation weight of samples from adjacent regions or similar engineering structure projects; engineering structure weight vectors are used to reflect the highway engineering structure characteristics of the target index object; backtesting bias attribution is used to correct the parameters for the next index period based on historical verification results. Therefore, this application forms a dedicated data processing flow for the highway engineering cost index generation scenario, ensuring that all technical features in the claims can be explicitly implemented in this embodiment.
[0091] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims
1. A method for generating a highway engineering cost index based on multi-source data fusion, characterized in that it includes the following steps:
S1: Obtain multi-source highway cost data and engineering attribute data for the index period to be generated. The multi-source highway cost data includes material price data, labor price data, machinery shift price data, contract list price data, settlement price data, and reference price data issued by the transportation construction authority. The engineering attribute data includes project location, route grade, topographic zoning, construction type, bridge-to-tunnel ratio, pavement structure type, and main material transportation radius. The target index object is defined by the regional unit, route grade, construction type, and index period.
S2: Map the multi-source highway cost data to a preset highway cost element coding system to generate standard observation records. The standard observation records include cost element codes, price values, data sources, collection times, applicable regions, applicable project types, and project attribute identifiers.
S3: Group the standard observation records according to cost element codes, regional units, and index cycles to form multiple price conflict groups. Each price conflict group corresponds to multiple price observation records under the same cost element code, the same regional unit, and the same index cycle.
S4: For each price conflict group, generate a corresponding source credibility value based on the source stability, time fit, spatial fit, engineering fit, historical deviation, and cross-consistency of the price observation records.
S5: Within each price conflict group, identify abnormal observation records based on the price dispersion relationship of the price observation records within the group, remove the abnormal observation records from the price conflict group, and then fuse the price values of the remaining price observation records based on their source credibility values to obtain the fused price of the corresponding cost element in the corresponding regional unit and the corresponding index period.
S6: When the number of remaining price observation records in any price conflict group is lower than the preset minimum sample size, select compensation observation records from the candidate compensation sample set. The candidate compensation sample set is formed from historical observation records of adjacent areas or similar engineering structure projects. The sample borrowing reduction coefficient of the compensation observation records is determined according to spatial distance, route grade difference, bridge-to-tunnel ratio difference, and pavement structure type difference. The compensation observation records then participate in the generation of the fused price of the price conflict group according to the sample borrowing reduction coefficient.
S7: Based on the bills of quantities, settlement data, and engineering attribute data of completed highway projects, screen the set of structurally similar projects that match the target index object, and generate the engineering structure weight vector corresponding to the target index object according to the cost proportion of each cost element in the set of structurally similar projects. The engineering structure weight vector includes the engineering part weight sub-vector and the cost category weight sub-vector. The engineering part weight sub-vector includes the weights of roadbed engineering, pavement engineering, bridge engineering, tunnel engineering, and traffic safety facility engineering. The cost category weight sub-vector includes the weights of labor, material, and machinery.
S8: Based on the fused price of each cost element in the current index period, the base period fused price of the corresponding cost element, and the engineering structure weight vector, generate the comprehensive cost index and sub-item cost indices for highway engineering; the engineering part weight sub-vector is used to generate the roadbed engineering cost index, pavement engineering cost index, bridge engineering cost index, tunnel engineering cost index, and traffic safety facility engineering cost index, and the cost category weight sub-vector is used to generate the labor price index, material price index, and machinery shift price index.
S9: Generate the confidence level of the comprehensive cost index for highway engineering based on the number of valid samples, the number of data sources, the proportion of abnormal observation records removed, the proportion of compensation observation records involved, and the historical backtesting bias.
S10: Apply the comprehensive cost index for highway engineering generated in the historical index period to the base period cost of completed highway projects to obtain the index back-calculated construction value. Compare the index back-calculated construction value with the actual settlement price of the completed highway project to obtain the backtesting deviation. Determine the backtesting deviation attribution result based on the maximum value among the price source deviation contribution value, the engineering structure weight deviation contribution value, and the sample compensation deviation contribution value. Correct the source credibility value, the sample borrowing reduction coefficient, or the engineering structure weight vector in the next index period based on the backtesting deviation attribution result.
2. The method for generating a highway engineering cost index based on multi-source data fusion according to claim 1, characterized in that, In step S1, after obtaining the multi-source highway cost data, the multi-source highway cost data is checked for entry into the database. The entry check includes field integrity check, pricing unit check, and time attribution check. Records lacking any of the following: cost element name, price value, unit of measurement, collection time, applicable region, or data source are marked as incomplete observation records. Incomplete observation records are not included in the fusion price generation of price conflict groups. Records with inconsistent pricing units are converted to a unified pricing unit under the same cost element code according to the preset unit conversion relationship.
3. The method for generating a highway engineering cost index based on multi-source data fusion according to claim 2, characterized in that, In S2, the highway cost element coding system is set according to the engineering part level, cost category level, and specification level; The project components are categorized into roadbed engineering, pavement engineering, bridge engineering, tunnel engineering, traffic safety facilities engineering, landscaping and environmental protection engineering, and temporary works. The cost category hierarchy includes labor, materials, machine shifts, transportation, and measures items; The specification hierarchy is used to record the material specifications, machinery models, or construction process types corresponding to the same cost element; When mapping the multi-source highway cost data to the highway cost element coding system, the engineering part level is first determined based on the cost element name, then the cost category level is determined based on the cost category, and finally the specification level is determined based on the material specifications, machinery model, or construction procedure type.
4. The method for generating a highway engineering cost index based on multi-source data fusion according to claim 3, characterized in that, In S4, the source stability is determined by the number of cycles in which the same data source forms valid observation records within three consecutive index cycles under the target cost element code; The time fit is determined by the time interval between the collection time of the price observation record and the target index cycle; The spatial fit is determined by the consistency of administrative level and road transport distance between the applicable area and the target area unit of the price observation record; The project fit is determined by the consistency between the project corresponding to the price observation record and the target index object in terms of route grade, terrain zoning, bridge-tunnel ratio and pavement structure type. The historical deviation is determined by the deviation record between the price value from the same data source within the historical index period and the actual settlement price of the corresponding completed highway project; The degree of cross-consistency is determined by the degree of deviation between the price value recorded in the price observation and the median price within the same price conflict group; After normalizing the source stability, time fit, spatial fit, engineering fit, historical deviation, and cross-consistency, a source credibility value is generated according to a preset weight.
5. The method for generating a highway engineering cost index based on multi-source data fusion according to claim 4, characterized in that, In S5, when identifying abnormal observation records, the first quartile price, the third quartile price, and the interquartile range of price observation records within the same price conflict group are first calculated, and then the first abnormal boundary and the second abnormal boundary are determined based on the first quartile price, the third quartile price, and the interquartile range. Price observation records whose price values are below the first anomaly boundary or above the second anomaly boundary are marked as candidate anomaly observation records; When the source confidence value of the candidate abnormal observation record is lower than the preset confidence threshold, the candidate abnormal observation record is determined as an abnormal observation record; When the source confidence value of the candidate anomaly observation record is not lower than the preset confidence threshold, the candidate anomaly observation record is retained, and the fusion weight corresponding to the candidate anomaly observation record is multiplied by the preset anomaly reduction coefficient.
6. The method for generating a highway engineering cost index based on multi-source data fusion according to claim 5, characterized in that, In S5, when fusing the price values of the remaining price observation records, the source confidence value of each remaining price observation record is used as the first weighting factor, and the second weighting factor is determined according to the degree of standardized deviation between the price value of the remaining price observation record and the median price in the same price conflict group. The second weighting factor decreases as the degree of standardized deviation increases. The fusion weight of the remaining price observation record is determined based on the first weighting factor and the second weighting factor. The price values of the remaining price observation records within the same price conflict group are weighted according to the fusion weight to obtain the fusion price.
7. The method for generating a highway engineering cost index based on multi-source data fusion according to claim 6, characterized in that, in S6, the formation of the candidate compensation sample set includes: screening, from the historical observation records, those that have the same cost element codes as the target price conflict group to form preliminary compensation samples; for each preliminary compensation sample, determining the corresponding sample borrowing reduction coefficient based on the spatial distance, route grade difference, bridge-to-tunnel ratio difference, and pavement structure type difference between the preliminary compensation sample and the target index object; and excluding the preliminary compensation samples whose sample borrowing reduction coefficient is lower than the preset reduction coefficient threshold, the remaining preliminary compensation samples being used as the candidate compensation sample set.
8. The method for generating a highway engineering cost index based on multi-source data fusion according to claim 7, characterized in that, in S7, when screening the set of structurally similar projects, the route grade, topographic zoning, construction type, bridge-to-tunnel ratio, pavement structure type, and main material transportation radius of the target index object are used as screening conditions. Completed highway projects whose route grade, topographic zoning, construction type, and pavement structure type are consistent with the target index object, whose bridge-to-tunnel ratio differs from that of the target index object by no more than the preset bridge-to-tunnel ratio difference threshold, and whose main material transportation radius differs from that of the target index object by no more than the preset transportation radius difference threshold are included in the set of structurally similar projects. The main material transportation radius is determined according to the road transportation distance from the main material supply location to the center point of the construction area corresponding to the target index object. When there are multiple supply locations for the same main material, the weighted transportation radius is determined according to the supply volume ratio of each supply location and the road transportation distance, and the weighted transportation radius is used as the main material transportation radius.
9. The method for generating a highway engineering cost index based on multi-source data fusion according to claim 8, characterized in that, in S7, when generating the engineering structure weight vector, the engineering part cost proportions and the cost category cost proportions of each completed highway project in the set of structurally similar projects are calculated respectively. The engineering part cost proportions include the cost proportions of roadbed engineering, pavement engineering, bridge engineering, tunnel engineering, and traffic safety facility engineering. The cost category cost proportions include the proportions of labor costs, material costs, and machinery costs. The median of each cost proportion across the set of structurally similar projects is taken as the corresponding initial weight. The initial weights corresponding to the engineering parts and the initial weights corresponding to the cost categories are then normalized to obtain the engineering part weight sub-vector and the cost category weight sub-vector.
10. The method for generating a highway engineering cost index based on multi-source data fusion according to claim 9, characterized in that, in S10, the determination of the price source deviation contribution value, the engineering structure weight deviation contribution value, and the sample compensation deviation contribution value includes: while keeping the engineering structure weight vector and the compensation observation records unchanged, determining the price source deviation contribution value according to the change in the fused price corresponding to each data source; while keeping the fused price and compensation observation records of each cost element unchanged, determining the engineering structure weight deviation contribution value according to the difference in index back-calculated construction value before and after the engineering structure weight vector is changed; and while keeping the fused price of each cost element and the engineering structure weight vector unchanged, determining the sample compensation deviation contribution value according to the difference in index back-calculated construction value before and after the compensation observation records participate. When the price source deviation contribution value is the largest, the source credibility value of the corresponding data source in the next index cycle is reduced. When the engineering structure weight deviation contribution value is the largest, S7 is re-executed to update the engineering structure weight vector in the next index cycle. When the sample compensation deviation contribution value is the largest, the sample borrowing reduction coefficient in the corresponding candidate compensation sample set is reduced, and the preset minimum sample size of the price conflict group corresponding to the cost element in the next index cycle is increased.