Predictive Forecasting
The Bruviti AIP addresses the long-tail parts forecasting problem through a three-agent architecture that combines ontology-based peer selection, Weibull survival analysis, and LLM validation — enabling statistically grounded predictions even for parts with sparse or zero transaction history.
The Long-Tail Forecasting Problem
In parts forecasting, approximately 20% of parts have enough transaction history for traditional statistical methods to produce reliable predictions. The remaining 80% — the "long tail" — have sparse or zero history. These are new-to-catalog parts, recently superseded items, parts for newly deployed equipment, or low-demand components that see only a few transactions per year.
Traditional forecasting approaches handle the long tail poorly:
- Exclusion — parts with insufficient data are excluded from forecasting entirely, creating blind spots in inventory planning
- Category averages — parts are grouped by broad categories (e.g., "seals") and assigned the category average demand, ignoring differences in operating context, equipment type, and failure mode
- Minimum stock rules — arbitrary safety stock levels are applied regardless of actual demand patterns, leading to either overstocking or stockouts
Each of these approaches treats parts as independent data series. If a part has insufficient data in its own history, the system has nowhere to look. This is the fundamental limitation the ontology-based approach addresses.
Paradigm Shift: From Rows to Relationships
Traditional forecasting operates on a rows-based paradigm: each part is an independent row of data. More data rows mean better forecasts. No data means no forecast. The ontology-based approach introduces a relationships-based paradigm: parts exist in a graph of structural and functional relationships, and a part with sparse data can borrow signal from ontologically similar parts.
The question shifts from "what did THIS part do?" to "what did SIMILAR parts do in SIMILAR contexts?" This reframing is what enables the platform to forecast the entire parts catalog — including the 80% that traditional methods cannot address.
Three-Agent Architecture
The forecasting system uses three specialized agents that collaborate to produce and validate predictions. Each agent has a distinct role and expertise.
| Agent | Role | Input | Output |
|---|---|---|---|
| Ontology Agent | Understands equipment structure, identifies peer parts, maps functional equivalencies | Target part identifier, equipment ontology graph | Ranked peer list with similarity weights (0.0–1.0) |
| Statistical Agent | Performs survival analysis and demand modeling using appropriate statistical methods | Peer list with weights, transaction histories for target and peers | Forecast parameters (Weibull shape/scale), confidence intervals |
| LLM Agent | Validates results against domain knowledge, explains forecasts, identifies anomalies | Statistical forecast, part metadata, operating conditions | Validated forecast with natural language explanation, anomaly flags |
Solution Architecture
The end-to-end forecasting pipeline orchestrates the three agents through a sequential workflow where each agent's output feeds the next.
The pipeline runs both on-demand (for individual part queries) and in batch mode (for periodic inventory planning across the entire parts catalog). Batch processing prioritizes parts by business criticality and data sparsity — parts with the least history and highest impact are forecasted first.
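The sequential handoff can be sketched as follows. This is an illustrative stub, not the platform's API: the agent function names, the `Peer` type, and the hard-coded outputs are assumptions used only to show how each agent's output feeds the next.

```python
from dataclasses import dataclass

@dataclass
class Peer:
    part_id: str
    weight: float  # similarity weight in [0.0, 1.0]

def ontology_agent(part_id: str) -> list[Peer]:
    # Stub: would traverse the equipment ontology graph for peer parts.
    return [Peer("seal-kit-B", 0.9), Peer("o-ring-C", 0.6)]

def statistical_agent(peers: list[Peer]) -> dict:
    # Stub: would fit a Weibull distribution to weighted peer histories.
    return {"beta": 2.1, "eta_years": 5.7,
            "effective_n": sum(p.weight for p in peers)}

def llm_agent(forecast: dict) -> dict:
    # Stub: would validate against domain knowledge and explain the result.
    forecast["validated"] = True
    forecast["explanation"] = "Wear-out pattern consistent with consumable seals."
    return forecast

def forecast_part(part_id: str) -> dict:
    peers = ontology_agent(part_id)   # 1. find peers + similarity weights
    fit = statistical_agent(peers)    # 2. fit survival model on pooled data
    return llm_agent(fit)             # 3. validate and explain

result = forecast_part("vacuum-pump-seal-kit")
```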
Peer Selection and Weighted Pooling
The Ontology Agent uses the equipment ontology to find peer parts through three relationship paths, as described in the ontology forecasting section.
Peer Discovery Paths
- Same-subsystem peers — other consumable parts in the same subsystem. Traverse `part-of` to the parent subsystem, then down to sibling components and their parts. Example: for a vacuum pump seal kit, peers include O-rings, gaskets, and lubricant cartridges in the same pump subsystem.
- Same-function peers — parts with the same functional role in different equipment types. Traverse to components tagged with the same ontology function type. Example: seal kits in centrifugal pumps, diaphragm pumps, and rotary vane pumps.
- Same-environment peers — parts operating in similar conditions regardless of equipment type. Traverse through parameter relationships to find parts sharing operating condition tags. Example: corrosion-resistant seals across all equipment in high-humidity, chemical-exposure environments.
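The three discovery paths amount to different traversals over the same graph. A minimal sketch over a toy ontology, assuming an invented node schema (`part_of`, `function`, `env`) that stands in for the real graph model:

```python
# Toy ontology: node attributes are illustrative, not the platform's schema.
ontology = {
    "seal-kit-A": {"part_of": "vacuum-pump-1", "function": "seal",
                   "env": {"vacuum", "thermal-cycling"}},
    "o-ring-B":   {"part_of": "vacuum-pump-1", "function": "seal",
                   "env": {"vacuum"}},
    "seal-kit-C": {"part_of": "diaphragm-pump-2", "function": "seal",
                   "env": {"chemical"}},
    "bearing-D":  {"part_of": "rotary-pump-3", "function": "bearing",
                   "env": {"vacuum", "thermal-cycling"}},
}

def find_peers(target: str) -> dict[str, set[str]]:
    t = ontology[target]
    paths: dict[str, set[str]] = {"subsystem": set(), "function": set(),
                                  "environment": set()}
    for pid, p in ontology.items():
        if pid == target:
            continue
        if p["part_of"] == t["part_of"]:   # same-subsystem peers
            paths["subsystem"].add(pid)
        if p["function"] == t["function"]: # same-function peers
            paths["function"].add(pid)
        if p["env"] & t["env"]:            # same-environment peers (shared tags)
            paths["environment"].add(pid)
    return paths

peers = find_peers("seal-kit-A")
```

Note that a single part can surface through more than one path (here `o-ring-B` matches all three), which is why the similarity scoring step below matters.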
Similarity Scoring
Each peer is assigned a similarity weight (0.0–1.0) based on three factors:
| Factor | What It Measures | Effect on Weight |
|---|---|---|
| Ontological distance | Number of graph edges between target and peer | Shorter path = higher weight |
| Operational overlap | Similarity of operating conditions (temperature, duty cycle, chemical exposure) | More overlap = higher weight |
| Failure mode similarity | Whether the peer fails through the same mechanism (wear, corrosion, fatigue, thermal degradation) | Same mechanism = higher weight |
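One plausible way to combine the three factors into a single weight is a bounded linear blend. The factor weights (0.4/0.3/0.3) and the distance and failure-mode transforms below are illustrative assumptions, not the platform's scoring formula:

```python
def similarity_weight(graph_distance: int, operational_overlap: float,
                      same_failure_mode: bool) -> float:
    """Return a peer similarity weight in [0.0, 1.0]."""
    distance_score = 1.0 / (1.0 + graph_distance)   # shorter path = higher
    mode_score = 1.0 if same_failure_mode else 0.3  # same mechanism = higher
    score = (0.4 * distance_score
             + 0.3 * operational_overlap
             + 0.3 * mode_score)
    return round(min(max(score, 0.0), 1.0), 2)

# A close peer: 1 edge away, high overlap, same failure mode.
close = similarity_weight(1, 0.9, True)     # 0.20 + 0.27 + 0.30 = 0.77
# A distant peer: 4 edges away, low overlap, different mode.
distant = similarity_weight(4, 0.2, False)  # 0.08 + 0.06 + 0.09 = 0.23
```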
Weighted Pooling
Peer transaction histories are combined using weighted pooling: closer ontological peers contribute more to the pooled dataset than distant peers. The Statistical Agent uses these weights when fitting distribution parameters — a peer with weight 0.9 (same subsystem, same failure mode) has far more influence on the forecast than a peer with weight 0.3 (same general function but different operating environment).
This approach ensures that even with a large peer set, the forecast is dominated by the most relevant evidence rather than being diluted by loosely related data.
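A weighted maximum-likelihood Weibull fit makes the pooling concrete: each peer lifetime enters the likelihood scaled by its similarity weight. The sketch below is a simplified pure-Python solver (complete failures only, no censoring) with made-up data; it is not the platform's estimator:

```python
import math

def fit_weibull_weighted(times: list[float],
                         weights: list[float]) -> tuple[float, float]:
    """Return (shape beta, scale eta) from similarity-weighted lifetimes."""
    W = sum(weights)
    logs = [math.log(t) for t in times]

    def profile(beta: float) -> float:
        # Weighted profile-likelihood equation in beta; its root is the MLE.
        tb = [w * t ** beta for w, t in zip(weights, times)]
        num = sum(x * l for x, l in zip(tb, logs))
        return (num / sum(tb) - 1.0 / beta
                - sum(w * l for w, l in zip(weights, logs)) / W)

    lo, hi = 0.05, 20.0
    for _ in range(80):                     # bisection on the profile equation
        mid = (lo + hi) / 2.0
        if profile(lo) * profile(mid) <= 0:
            hi = mid
        else:
            lo = mid
    beta = (lo + hi) / 2.0
    eta = (sum(w * t ** beta for w, t in zip(weights, times)) / W) ** (1.0 / beta)
    return beta, eta

# Four close peers (weight 0.9) dominate the fit over one distant peer (0.3),
# so the short 2.0-year outlier barely moves the result.
times   = [4.8, 5.5, 6.1, 5.9, 2.0]
weights = [0.9, 0.9, 0.9, 0.9, 0.3]
beta, eta = fit_weibull_weighted(times, weights)
```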
Statistical Modeling
The Statistical Agent applies Weibull survival analysis to the weighted peer-pooled data. Weibull distributions are used because they model the full lifecycle of mechanical and electronic components — from early-life failures through random failures to end-of-life wear-out.
Weibull Distribution Parameters
The Weibull distribution is characterized by two parameters:
| Parameter | Symbol | What It Indicates |
|---|---|---|
| Shape | β (beta) | Failure pattern: β < 1 = infant mortality (decreasing failure rate), β = 1 = random failures (constant rate), β > 1 = wear-out (increasing failure rate) |
| Scale | η (eta) | Characteristic life — the time by which 63.2% of units will have failed. This is the primary forecast output. |
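The three β regimes in the table correspond to the behavior of the Weibull hazard rate h(t) = (β/η)(t/η)^(β−1), which is easy to verify numerically:

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate at time t for a Weibull(beta, eta)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)

eta = 5.0
early, late = 1.0, 4.0

# beta < 1: hazard decreases over time (infant mortality)
infant = weibull_hazard(late, 0.5, eta) < weibull_hazard(early, 0.5, eta)
# beta = 1: hazard is constant (random failures)
random_ = abs(weibull_hazard(late, 1.0, eta)
              - weibull_hazard(early, 1.0, eta)) < 1e-12
# beta > 1: hazard increases over time (wear-out)
wearout = weibull_hazard(late, 2.1, eta) > weibull_hazard(early, 2.1, eta)
```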
Application to Sparse Data
For parts with some history (e.g., 3–10 replacement records), the Statistical Agent fits the Weibull distribution directly to the target part's data, using peer data as a Bayesian prior to constrain the fit. For parts with zero history, the agent relies entirely on weighted peer-pooled data. The peer weights ensure that the fitted distribution reflects the most ontologically similar parts rather than a generic average.
The output includes the forecast parameters (β and η), confidence intervals that reflect the uncertainty from sparse data, and the effective sample size (how much peer data contributed to the fit).
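The effect of a peer prior on a sparse target fit can be illustrated with a simple precision-weighted blend, where the prior contributes in proportion to its effective sample size. This is a deliberate simplification of a full Bayesian fit; the numbers and the `shrink` helper are hypothetical:

```python
def shrink(target_eta: float, n_target: float,
           prior_eta: float, n_prior_equiv: float) -> float:
    """Blend target and peer scale estimates by effective sample size."""
    return ((n_target * target_eta + n_prior_equiv * prior_eta)
            / (n_target + n_prior_equiv))

# 3 target records vs. a peer prior worth ~12 effective observations:
# the pooled evidence pulls the estimate strongly toward the peer value.
blended = shrink(target_eta=1.5, n_target=3, prior_eta=5.7, n_prior_equiv=12)
# (3*1.5 + 12*5.7) / 15 = 4.86
```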
LLM Validation
The LLM Agent serves as a domain knowledge check on the statistical output. It reviews the forecast against what is physically and operationally plausible for the specific part type, material, and operating environment.
Validation Checks
- Physical plausibility — does the predicted lifecycle make sense for the material and operating conditions? A rubber seal in a high-temperature environment should not have a 20-year predicted life. A stainless steel component in a clean-room environment should not have a 6-month predicted life.
- Engineering factors — are there known factors not captured in the transaction data? Recall campaigns, design changes, operating condition changes, or material substitutions can shift failure patterns in ways that historical data alone does not reflect.
- Anomaly detection — does the forecast diverge significantly from what would be expected given the part type and peer set? Large divergences are flagged for human review rather than automatically accepted.
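The physical-plausibility check can be thought of as comparing the predicted characteristic life against rough material-and-environment bounds, with out-of-range results routed to a human. The bounds table below is a made-up illustration, not platform data, and the real check is performed by the LLM Agent rather than a lookup table:

```python
# Illustrative life bounds (years) by (material, environment) — invented values.
PLAUSIBLE_LIFE_YEARS = {
    ("rubber", "high-temperature"): (0.5, 5.0),
    ("fluorocarbon", "vacuum-thermal-cycling"): (3.0, 10.0),
    ("stainless-steel", "clean-room"): (5.0, 30.0),
}

def check_plausibility(material: str, environment: str,
                       eta_years: float) -> str:
    lo, hi = PLAUSIBLE_LIFE_YEARS[(material, environment)]
    if lo <= eta_years <= hi:
        return "plausible"
    return "flag-for-review"  # divergence goes to a human, not auto-accepted

verdict = check_plausibility("fluorocarbon", "vacuum-thermal-cycling", 5.7)
```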
Forecast Explanation
The LLM Agent generates a natural language explanation of the forecast reasoning. This explanation includes the peer basis (which parts contributed and why), the statistical interpretation (what the Weibull parameters mean for this part type), and any caveats or conditions that should be considered. The explanation is designed for inventory planners and service engineers — not data scientists — so it uses domain terminology and focuses on actionable interpretation.
Why LLM validation matters: Statistical models produce mathematically valid results that may be operationally wrong. A model might predict a 10-year lifecycle for a component that was recently redesigned with a known weaker material. The LLM agent catches these cases by incorporating domain knowledge that the statistical model cannot access from transaction data alone.
Multi-Variable Forecasting
Beyond parts lifecycle forecasting, the platform supports multi-variable time series and regression-based predictions for broader forecasting scenarios such as financial spend forecasting and supply chain demand prediction.
Ensemble Methods
The platform's forecasting library includes multiple model families that can be evaluated and combined:
- Time series models — ARIMA/SARIMA, Prophet, neural forecasters for data with clear temporal patterns and seasonality
- Regression models — neural network regression, random forest, gradient boosting for problems where feature-based prediction outperforms time series decomposition
- Ensemble engine — combines outputs from multiple models using weighted averaging, where weights are determined by backtesting performance on holdout data
The system uses a backtesting framework that reserves a portion of historical data for validation, evaluates each model family independently, and selects the ensemble combination that minimizes prediction error on the holdout set.
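The weighting step can be sketched as inverse-error averaging over holdout performance. The model names, error values, and predictions below are stand-ins for fitted forecasters, and inverse-error weighting is one common scheme rather than necessarily the platform's exact rule:

```python
def ensemble_weights(holdout_errors: dict[str, float]) -> dict[str, float]:
    """Weight each model by inverse holdout error, normalized to sum to 1."""
    inv = {name: 1.0 / err for name, err in holdout_errors.items()}
    total = sum(inv.values())
    return {name: v / total for name, v in inv.items()}

def ensemble_forecast(predictions: dict[str, float],
                      weights: dict[str, float]) -> float:
    return sum(predictions[m] * weights[m] for m in predictions)

# Illustrative holdout errors (e.g. MAPE): lower error -> larger weight.
errors = {"sarima": 0.10, "prophet": 0.20, "gbm": 0.40}
w = ensemble_weights(errors)     # sarima gets the largest weight
pred = ensemble_forecast({"sarima": 100.0, "prophet": 110.0, "gbm": 90.0}, w)
```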
Multi-Variable Lag Detection
For forecasting problems that involve external variables (e.g., equipment install base, economic indicators, construction activity), the system includes automatic lag detection. The lag detection algorithm tests each external variable at multiple time offsets to identify the delay between a change in the variable and its effect on the target metric.
For example, a change in equipment install base may take several quarters to affect consumable parts spending — the lag detection system identifies this offset and incorporates it into the multivariate model so that current install base data is used to predict future spending at the correct horizon.
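The idea of testing each offset can be sketched with lagged Pearson correlation on a toy series. This is a pure-Python illustration of the concept, not the platform's lag-detection algorithm, and the install-base and spending series are fabricated:

```python
def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def best_lag(driver: list[float], target: list[float], max_lag: int) -> int:
    """Offset (in periods) at which driver best predicts target."""
    scores = {}
    for lag in range(1, max_lag + 1):
        # Compare driver at t-lag against target at t.
        scores[lag] = pearson(driver[:-lag], target[lag:])
    return max(scores, key=scores.get)

# Toy data: install base (driver) leads spending (target) by 2 quarters.
install_base = [10, 12, 15, 14, 18, 21, 20, 24]
spending     = [ 5,  6, 10, 12, 15, 14, 18, 21]
lag = best_lag(install_base, spending, max_lag=3)
```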
Problem Classification
Not every prediction problem is a time series problem. The platform classifies forecasting requests by type and applies the appropriate methodology:
| Problem Type | Approach | Example |
|---|---|---|
| Time series | SARIMA, Prophet, neural forecasters with seasonality decomposition | Quarterly consumable parts spending with seasonal patterns |
| Regression | Feature-based prediction using gradient boosting, random forest, or neural regression | Predicting delivery dates from purchase order attributes |
| Survival analysis | Weibull distribution with ontology-based peer pooling | Parts replacement lifecycle forecasting for sparse-history items |
Case Study: Vacuum Pump Seal Kit
This case study illustrates the three-agent forecasting pipeline applied to a specific sparse-history part.
Starting Point
A vacuum pump seal kit had only 3 replacement records over 2 years. A traditional statistical forecast from this data alone produced a crude 1.5-year replacement cycle — an arithmetic mean with no statistical confidence and no understanding of the failure mode.
Ontology Agent Result
The Ontology Agent traversed the equipment ontology and identified 47 peer parts across 12 equipment models. Peers were found through same-subsystem paths (other vacuum pump consumables), same-function paths (seal kits in other pump types), and same-environment paths (corrosion-resistant seals across equipment in similar operating conditions). Each peer was scored for similarity based on ontological distance, operational overlap, and failure mode similarity.
Statistical Agent Result
Using the weighted peer-pooled transaction data (47 peers contributing weighted histories), the Statistical Agent fitted a Weibull distribution with the following parameters:
- Shape (β) = 2.1 — indicating a wear-out failure pattern (increasing failure rate over time), consistent with a consumable seal component
- Scale (η) = 5.7 years — the characteristic lifecycle, meaning 63.2% of these seal kits are expected to need replacement by 5.7 years of service
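These fitted parameters make the gap between the crude 1.5-year average and the Weibull forecast quantifiable. Plugging both horizons into the Weibull CDF F(t) = 1 − exp(−(t/η)^β):

```python
import math

def weibull_cdf(t: float, beta: float, eta: float) -> float:
    """Fraction of units expected to have failed by time t."""
    return 1.0 - math.exp(-((t / eta) ** beta))

# With beta = 2.1 and eta = 5.7 years from the case study:
frac_by_1_5 = weibull_cdf(1.5, beta=2.1, eta=5.7)  # only ~6% fail this early
frac_by_5_7 = weibull_cdf(5.7, beta=2.1, eta=5.7)  # ~63.2%, by definition of eta
```

In other words, stocking to the crude 1.5-year cycle would provision for replacements that roughly 94% of units will not yet need.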
LLM Agent Validation
The LLM Agent reviewed the forecast and confirmed: "Consistent with fluorocarbon seal degradation in vacuum environments with periodic thermal cycling." The 5.7-year lifecycle was validated as physically plausible for the material type (fluorocarbon) in the operating environment (vacuum with thermal cycling). No anomalies were flagged.
Outcome
The ontology-based forecast produced a 5.7-year replacement cycle — compared to the 1.5-year crude average from direct history alone. This represents a fundamental difference in inventory planning: the 1.5-year estimate would trigger unnecessary safety stock replenishment, while the 5.7-year estimate aligns inventory levels to actual expected demand patterns derived from peer evidence.
The key insight: 3 records from 1 part produced a crude average. 47 ontologically similar peers — weighted by structural similarity, operating conditions, and failure mode — produced a statistically grounded Weibull distribution with a validated physical explanation. The ontology transforms sparse data from a forecasting dead-end into a tractable problem.