The Open-Source Stack for Decision Intelligence
Probabilistic models are integral to state-of-the-art decision theories, such as Bayesian (a.k.a. evidential), quantum-like Bayesian, causal, and functional/updateless/timeless decision theories. To enable widespread and inclusive adoption of these decision theories by businesses and in public service, there is a pressing need for an open-source decision intelligence stack. This stack should span from sensing (data collection) and actuation to online learning and natural-language decision assistance.
The Decision Intelligence Stack
The proposed decision intelligence (DI) stack can be broken down into the following levels (minimal code sketches for several of them follow the list):
Sensors and edge devices: cameras, wearables, etc.
IoT device management, data collection and processing systems, and databases: MQTT brokers/servers, log collectors, change data capture, Airbyte, Kafka, Flink, data lake storage, Spark, Ray, ClickHouse, Rockset, Postgres, etc.
ML computing frameworks: PyTorch, JAX/Flax
Probabilistic ML algorithms for time-series and other forms of data: probabilistic state-space models, (Bayesian) RNNs, regression trees, VAEs, GNNs. Recent libraries such as dynamax and PyMC-BART are filling the gap in actively maintained and supported implementations of these algorithms.
DNN (mechanistic) interpretability libraries and frameworks: needed for communicating inferences to humans (via LLMs). Currently non-existent, to the best of my knowledge.
Frameworks for probabilistic inference over a system of components: Stan, PyMC, NumPyro. They support various approximate inference methods, including sampling-based, variational, particle-based, and GFlowNet-based methods. See Štrumbelj et al. (2024) for an overview.
MLOps frameworks for ML and probabilistic inference: Airflow-like tools integrated with PyMC or NumPyro, automatically scheduling online learning tasks when new observations are available (à la rebayes) and updating past states via retrospective inference.
Hyperparameter optimization/AutoML frameworks: Optuna
Causal discovery and inference libraries: causal-learn, DoWhy, CausalPy
Libraries of domain-specific models and probabilistic program templates: PyMC-Marketing, BatteryML
Baseline knowledge bases for augmenting proprietary models: Open Research Knowledge Graph, system.com
Data catalog/metadata storage to help LLMs generate probabilistic model code: OpenMetadata
LLM to generate and modify probabilistic model code, using previous model versions, results, domain knowledge, data catalogs, and public knowledge as inputs: PyMC-GPT is a first step in this direction.
UI to visualize and explore data, causal models (applying DNN interpretability if needed), and counterfactual predictions. Current options include Python visualization libraries (matplotlib, seaborn, etc.), Graphviz-style causal graph visualization, LIDA for LLM-generated visualizations, Rill for data exploration, and Dara for causal-graph-aware visual app building.
API frontend for causal inference: frameworks like BentoML
Decision-making LLM that uses the (discovered) causal graph for LLM+KG reasoning and the causal inference API as a tool (Toolformer-style): DeLLMa
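To make a few of these layers concrete, the sketches below walk down the stack, starting with ingestion. This first one subscribes to an MQTT topic with paho-mqtt (1.x callback API) and buffers readings in memory; the broker address, topic, and payload format are hypothetical, and a real deployment would forward readings to Kafka or a data lake rather than a Python list.

```python
import json

import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    # Parse a JSON sensor reading, e.g. {"sensor_id": "cam-3", "value": 0.87}
    reading = json.loads(msg.payload)
    userdata.append(reading)  # in-memory buffer standing in for a durable sink

readings = []
client = mqtt.Client(userdata=readings)      # paho-mqtt 1.x constructor
client.on_message = on_message
client.connect("broker.example.com", 1883)   # hypothetical broker address
client.subscribe("sensors/#")                # hypothetical topic hierarchy
client.loop_forever()
```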
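For the probabilistic time-series layer, here is a minimal Bayesian local-level (random-walk) model in PyMC. It is a toy stand-in for the state-space models that dynamax implements more scalably; the observation series is simulated.

```python
import numpy as np
import pymc as pm

# Simulated univariate sensor series (placeholder data).
y = np.random.default_rng(0).normal(size=200).cumsum()

with pm.Model() as local_level:
    sigma_level = pm.HalfNormal("sigma_level", 1.0)  # innovation scale of the latent level
    sigma_obs = pm.HalfNormal("sigma_obs", 1.0)      # measurement noise
    level = pm.GaussianRandomWalk("level", sigma=sigma_level, shape=len(y))
    pm.Normal("obs", mu=level, sigma=sigma_obs, observed=y)
    idata = pm.sample()  # NUTS posterior over the latent level and both scales
```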
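For the inference-framework layer, a model of the same flavor can be written in NumPyro and fitted with stochastic variational inference instead of MCMC, which is the point of this layer: one model definition, several approximate-inference backends. The model and data here are toy placeholders.

```python
import jax.numpy as jnp
import jax.random as random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoNormal

def model(y):
    mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    numpyro.sample("obs", dist.Normal(mu, sigma), obs=y)

y = jnp.array([1.2, 0.7, 1.9, 1.4, 0.8])      # placeholder observations
guide = AutoNormal(model)                      # mean-field variational family
svi = SVI(model, guide, numpyro.optim.Adam(0.01), Trace_ELBO())
result = svi.run(random.PRNGKey(0), 2_000, y)  # result.params holds the fitted guide
```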
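The online-learning layer boils down to a recursion: when new observations arrive, yesterday's posterior becomes today's prior. Below is a closed-form illustration with a conjugate Normal-Normal model (the observations and noise level are made up); rebayes generalizes the same recursion to neural and state-space models.

```python
import numpy as np

mu, var = 0.0, 10.0  # current posterior over a latent quantity (prior at t=0)
obs_var = 1.0        # assumed-known measurement noise

def update(mu, var, y):
    """Return the posterior mean/variance after observing y, using the old posterior as prior."""
    precision = 1.0 / var + 1.0 / obs_var
    new_var = 1.0 / precision
    new_mu = new_var * (mu / var + y / obs_var)
    return new_mu, new_var

for y in np.array([1.1, 0.9, 1.4]):  # stream of incoming observations
    mu, var = update(mu, var, y)
```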
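At the hyperparameter-optimization layer, Optuna wraps whatever fitting routine the lower layers expose. `fit_and_score` below is a hypothetical stand-in for a real training-and-validation loop.

```python
import optuna

def fit_and_score(learning_rate: float) -> float:
    # Placeholder objective: pretend the best learning rate is 0.01.
    return (learning_rate - 0.01) ** 2

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)
    return fit_and_score(lr)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```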
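For the causal-inference layer, a minimal DoWhy sketch on simulated data: declare a causal graph (the column names and the price-demand scenario are made up), identify the effect of price on demand, and estimate it with a backdoor linear-regression estimator.

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Simulate a confounded price -> demand relationship.
rng = np.random.default_rng(0)
season = rng.normal(size=500)
price = 2.0 * season + rng.normal(size=500)
demand = -1.5 * price + 3.0 * season + rng.normal(size=500)
df = pd.DataFrame({"season": season, "price": price, "demand": demand})

model = CausalModel(
    data=df,
    treatment="price",
    outcome="demand",
    graph="digraph { season -> price; season -> demand; price -> demand; }",
)
estimand = model.identify_effect()
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print(estimate.value)  # should recover an effect close to the true -1.5
```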
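Finally, at the visualization layer, a causal graph can be rendered directly from its edge list. This reuses the toy graph from the DoWhy sketch, with networkx and matplotlib standing in for richer Graphviz- or Dara-based views.

```python
import matplotlib.pyplot as plt
import networkx as nx

g = nx.DiGraph([("season", "price"), ("season", "demand"), ("price", "demand")])
nx.draw_networkx(g, pos=nx.spring_layout(g, seed=0), node_color="lightblue", node_size=2000)
plt.axis("off")
plt.show()
```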
Challenges and Opportunities
While some key layers of this stack are well populated with mature tools, many others have few actively developed options. Moreover, apart from the established data processing and ML components that are not specific to causal inference, the layers are rarely integrated with each other.
The depth and complexity of the stack can be daunting, and it's likely that no organization in the world has implemented it in an approximately complete form. As causaLens argues, it's currently impractical for organizations to weave together a stack of open-source tools for end-to-end causal inference rather than opting for a vertically integrated solution. If anything, this highlights the need for an accessible and well-integrated open-source stack.
Given the recent consolidation trend in the software industry, the focus should be on good integration and user experience with an opinionated, general-purpose combination of components and algorithms, rather than on configurability and flexibility. Both the PyWhy and PyMC ecosystems, the most notable probabilistic and causal inference ecosystems in Python today, may be focusing too much on implementing diverse algorithms instead of polishing a vertically integrated experience.
Note that choosing specific components for the stack doesn’t mean cutting off possible use cases: the stack should remain broad in capability. Rather, choosing components is about removing the choice among functionally equivalent options, such as databases supporting the same query patterns, or probabilistic inference algorithms achieving approximately the same result through different means.
While the DI stack itself should be as general and broad in capability as possible, the go-to-market strategy for propelling it should initially focus on one application domain at a time. Infrastructure and application monitoring (and cost management) is an interesting choice for one of the first such domains: it is currently dominated by expensive SaaS platforms (Datadog, New Relic, Splunk), with a few dynamic open-source challengers (SigNoz, Highlight, Coroot). Many of the higher levels of the DI stack could be omitted for this domain, and non-parametric models are sufficient for performance monitoring. SigNoz and Highlight also notably share the “opinionated integration philosophy” that I recommended for the DI stack above.