“Precision through Probability.”
OMNI-BLAS is a next-generation linear algebra kernel developed by Aleator AI.
It is designed to replace standard BLAS routines (as used by NumPy/MKL) for AI inference, large-scale data analysis, and approximate-computing tasks where speed is critical and strict FP32 precision is negotiable.
By abandoning deterministic brute-force methods in favor of Monte Carlo Outer-Product Sampling, OMNI achieves dramatic speedups on standard CPU hardware without requiring GPUs.
OMNI-BLAS was benchmarked against the industry standard (numpy.dot backed by Intel MKL) on consumer-grade hardware (a single-socket CPU).
| Library | Operation | Matrix Size | Execution Time | Speedup Factor |
|---|---|---|---|---|
| NumPy (Standard) | FP32 MatMul | 2000 x 2000 | 0.0661s | 1.00x |
| OMNI-BLAS (Turbo) | FP32 MatMul | 2000 x 2000 | 0.0161s | 4.10x |
Note: Performance gains scale with matrix dimensionality; the theoretical maximum speedup is bounded by memory bandwidth.
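The baseline row of the table can be re-measured with a short NumPy timing script (a sketch only; absolute times depend on your CPU and BLAS build, and the OMNI-BLAS row requires the `omni` binary, which is not re-run here):

```python
import time
import numpy as np

# Reproduce the NumPy baseline measurement from the table above.
n = 2000
rng = np.random.default_rng(0)
A = rng.random((n, n), dtype=np.float32)
B = rng.random((n, n), dtype=np.float32)

np.dot(A, B)                      # warm-up: page in buffers, spin up BLAS threads
t0 = time.perf_counter()
C = np.dot(A, B)
elapsed = time.perf_counter() - t0
print(f"numpy.dot {n}x{n} FP32: {elapsed:.4f}s")
```

Your absolute time will differ from the 0.0661s in the table; what matters for the comparison is the ratio against `omni.dot` on the same machine.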
OMNI is not a wrapper. It is a standalone engine written in C99 with OpenMP parallelism and SIMD intrinsics.
Standard matrix multiplication operates at O(n³) complexity. OMNI reduces the effective work through three techniques:
- Stochastic Tiling: Instead of computing every element, OMNI samples high-energy columns/rows based on a cache-optimized distribution.
- L1/L2 Cache Locking: The kernel is tuned to keep active working sets entirely within the CPU cache hierarchy, eliminating RAM latency bottlenecks.
- Probabilistic Convergence: The result converges mathematically to the exact solution, and the error rate is controllable via the `speed` parameter.
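The general idea behind outer-product sampling can be illustrated in plain NumPy. This is a generic randomized-matmul sketch (the classic column/row sampling estimator), not the proprietary OMNI-BLAS kernel; the function name and the `frac` parameter are illustrative only:

```python
import numpy as np

def mc_matmul(A, B, frac=0.05, seed=0):
    """Monte Carlo outer-product sampling sketch (illustrative, not the
    OMNI-BLAS kernel). Samples k column/row pairs with probability
    proportional to their joint norm, then rescales each term so the
    estimator is unbiased: E[C_hat] = A @ B."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    k = max(1, int(frac * n))
    # "High-energy" sampling distribution: p_i ∝ ||A[:, i]|| * ||B[i, :]||
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p /= p.sum()
    idx = rng.choice(n, size=k, p=p)
    # Sum of k rescaled outer products instead of n exact ones
    return (A[:, idx] / (k * p[idx])) @ B[idx, :]

rng = np.random.default_rng(1)
A = rng.random((500, 500), dtype=np.float32)
B = rng.random((500, 500), dtype=np.float32)
exact = A @ B
approx = mc_matmul(A, B, frac=0.05)
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(f"relative Frobenius error at 5% sampling: {rel_err:.3f}")
```

Computing only a fraction of the outer products is where the speedup comes from; the error shrinks roughly as 1/√k as the sampling fraction grows, which is the trade-off the `speed` parameter exposes.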
OMNI-BLAS is provided as a pre-compiled binary for evaluation purposes.
```bash
# Clone the repository
git clone https://github.com/AleatorAI/OMNI-BLAS.git

# Enter the directory
cd OMNI-BLAS

# Run the benchmark to verify the ~4x speedup on your machine
python benchmark.py
```
OMNI is designed as a drop-in replacement for numpy.dot.
```python
import omni
import numpy as np

# Initialize massive datasets
A = np.random.rand(2000, 2000).astype(np.float32)
B = np.random.rand(2000, 2000).astype(np.float32)

# Standard NumPy (slow):
# C = np.dot(A, B)

# OMNI-BLAS (fast):
#   speed=1.0 : Balanced Mode (high accuracy)
#   speed=2.0 : Turbo Mode (max speed, ~5% sampling)
C = omni.dot(A, B, speed=2.0)
print("Calculation complete.")
```
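Before committing to Turbo Mode, it is worth measuring the accuracy of the approximate result on your own workload. A minimal validation helper (pure NumPy; the `omni` module is not imported here, so the demo below perturbs the exact product with ~1% noise as a stand-in for an approximate result):

```python
import numpy as np

def relative_error(C_exact, C_approx):
    """Relative Frobenius-norm error: a simple accuracy check for any
    approximate matmul result (e.g. the output of omni.dot)."""
    return np.linalg.norm(C_approx - C_exact) / np.linalg.norm(C_exact)

# Stand-in demo: mimic an approximate result with ~1% multiplicative noise.
rng = np.random.default_rng(0)
A = rng.random((256, 256), dtype=np.float32)
B = rng.random((256, 256), dtype=np.float32)
C_exact = A @ B
C_approx = C_exact * (1.0 + 0.01 * rng.standard_normal(C_exact.shape, dtype=np.float32))
err = relative_error(C_exact, C_approx)
print(f"relative error: {err:.4f}")
```

In practice, replace `C_approx` with the output of `omni.dot(A, B, speed=...)` and sweep the `speed` parameter until the error is acceptable for your application.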
OMNI-BLAS is an Approximate Computing engine. It is not suitable for cryptographic calculations or financial accounting where 100% precision is mandatory.
Ideal Use Cases:
- Neural Network Inference: Deep Learning weights are robust to minor noise. OMNI acts as a CPU-based accelerator.
- Big Data Clustering: K-Means and PCA algorithms on massive datasets.
- Real-Time Graphics/Physics: Where frame rate > precision.
The binaries (.dll / .so) provided in this repository are for Non-Commercial / Academic Evaluation only.
Aleator AI retains full ownership of the source code and the underlying “Monte Carlo Outer-Product” algorithm.
For Enterprise Licensing, Source Code Acquisition, or integration into commercial AI pipelines, please contact:
📧 Contact: aleator.ai.labs@gmail.com (or via GitHub Issues)
Copyright © 2026 Aleator AI. All Rights Reserved.