“Precision through Probability.”
OMNI-BLAS is a next-generation linear algebra kernel developed by Aleator AI.
It is designed to replace standard BLAS routines (as used by NumPy/MKL) for AI inference, large-scale data analysis, and approximate-computing tasks where speed is critical and strict FP32 precision is negotiable.
By abandoning deterministic brute-force methods in favor of Monte Carlo Outer-Product Sampling, OMNI achieves dramatic speedups on standard CPU hardware without requiring GPUs.
OMNI-BLAS was benchmarked against the industry standard (numpy.dot backed by Intel MKL) on consumer-grade hardware (a single-socket CPU).
| Library | Operation | Matrix Size | Execution Time | Speedup Factor |
|---|---|---|---|---|
| NumPy (Standard) | FP32 MatMul | 2000 x 2000 | 0.0661s | 1.00x |
| OMNI-BLAS (Turbo) | FP32 MatMul | 2000 x 2000 | 0.0161s | 4.10x |
Note: Performance gains scale with matrix dimensionality; the theoretical maximum speedup is bounded by memory bandwidth.
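The baseline row of the table can be re-measured with a short NumPy timing script (a sketch only; absolute times depend on your CPU and BLAS build, and the OMNI-BLAS row requires the `omni` binary, which is not re-run here):

```python
import time
import numpy as np

# Reproduce the NumPy baseline measurement from the table above.
n = 2000
rng = np.random.default_rng(0)
A = rng.random((n, n), dtype=np.float32)
B = rng.random((n, n), dtype=np.float32)

np.dot(A, B)                      # warm-up: page in buffers, spin up BLAS threads
t0 = time.perf_counter()
C = np.dot(A, B)
elapsed = time.perf_counter() - t0
print(f"numpy.dot {n}x{n} FP32: {elapsed:.4f}s")
```

Your absolute time will differ from the 0.0661s in the table; what matters for the comparison is the ratio against `omni.dot` on the same machine.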
OMNI is not a wrapper. It is a standalone engine written in C99 with OpenMP parallelism and SIMD intrinsics.
Standard matrix multiplication operates at O(n³) complexity. OMNI reduces the effective work through three techniques:
- Stochastic Tiling: Instead of computing every element, OMNI samples high-energy columns/rows based on a cache-optimized distribution.
- L1/L2 Cache Locking: The kernel is tuned to keep active working sets entirely within the CPU cache hierarchy, eliminating RAM latency bottlenecks.
- Probabilistic Convergence: The result converges mathematically to the exact solution, and the error rate is controllable via the `speed` parameter.
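The general idea behind outer-product sampling can be illustrated in plain NumPy. This is a generic randomized-matmul sketch (the classic column/row sampling estimator), not the proprietary OMNI-BLAS kernel; the function name and the `frac` parameter are illustrative only:

```python
import numpy as np

def mc_matmul(A, B, frac=0.05, seed=0):
    """Monte Carlo outer-product sampling sketch (illustrative, not the
    OMNI-BLAS kernel). Samples k column/row pairs with probability
    proportional to their joint norm, then rescales each term so the
    estimator is unbiased: E[C_hat] = A @ B."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    k = max(1, int(frac * n))
    # "High-energy" sampling distribution: p_i ∝ ||A[:, i]|| * ||B[i, :]||
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p /= p.sum()
    idx = rng.choice(n, size=k, p=p)
    # Sum of k rescaled outer products instead of n exact ones
    return (A[:, idx] / (k * p[idx])) @ B[idx, :]

rng = np.random.default_rng(1)
A = rng.random((500, 500), dtype=np.float32)
B = rng.random((500, 500), dtype=np.float32)
exact = A @ B
approx = mc_matmul(A, B, frac=0.05)
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(f"relative Frobenius error at 5% sampling: {rel_err:.3f}")
```

Computing only a fraction of the outer products is where the speedup comes from; the error shrinks roughly as 1/√k as the sampling fraction grows, which is the trade-off the `speed` parameter exposes.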
OMNI-BLAS is provided as a pre-compiled binary for evaluation purposes.
```bash
# Clone the repository
git clone https://github.com/AleatorAI/OMNI-BLAS.git

# Enter the directory
cd OMNI-BLAS

# Run the benchmark to verify the ~4x speedup on your machine
python benchmark.py
```
OMNI is designed as a drop-in replacement for numpy.dot.
```python
import omni
import numpy as np

# Initialize massive datasets
A = np.random.rand(2000, 2000).astype(np.float32)
B = np.random.rand(2000, 2000).astype(np.float32)

# Standard NumPy (slow):
# C = np.dot(A, B)

# OMNI-BLAS (fast):
#   speed=1.0 : Balanced Mode (high accuracy)
#   speed=2.0 : Turbo Mode (max speed, ~5% sampling)
C = omni.dot(A, B, speed=2.0)
print("Calculation complete.")
```
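Before committing to Turbo Mode, it is worth measuring the accuracy of the approximate result on your own workload. A minimal validation helper (pure NumPy; the `omni` module is not imported here, so the demo below perturbs the exact product with ~1% noise as a stand-in for an approximate result):

```python
import numpy as np

def relative_error(C_exact, C_approx):
    """Relative Frobenius-norm error: a simple accuracy check for any
    approximate matmul result (e.g. the output of omni.dot)."""
    return np.linalg.norm(C_approx - C_exact) / np.linalg.norm(C_exact)

# Stand-in demo: mimic an approximate result with ~1% multiplicative noise.
rng = np.random.default_rng(0)
A = rng.random((256, 256), dtype=np.float32)
B = rng.random((256, 256), dtype=np.float32)
C_exact = A @ B
C_approx = C_exact * (1.0 + 0.01 * rng.standard_normal(C_exact.shape, dtype=np.float32))
err = relative_error(C_exact, C_approx)
print(f"relative error: {err:.4f}")
```

In practice, replace `C_approx` with the output of `omni.dot(A, B, speed=...)` and sweep the `speed` parameter until the error is acceptable for your application.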
OMNI-BLAS is an Approximate Computing engine. It is not suitable for cryptographic calculations or financial accounting where 100% precision is mandatory.
Ideal Use Cases:
- Neural Network Inference: Deep Learning weights are robust to minor noise. OMNI acts as a CPU-based accelerator.
- Big Data Clustering: K-Means and PCA algorithms on massive datasets.
- Real-Time Graphics/Physics: Where frame rate > precision.
The binaries (.dll / .so) provided in this repository are for Non-Commercial / Academic Evaluation only.
Aleator AI retains full ownership of the source code and the underlying “Monte Carlo Outer-Product” algorithm.
For Enterprise Licensing, Source Code Acquisition, or integration into commercial AI pipelines, please contact:
📧 Contact: aleator.ai.labs@gmail.com (or via GitHub Issues)
Copyright © 2026 Aleator AI. All Rights Reserved.