Axiom Benchmark Results¶
Generated: 2026-02-21 16:02
Platform: Darwin arm64
Summary¶
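Performance comparison of Axiom against Eigen3, PyTorch, and NumPy (plus Armadillo for matrix multiplication) across matrix multiplication, element-wise, unary, linear algebra, and FFT operations.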

Matrix Multiplication¶
Performance comparison for square matrix multiplication (GFLOPS, higher is better).
| Size | Axiom | Eigen3 | PyTorch | NumPy | Armadillo |
|---|---|---|---|---|---|
| 32×32 | 55.2 | 88.4 | 62.2 | 56.4 | 98.3 |
| 64×64 | 371 | 440 | 336 | 332 | 42.1 |
| 128×128 | 923 | 954 | 930 | 951 | 44.7 |
| 256×256 | 1,421 | 1,251 | 1,412 | 1,488 | 162 |
| 512×512 | 2,389 | 2,345 | 1,310 | 2,358 | 434 |
| 1024×1024 | 2,820 | 2,445 | 2,423 | 2,299 | 524 |
| 2048×2048 | 3,218 | 2,982 | 2,801 | 2,795 | 608 |
| 4096×4096 | 3,087 | 2,961 | 2,961 | 2,959 | 754 |
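For readers who want to relate these figures to wall-clock time, GFLOPS for an n×n matrix product is conventionally derived from a count of 2n³ floating-point operations. The sketch below assumes that convention; the report does not state which flop count the benchmark harness uses, and `matmul_gflops` and `matmul_seconds` are hypothetical helper names.

```python
def matmul_gflops(n: int, seconds: float) -> float:
    """Convert the wall-clock time of one n x n matmul into GFLOPS.

    Assumes the conventional 2 * n**3 flop count (about n**3 multiplies
    and n**3 adds); the actual benchmark harness may count differently.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 2 * n**3 / seconds / 1e9


def matmul_seconds(n: int, gflops: float) -> float:
    """Invert the formula: the time implied by a reported GFLOPS figure."""
    if gflops <= 0:
        raise ValueError("gflops must be positive")
    return 2 * n**3 / (gflops * 1e9)


# Time implied by the 3,218 GFLOPS reported above for 2048x2048.
implied = matmul_seconds(2048, 3218.0)
print(f"{implied * 1e3:.2f} ms per multiplication")
```

Converting back to time makes it easier to compare these results with the millisecond-based tables later in the report.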
[Figure: Performance Comparison]
[Figure: Scaling Analysis]

Element-wise Operations¶
Binary element-wise operations (add, sub, mul, div) measured in GB/s throughput.
Results at 4096×4096 (GB/s)
| Operation | Axiom | Eigen3 | PyTorch | NumPy |
|---|---|---|---|---|
| add | 92.4 | 121 | 94.2 | 40.0 |
| sub | 112 | 119 | 90.4 | 40.5 |
| mul | 117 | 119 | 96.1 | 42.9 |
| div | 99.1 | 120 | 95.4 | 41.2 |
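A GB/s figure for a binary element-wise operation is usually an effective memory throughput: bytes moved divided by elapsed time. The sketch below shows one common way to compute it. It assumes three arrays' worth of traffic (two reads, one write) and 4-byte elements; neither assumption is confirmed by this report, and `binary_op_gbps` is a hypothetical helper name.

```python
def binary_op_gbps(rows: int, cols: int, seconds: float, itemsize: int = 4) -> float:
    """Effective throughput of c = a (op) b on rows x cols arrays, in GB/s.

    Assumptions (not taken from the report): traffic is counted as
    read a + read b + write c, and elements are `itemsize` bytes wide
    (4 bytes corresponds to float32).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    bytes_moved = 3 * rows * cols * itemsize
    return bytes_moved / seconds / 1e9


# A 4096x4096 float32 add that takes 2 ms under these assumptions.
print(f"{binary_op_gbps(4096, 4096, 0.002):.1f} GB/s")
```

If the harness counts only two arrays, or uses 8-byte elements, the same timing yields a different GB/s figure, so the counting convention matters when comparing against other reports.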
[Figure: Performance by Operation]
[Figure: Bar Chart Comparison]

Unary Operations¶
Unary operations (exp, log, sqrt, sin, cos, tanh, abs, neg, relu, sigmoid) measured in GB/s.
Results at 4096×4096 (GB/s)
| Operation | Axiom | Eigen3 | PyTorch | NumPy |
|---|---|---|---|---|
| exp | 23.0 | 16.4 | 50.7 | 5.69 |
| log | 17.0 | 13.1 | 33.7 | 4.93 |
| sqrt | 39.0 | 66.3 | 73.2 | 29.7 |
| sin | 26.0 | 11.0 | 39.1 | 6.73 |
| cos | 25.2 | 11.0 | 33.7 | 6.46 |
| tanh | 14.3 | 21.5 | 21.0 | 9.43 |
| abs | 104 | 118 | 75.1 | 27.3 |
| neg | 109 | 66.1 | 75.4 | 32.7 |
| relu | 121 | 122 | 74.2 | 18.5 |
| sigmoid | 14.3 | 15.9 | 47.5 | 3.71 |
[Figure: Performance by Operation]
[Figure: Bar Chart Comparison]

Linear Algebra¶
Linear algebra operations (SVD, QR, solve, Cholesky, eigendecomposition, inverse, determinant). Measured in milliseconds (lower is better).
Results at 512×512 (time in ms)
| Operation | Axiom | Eigen3 | PyTorch | NumPy |
|---|---|---|---|---|
| svd | 18.2 | 2,200 | 16.2 | 25.9 |
| qr | 4.72 | 1.50 | 4.06 | 7.89 |
| solve | 0.96 | 2.18 | 0.47 | 1.20 |
| cholesky | 0.69 | 0.24 | 0.22 | 1.43 |
| eig | 151 | 22.5 | 9.50 | 15.4 |
| inv | 1.55 | 2.34 | 1.06 | 3.83 |
| det | 1.01 | 1.46 | 0.57 | 1.69 |
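Millisecond timings like these are typically collected with a small harness that warms up the operation and then takes a robust statistic over several runs. The sketch below is one such harness using only the standard library. The warm-up count, repeat count, and use of the median are assumptions; the report does not describe how its timings were taken, and `time_ms` is a hypothetical helper name.

```python
import statistics
import time
from typing import Callable


def time_ms(fn: Callable[[], object], warmup: int = 2, repeats: int = 10) -> float:
    """Return the median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm caches and any lazy initialisation before timing
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


# Stand-in workload; a real run would call an SVD, QR, or solve routine here.
elapsed = time_ms(lambda: sum(i * i for i in range(10_000)))
print(f"{elapsed:.3f} ms")
```

The median is less sensitive than the mean to occasional slow runs caused by system load, which is relevant given the variability noted at the end of this report.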
[Figure: Performance by Operation]
[Figure: Bar Chart Comparison]

FFT Operations¶
Fast Fourier Transform operations (fft, ifft, rfft, fft2, ifft2, rfft2). Measured in milliseconds (lower is better). Entries shown as 0.00 are below the table's two-decimal resolution (under 0.005 ms).
Results at 2048×2048 (time in ms)
| Operation | Axiom | PyTorch | NumPy |
|---|---|---|---|
| fft | 0.00 | 0.01 | 0.01 |
| ifft | 0.00 | 0.01 | 0.01 |
| rfft | 0.00 | 0.01 | 0.01 |
| fft2 | 14.3 | 27.2 | 60.6 |
| ifft2 | 14.3 | 27.6 | 29.6 |
| rfft2 | 10.0 | 7.82 | 22.8 |
[Figure: Performance by Operation]
[Figure: Bar Chart Comparison]

Fusion Patterns¶
Lazy evaluation with operation fusion vs eager mode execution.
Fusion data is not included in this report. Run `make benchmark-fusion` to generate it.
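To illustrate what fusion changes, the sketch below contrasts an eager pipeline, which materialises an intermediate array after every operation, with a fused version that applies the whole expression in one pass. It uses plain Python lists and is only a conceptual illustration: it is not Axiom's implementation, and the function names are hypothetical.

```python
import math
from typing import List


def eager_scaled_exp(x: List[float], scale: float, shift: float) -> List[float]:
    """Eager style: each operation materialises a full intermediate list."""
    scaled = [v * scale for v in x]        # pass 1, temporary 1
    shifted = [v + shift for v in scaled]  # pass 2, temporary 2
    return [math.exp(v) for v in shifted]  # pass 3, final result


def fused_scaled_exp(x: List[float], scale: float, shift: float) -> List[float]:
    """Fused style: one pass over the input, no intermediate lists."""
    return [math.exp(v * scale + shift) for v in x]


data = [0.0, 0.5, 1.0, 2.0]
eager = eager_scaled_exp(data, scale=2.0, shift=-1.0)
fused = fused_scaled_exp(data, scale=2.0, shift=-1.0)
print(all(math.isclose(a, b) for a, b in zip(eager, fused)))
```

Both versions compute the same values; the fused form reads the input once and writes the output once, which is why fusion tends to help most on memory-bound element-wise chains.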
Test Environment¶
OS: Darwin 25.3.0
Architecture: arm64
Python: 3.12.7
Timestamp: 2026-02-21T15:56:58.080886
Notes¶
All benchmarks were run on CPU
Axiom uses the Accelerate framework (BLAS) on macOS
Higher GFLOPS/GB/s = better for throughput metrics
Lower ms = better for time metrics
Results may vary based on system load and thermal conditions
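Because the tables mix throughput metrics (higher is better) and time metrics (lower is better), a direction-aware ratio is a convenient way to compare libraries. The sketch below is a minimal helper for that purpose; `speedup` is a hypothetical name, and the input values are copied from the tables in this report.

```python
def speedup(axiom: float, other: float, higher_is_better: bool) -> float:
    """Return a ratio where values above 1 mean Axiom is faster.

    For throughput metrics (GFLOPS, GB/s) the ratio is axiom / other;
    for time metrics (ms) it is other / axiom.
    """
    if axiom <= 0 or other <= 0:
        raise ValueError("measurements must be positive")
    return axiom / other if higher_is_better else other / axiom


# Matrix multiplication at 2048x2048, Axiom vs Eigen3 (GFLOPS).
matmul_2048 = speedup(3218.0, 2982.0, higher_is_better=True)
# SVD at 512x512, Axiom vs NumPy (ms).
svd_512 = speedup(18.2, 25.9, higher_is_better=False)

print(f"matmul 2048x2048 vs Eigen3: {matmul_2048:.2f}x")
print(f"svd 512x512 vs NumPy: {svd_512:.2f}x")
```

Ratios close to 1 should be read with the caveat above in mind: small differences can fall within run-to-run variation.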