Axiom Benchmark Results

Generated: 2026-02-21 16:02

Platform: Darwin arm64

Summary

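Performance comparison of Axiom with Eigen3, PyTorch, NumPy, and Armadillo on matrix multiplication, element-wise and unary operations, linear algebra, and FFT. Not every library appears in every benchmark.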

[Figure: Comprehensive summary]


Matrix Multiplication

Performance comparison for square matrix multiplication (GFLOPS, higher is better).

Size        Axiom  Eigen3  PyTorch   NumPy  Armadillo
32×32        55.2    88.4     62.2    56.4       98.3
64×64         371     440      336     332       42.1
128×128       923     954      930     951       44.7
256×256     1,421   1,251    1,412   1,488        162
512×512     2,389   2,345    1,310   2,358        434
1024×1024   2,820   2,445    2,423   2,299        524
2048×2048   3,218   2,982    2,801   2,795        608
4096×4096   3,087   2,961    2,961   2,959        754

[Figure: Performance comparison, matrix multiplication]

[Figure: Scaling analysis, matrix multiplication]
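The report does not state how GFLOPS is derived from timings. A common convention, assumed here, counts 2·n³ floating-point operations (n³ multiply-add pairs) per n×n by n×n multiplication. The sketch below converts between GFLOPS and seconds under that assumption; `matmul_gflops` and `matmul_seconds` are illustrative names, not part of Axiom.

```python
def matmul_gflops(n: int, seconds: float) -> float:
    """GFLOPS for one n×n @ n×n product, assuming the 2*n**3 flop convention."""
    flops = 2.0 * n**3  # n**3 multiply-add pairs
    return flops / seconds / 1e9


def matmul_seconds(n: int, gflops: float) -> float:
    """Inverse conversion: seconds per product implied by a GFLOPS figure."""
    return 2.0 * n**3 / (gflops * 1e9)


if __name__ == "__main__":
    # 4096×4096 GFLOPS values copied from the matrix multiplication table.
    results_4096 = {"Axiom": 3087, "Eigen3": 2961, "PyTorch": 2961,
                    "NumPy": 2959, "Armadillo": 754}
    for lib, gflops in results_4096.items():
        print(f"{lib:>9}: {matmul_seconds(4096, gflops) * 1e3:.1f} ms")
```

Under this convention, Axiom's 3,087 GFLOPS at 4096×4096 corresponds to roughly 44.5 ms per multiplication, and Armadillo's 754 GFLOPS to roughly 182 ms.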


Element-wise Operations

Binary element-wise operations (add, sub, mul, div), measured as throughput in GB/s (higher is better).

Results at 4096×4096 (GB/s)

Operation   Axiom  Eigen3  PyTorch   NumPy
add          92.4     121     94.2    40.0
sub           112     119     90.4    40.5
mul           117     119     96.1    42.9
div          99.1     120     95.4    41.2

[Figure: Performance by operation, element-wise]

[Figure: Bar chart comparison, element-wise]
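The report also does not define how bytes are counted for GB/s. One common model, assumed here, charges a binary operation c = a op b for reading both inputs and writing the output (three arrays), with float32 elements (4 bytes) and decimal gigabytes (1 GB = 10⁹ bytes). Under those assumptions a 4096×4096 operation moves 201,326,592 bytes, so 117 GB/s works out to about 1.72 ms per operation. The helper names below are hypothetical.

```python
def binary_op_bytes(rows: int, cols: int, itemsize: int = 4) -> int:
    """Nominal traffic for c = a (op) b: read a, read b, write c (assumed model)."""
    return 3 * rows * cols * itemsize


def throughput_gbs(bytes_moved: int, seconds: float) -> float:
    """Throughput in decimal GB/s (1 GB = 1e9 bytes, an assumption)."""
    return bytes_moved / seconds / 1e9


def seconds_at(bytes_moved: int, gbs: float) -> float:
    """Time per operation implied by a GB/s figure."""
    return bytes_moved / (gbs * 1e9)


if __name__ == "__main__":
    nbytes = binary_op_bytes(4096, 4096)  # float32 assumed
    # 'mul' row from the 4096×4096 element-wise table.
    for lib, gbs in {"Axiom": 117, "Eigen3": 119, "PyTorch": 96.1, "NumPy": 42.9}.items():
        print(f"{lib:>7}: {seconds_at(nbytes, gbs) * 1e3:.2f} ms")
```

If the benchmark instead used binary gigabytes or float64 data, the implied times would shift by a constant factor, but the ranking between libraries would not change.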


Unary Operations

Unary operations (exp, log, sqrt, sin, cos, tanh, abs, neg, relu, sigmoid), measured as throughput in GB/s (higher is better).

Results at 4096×4096 (GB/s)

Operation   Axiom  Eigen3  PyTorch   NumPy
exp          23.0    16.4     50.7    5.69
log          17.0    13.1     33.7    4.93
sqrt         39.0    66.3     73.2    29.7
sin          26.0    11.0     39.1    6.73
cos          25.2    11.0     33.7    6.46
tanh         14.3    21.5     21.0    9.43
abs           104     118     75.1    27.3
neg           109    66.1     75.4    32.7
relu          121     122     74.2    18.5
sigmoid      14.3    15.9     47.5    3.71

[Figure: Performance by operation, unary]

[Figure: Bar chart comparison, unary]
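The unary table mixes standard math functions with relu and sigmoid. The NumPy sketch below assumes the usual definitions relu(x) = max(x, 0) and sigmoid(x) = 1 / (1 + e^(-x)), float32 data, and a nominal traffic of one read plus one write per element. `unary_gbs` is a hypothetical helper, not the report's harness. The sigmoid expression also allocates temporaries, so its real memory traffic exceeds the nominal count.

```python
import time

import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0)


def sigmoid(x: np.ndarray) -> np.ndarray:
    # Allocates temporaries for -x, exp(-x) and 1 + exp(-x).
    return 1.0 / (1.0 + np.exp(-x))


def unary_gbs(fn, x: np.ndarray, repeats: int = 5) -> float:
    """Best-of-`repeats` throughput, counting one read and one write of x.nbytes."""
    fn(x)  # warm-up
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(x)
        best = min(best, time.perf_counter() - t0)
    return 2 * x.nbytes / best / 1e9


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal((4096, 4096), dtype=np.float32)
    for name, fn in [("exp", np.exp), ("tanh", np.tanh), ("abs", np.abs),
                     ("relu", relu), ("sigmoid", sigmoid)]:
        print(f"{name:>7}: {unary_gbs(fn, x):.1f} GB/s")
```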


Linear Algebra

Linear algebra operations (SVD, QR, solve, Cholesky, eigendecomposition, inverse, determinant). Measured in milliseconds (lower is better).

Results at 512×512 (ms, lower is better)

Operation   Axiom  Eigen3  PyTorch   NumPy
svd          18.2   2,200     16.2    25.9
qr           4.72    1.50     4.06    7.89
solve        0.96    2.18     0.47    1.20
cholesky     0.69    0.24     0.22    1.43
eig           151    22.5     9.50    15.4
inv          1.55    2.34     1.06    3.83
det          1.01    1.46     0.57    1.69

[Figure: Performance by operation, linear algebra]

[Figure: Bar chart comparison, linear algebra]
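The report does not specify the dtype, which SVD variant (full or thin, with or without singular vectors), or whether eig targets general or symmetric matrices, and such choices change timings substantially. A NumPy sketch of a comparable suite, assuming float64 inputs, a general random matrix for every operation except cholesky, and a symmetric positive-definite matrix for cholesky (which requires one). `time_ms` and `linalg_suite` are hypothetical names.

```python
import time

import numpy as np


def time_ms(fn, repeats: int = 5) -> float:
    """Best-of-`repeats` wall time in milliseconds, after one warm-up call."""
    fn()
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best * 1e3


def linalg_suite(n: int = 512, repeats: int = 5, seed: int = 0) -> dict:
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))  # general matrix, nonsingular with probability 1
    b = rng.standard_normal((n, 1))
    # Cholesky needs a symmetric positive-definite input; A A^T + n I is one.
    spd = a @ a.T + n * np.eye(n)
    ops = {
        "svd": lambda: np.linalg.svd(a),       # full U, S, V^T by default
        "qr": lambda: np.linalg.qr(a),
        "solve": lambda: np.linalg.solve(a, b),
        "cholesky": lambda: np.linalg.cholesky(spd),
        "eig": lambda: np.linalg.eig(a),       # general (non-symmetric) eigensolver
        "inv": lambda: np.linalg.inv(a),
        "det": lambda: np.linalg.det(a),
    }
    return {name: time_ms(fn, repeats) for name, fn in ops.items()}


if __name__ == "__main__":
    for name, ms in linalg_suite().items():
        print(f"{name:>8}: {ms:.2f} ms")
```

Swapping `np.linalg.eig` for `np.linalg.eigh` on a symmetric input, or passing `full_matrices=False` to `np.linalg.svd`, is an easy way to see how much the configuration alone moves these numbers.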


FFT Operations

Fast Fourier Transform operations (fft, ifft, rfft, fft2, ifft2, rfft2). Measured in milliseconds (lower is better).

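Results at 2048×2048 (ms, lower is better)

Operation   Axiom  PyTorch   NumPy
fft          0.00     0.01    0.01
ifft         0.00     0.01    0.01
rfft         0.00     0.01    0.01
fft2         14.3     27.2    60.6
ifft2        14.3     27.6    29.6
rfft2        10.0     7.82    22.8

The 1-D rows (fft, ifft, rfft) sit at the two-decimal display resolution: 0.00 means under 0.005 ms, so those rows cannot be ranked from these figures.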

[Figure: Performance by operation, FFT]

[Figure: Bar chart comparison, FFT]
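The report does not state the 1-D transform length or the dtype. A NumPy sketch of the same six operations, assuming float64 input, a 1-D signal of length n, and an n×n array for the 2-D transforms; it also checks once that ifft2 inverts fft2 before timing. `time_ms` and `fft_suite` are hypothetical names.

```python
import time

import numpy as np


def time_ms(fn, repeats: int = 5) -> float:
    """Best-of-`repeats` wall time in milliseconds, after one warm-up call."""
    fn()
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best * 1e3


def fft_suite(n: int = 2048, repeats: int = 5, seed: int = 0) -> dict:
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)        # 1-D real signal (length n assumed)
    x2 = rng.standard_normal((n, n))   # 2-D real input
    c1 = np.fft.fft(x1)                # complex spectra used as inverse inputs
    c2 = np.fft.fft2(x2)
    # Sanity check: the inverse transform recovers the input.
    assert np.allclose(np.fft.ifft2(c2).real, x2)
    ops = {
        "fft": lambda: np.fft.fft(x1),
        "ifft": lambda: np.fft.ifft(c1),
        "rfft": lambda: np.fft.rfft(x1),
        "fft2": lambda: np.fft.fft2(x2),
        "ifft2": lambda: np.fft.ifft2(c2),
        "rfft2": lambda: np.fft.rfft2(x2),
    }
    return {name: time_ms(fn, repeats) for name, fn in ops.items()}


if __name__ == "__main__":
    for name, ms in fft_suite().items():
        # Three decimals keep sub-0.01 ms 1-D timings visible.
        print(f"{name:>6}: {ms:.3f} ms")
```

Printing with more than two decimals avoids the 0.00 readings that the 1-D rows show at the report's precision.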


Fusion Patterns

Compares lazy evaluation with operation fusion against eager execution.

No fusion results are included in this report. Run make benchmark-fusion to generate them.
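The fusion idea itself can be shown in miniature. The pure-Python sketch below covers element-wise operations only; `Lazy` and `eager_mul_add` are hypothetical names, not Axiom's API or its implementation. Eager execution of a*b + c writes a full temporary for a*b and reads it back, while the lazy version first composes the operations and then computes each output element in a single pass.

```python
from typing import Callable, List, Sequence


class Lazy:
    """Hypothetical lazy element-wise expression (illustration only, not Axiom's API)."""

    def __init__(self, elem: Callable[[int], float], size: int) -> None:
        self.elem = elem  # computes element i of this expression on demand
        self.size = size

    @staticmethod
    def of(data: Sequence[float]) -> "Lazy":
        return Lazy(lambda i: data[i], len(data))

    def _binary(self, other: "Lazy", op: Callable[[float, float], float]) -> "Lazy":
        if self.size != other.size:
            raise ValueError("size mismatch")
        # No work happens here: only the per-element function is composed.
        return Lazy(lambda i: op(self.elem(i), other.elem(i)), self.size)

    def __add__(self, other: "Lazy") -> "Lazy":
        return self._binary(other, lambda x, y: x + y)

    def __mul__(self, other: "Lazy") -> "Lazy":
        return self._binary(other, lambda x, y: x * y)

    def evaluate(self) -> List[float]:
        # Single fused pass: each output element flows through the whole
        # expression, so no intermediate array is materialized.
        return [self.elem(i) for i in range(self.size)]


def eager_mul_add(a, b, c) -> List[float]:
    tmp = [x * y for x, y in zip(a, b)]     # pass 1 writes a full temporary
    return [t + z for t, z in zip(tmp, c)]  # pass 2 reads it back


if __name__ == "__main__":
    a, b, c = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]
    fused = (Lazy.of(a) * Lazy.of(b) + Lazy.of(c)).evaluate()
    print(fused, eager_mul_add(a, b, c))
```

In a compiled library the payoff is memory traffic: for large arrays, skipping the temporary saves one full write and one full read, which matters most for bandwidth-bound operations such as the element-wise ones above.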


Test Environment

OS: Darwin 25.3.0
Architecture: arm64
Python: 3.12.7
Timestamp: 2026-02-21T15:56:58.080886

Notes

  • All benchmarks run on CPU

  • Axiom uses the Accelerate framework (BLAS) on macOS

  • Throughput metrics (GFLOPS, GB/s): higher is better

  • Time metrics (ms): lower is better

  • Results may vary based on system load and thermal conditions
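Because results vary with system load and thermal state, repeated measurements say more than single runs. A stdlib-only sketch of one common approach, assumed here rather than taken from the report's harness: warm up, time several runs, and report the median alongside the relative spread. `measure` is a hypothetical helper.

```python
import statistics
import time
from typing import Callable, Dict


def measure(fn: Callable[[], object], warmup: int = 3, repeats: int = 20) -> Dict[str, float]:
    if repeats < 2:
        raise ValueError("need at least 2 repeats to estimate spread")
    for _ in range(warmup):
        fn()  # warm caches, allocators, and CPU frequency before timing
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return {
        "median_ms": statistics.median(samples) * 1e3,
        "min_ms": min(samples) * 1e3,
        # Relative spread; a large value suggests load or thermal interference.
        "rel_stdev": statistics.stdev(samples) / statistics.mean(samples),
    }


if __name__ == "__main__":
    print(measure(lambda: sum(i * i for i in range(100_000))))
```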