Certified Matrix Decompositions
BallArithmetic.jl provides rigorous certification for several matrix decompositions. Each method computes both a numerical approximation and a rigorous error bound.
Overview
| Decomposition | Function | Description |
|---|---|---|
| SVD | ogita_svd_refine | Singular value decomposition |
| SVD Cascade | ogita_svd_cascade | Multi-precision SVD cascade |
| SVD (GLA) | ogita_svd_cascade_gla | Native BigFloat SVD |
| Schur | iterative_schur_refinement | Schur form |
| LU | verified_lu | LU factorization |
| Cholesky | verified_cholesky | Cholesky factorization |
| QR | verified_qr | QR factorization |
| Polar | verified_polar | Polar decomposition |
| Takagi | verified_takagi | Takagi factorization |
Recommended Extensions
BallArithmetic supports several extension packages that provide different performance characteristics:
| Extension | Package | Use Case |
|---|---|---|
| GenericLinearAlgebra | GenericLinearAlgebra.jl | Recommended - Native BigFloat SVD, fastest and most accurate |
| MultiFloats | MultiFloats.jl | SIMD-accelerated multi-precision (Float64x2, Float64x4) |
| DoubleFloats | DoubleFloats.jl | Fast Double64 arithmetic (~106 bits) |
SVD Certification Methods
The SVD is fundamental for resolvent norm certification and condition number estimation. We provide multiple methods with different speed/accuracy trade-offs.
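The two quantities the certification methods work with, the smallest singular value σ_min and the Frobenius residual ‖A − UΣVᴴ‖_F, can be illustrated with the stdlib SVD (a plain Float64 sketch with no certification, independent of BallArithmetic):

```julia
using LinearAlgebra

# A small symmetric test matrix whose singular values are known
# in closed form: (5 ± √5)/2.
A = [2.0 1.0; 1.0 3.0]
F = svd(A)

# The quantities that the certification methods bound rigorously:
σ_min    = minimum(F.S)                          # smallest singular value
residual = norm(A - F.U * Diagonal(F.S) * F.Vt)  # ‖A − UΣVᴴ‖_F
```

The certified methods below compute the same two numbers, but return a guaranteed lower bound on σ_min and a guaranteed upper bound on the residual instead of these floating-point approximations.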
Method Comparison
All benchmarks were performed on:
- CPU: AMD Ryzen 5 5600 6-Core Processor
- RAM: 128 GB
- Julia: 1.11+
- BigFloat precision: 256 bits
Small Matrix (100×100)
| Method | Time | Non-rigorous Residual | Rigorous σ_min Bound | Speedup |
|---|---|---|---|---|
| GLA (no refine) | 4.2s | 4×10⁻⁷⁴ | ✓ | 6.6× |
| D64 cascade (F64→D64→BF) | 11.7s | 1×10⁻¹² | ✓ | 2.4× |
| Minimal cascade (F64→MF3→BF) | 12.2s | 1×10⁻¹² | ✓ | 2.3× |
| Full cascade (F64→D64→MF3→MF4→BF) | 15.0s | 1×10⁻¹³ | ✓ | 1.8× |
| Pure Ogita (F64→BF×5) | 27.7s | 1×10⁻¹² | ✓ | 1.0× |
Key observations:
- GLA achieves 60+ orders of magnitude better residual than cascade methods
- GLA is 6.6× faster than pure Ogita refinement
- All methods produce valid rigorous bounds for σ_min
- Simpler cascades (D64 only) outperform complex multi-level cascades
Medium Matrix (200×200)
| Method | Time | Non-rigorous Residual | Rigorous Bound | Speedup |
|---|---|---|---|---|
| Cascade (F64→D64→MF3→MF4→BF×2) | 196s | 3.3×10⁻¹² | ✓ | 2.1× |
| Pure Ogita (F64→BF×5) | 407s | 5.0×10⁻¹² | ✓ | 1.0× |
Large Matrix (500×500)
| Method | Time | Non-rigorous Residual | Rigorous Bound | Speedup |
|---|---|---|---|---|
| Cascade (F64→D64→MF3→MF4→BF×2) | 3234s | 1.4×10⁻¹¹ | ✓ | 1.9× |
| Pure Ogita (F64→BF×5) | 6279s | 1.6×10⁻¹¹ | ✓ | 1.0× |
Recommendations
Use GenericLinearAlgebra when available:
```julia
using BallArithmetic, GenericLinearAlgebra

T_bf = Complex{BigFloat}.(T)
z_bf = Complex{BigFloat}(z)
result = ogita_svd_cascade_gla(T_bf, z_bf)
# result.σ_min is a rigorous lower bound
# result.residual_norm is the non-rigorous residual
```

Without GLA, use the simple D64 cascade:

```julia
using BallArithmetic, MultiFloats, DoubleFloats

result = ogita_svd_cascade(T_bf, z_bf;
                           f64_iters=1, d64_iters=1,
                           mf3_iters=0, mf4_iters=0,
                           bf_iters=2)
```

For pure BigFloat (no extensions):

```julia
using BallArithmetic

result = ogita_svd_refine(A, U, Σ, V; max_iterations=5)
```
Rigorous vs Non-Rigorous Results
Each decomposition method returns:
- Non-rigorous residual: the computed ‖A − UΣVᴴ‖_F, evaluated in standard floating-point arithmetic
- Rigorous bound: a mathematically guaranteed upper bound on the true error
The rigorous bound accounts for:
- Rounding errors in all computations
- Potential instabilities in the algorithm
- Accumulated error from multiple operations
For certification purposes, always use the rigorous bound.
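The mechanism behind such rigorous bounds is directed rounding: re-evaluating an expression with the rounding mode forced downward and then upward yields a guaranteed enclosure of the exact value, no matter how rounding errors accumulate. A minimal sketch using only BigFloat and `setrounding` from Base, independent of BallArithmetic:

```julia
# 1/3 is not exactly representable in binary, so round-to-nearest
# gives only an approximation.  Directed rounding gives guarantees:
lo = setrounding(BigFloat, RoundDown) do
    BigFloat(1) / BigFloat(3)     # guaranteed ≤ 1/3
end
hi = setrounding(BigFloat, RoundUp) do
    BigFloat(1) / BigFloat(3)     # guaranteed ≥ 1/3
end
# [lo, hi] is a rigorous enclosure of the exact value 1/3
```

Ball arithmetic packages the same idea as a midpoint plus a radius that is guaranteed to cover all rounding error.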
Other Matrix Decompositions (100×100 Matrix)
Beyond SVD, BallArithmetic provides verified versions of standard matrix decompositions.
Standard (Float64 → BigFloat)
These use Float64 computation with BigFloat certification:
| Decomposition | Time | Residual Norm | Rigorous Bound |
|---|---|---|---|
| Cholesky (SPD) | 1.7s | 2.7×10⁻¹⁶ | ✓ |
| QR | 2.1s | 3.1×10⁻¹⁵ | ✓ |
| Polar | 3.3s | 4.8×10⁻¹⁶ | ✓ |
| LU | 5.6s | 1.3×10⁻¹⁴ | ✓ |
| Takagi | 10.3s | 1.7×10⁻¹⁵ | ✓ |
GLA-Based (Native BigFloat)
With GenericLinearAlgebra, we achieve 60+ orders of magnitude better residuals:
| Decomposition | Time | Residual Norm | Improvement |
|---|---|---|---|
| Cholesky (GLA) | 0.55s | 5.2×10⁻⁷⁴ | 10⁵⁸× better |
| LU (GLA) | 0.91s | 5.9×10⁻⁷⁵ | 10⁶¹× better |
| QR (GLA) | 0.91s | 5.4×10⁻⁷⁵ | 10⁶⁰× better |
| SVD (GLA) | 5.15s | 4.5×10⁻⁷⁴ | 10⁶⁰× better |
GLA functions:
```julia
using BallArithmetic, GenericLinearAlgebra

result_lu   = verified_lu_gla(A)            # ~10⁻⁷⁵ residual
result_qr   = verified_qr_gla(A)            # ~10⁻⁷⁵ residual
result_chol = verified_cholesky_gla(A_spd)  # ~10⁻⁷⁴ residual
(U, S, V, res) = verified_svd_gla(A)        # ~10⁻⁷⁴ residual
```

Usage examples:
```julia
using BallArithmetic
using LinearAlgebra

A = randn(100, 100) + 5I

# LU decomposition
result_lu = verified_lu(A)
# result_lu.L, result_lu.U are BallMatrix enclosures
# result_lu.residual_norm is the rigorous error bound

# QR decomposition
result_qr = verified_qr(A)
# result_qr.Q, result_qr.R are BallMatrix enclosures
# result_qr.orthogonality_defect bounds ‖QᴴQ - I‖

# Cholesky (for symmetric positive definite)
A_spd = A'A + 100I
result_chol = verified_cholesky(A_spd)
# result_chol.L is the lower Cholesky factor

# Polar decomposition A = UH
result_polar = verified_polar(A)
# result_polar.U (unitary), result_polar.H (Hermitian positive semidefinite)

# Takagi factorization (for complex symmetric)
A_symm = A + transpose(A)
result_takagi = verified_takagi(A_symm)
# A = U Σ Uᵀ (not Uᴴ!)
```

Iterative Refinement Methods
For decompositions not supported by GLA, we provide iterative refinement using extended precision arithmetic.
Float64 Iterative Refinement (5 iterations)
| Decomposition | Time | Residual | Orthogonality Defect |
|---|---|---|---|
| Cholesky | 0.97s | 8.9×10⁻¹⁷ | N/A |
| QR (CholQR2) | 0.12s | 6.8×10⁻¹⁶ | 2.8×10⁻¹⁵ |
| LU | 0.96s | 1.5×10⁻¹⁵ | N/A |
| Polar (Newton-Schulz) | 0.22s | 1.4×10⁻¹⁵ | 2.1×10⁻¹⁵ |
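The Newton-Schulz iteration used for the polar refinement in the table is simple enough to sketch in plain Float64 (a minimal illustration of the classical iteration, not the package's implementation): each step X ← X(3I − XᴴX)/2 drives the singular values of X toward 1, converging to the unitary polar factor once the input is scaled so its singular values lie in (0, √3).

```julia
using LinearAlgebra

# Newton–Schulz iteration for the polar factor U in A = UH.
function polar_newton_schulz(A; iters = 20)
    X = A / opnorm(A)              # scale so singular values lie in (0, 1]
    for _ in 1:iters
        X = X * (3I - X'X) / 2     # Newton–Schulz step
    end
    return X
end

A = [3.0 1.0; 0.0 2.0]
U = polar_newton_schulz(A)
defect = opnorm(U'U - I)           # orthogonality defect ‖UᴴU − I‖
```

The Hermitian factor is then recovered as H = UᴴA. Convergence is quadratic near the fixed point, which is why a handful of iterations reaches working precision.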
Double64 Iterative Refinement (5 iterations)
Double64 provides ~106 bits of precision, dramatically improving orthogonality:
| Decomposition | Time | Residual | Orthogonality Defect |
|---|---|---|---|
| Cholesky | 0.90s | 7.2×10⁻¹⁸ | N/A |
| QR (CholQR2) | 1.69s | 7.1×10⁻¹⁶ | 2.1×10⁻³¹ |
| LU | 4.53s | 1.2×10⁻¹⁵ | N/A |
| Polar (Newton-Schulz) | 2.76s | 1.3×10⁻¹⁵ | 2.0×10⁻³¹ |
Key observation: Double64 achieves 16 orders of magnitude better orthogonality for QR and Polar decompositions, while residuals remain similar (limited by Float64 input precision).
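The CholQR2 scheme referenced in both tables can also be sketched in a few lines of plain Float64 (an illustration of the standard algorithm, not the package's refinement routine): one round of Cholesky QR forms the Gram matrix QᴴQ, Cholesky-factors it, and right-divides by the triangular factor; a second round restores orthogonality to working precision.

```julia
using LinearAlgebra

# CholQR2: two rounds of Cholesky-based orthogonalization of a
# tall matrix A, returning Q with orthonormal columns and upper
# triangular R with A ≈ QR.
function cholqr2(A)
    R1 = cholesky(Hermitian(A'A)).U   # first round
    Q  = A / R1
    R2 = cholesky(Hermitian(Q'Q)).U   # second round fixes orthogonality
    return Q / R2, R2 * R1
end

A = randn(50, 5)
Q, R = cholqr2(A)
```

Running the same two rounds in Double64 is what produces the ~10⁻³¹ orthogonality defects in the table above, since the Gram-matrix computation is then accurate to ~106 bits.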
Usage:
```julia
using BallArithmetic, DoubleFloats, LinearAlgebra

A = randn(100, 100) + 5I
F_qr = qr(A)

# Float64 refinement
result_f64 = refine_qr_cholqr2(A, Matrix(F_qr.Q), Matrix(F_qr.R); max_iterations=5)

# Double64 refinement (better orthogonality)
result_d64 = refine_qr_double64(A, Matrix(F_qr.Q), Matrix(F_qr.R);
                                method=:cholqr2, max_iterations=5)

# Polar refinement
F_svd = svd(A)
Q0 = F_svd.U * F_svd.Vt
result_polar = refine_polar_double64(A, Q0; method=:newton_schulz, max_iterations=10)
```

Schur Complement Bounds (Oishi 2023 / Rump-Oishi 2024)
For block-structured matrices, the Schur complement method provides efficient σ_min bounds:
```julia
result = rump_oishi_2024_sigma_min_bound(G; block_size=m)
```

This implements:
- Oishi 2023: Base Schur complement algorithm
- Rump-Oishi 2024: Improved ψ(N) formula that works for ‖N‖ ≥ 1
Key advantages:
- Works on matrices too large for full SVD
- Exploits block diagonal dominance structure
- Automatic optimal block size selection
References
- Ogita, T. & Aishima, K. (2020), "Iterative refinement for singular value decomposition based on matrix multiplication"
- Rump, S.M. & Oishi, S. (2024), "A Note on Oishi's Lower Bound for the Smallest Singular Value of Linearized Galerkin Equations"
- Oishi, S. (2023), "Lower bounds for the smallest singular values of generalized asymptotic diagonal dominant matrices"