Certified Matrix Decompositions

BallArithmetic.jl provides rigorous certification for several matrix decompositions. Each method computes both a numerical approximation and a rigorous error bound.

Overview

| Decomposition | Function | Description |
|---|---|---|
| SVD | `ogita_svd_refine` | Singular value decomposition |
| SVD Cascade | `ogita_svd_cascade` | Multi-precision SVD cascade |
| SVD (GLA) | `ogita_svd_cascade_gla` | Native BigFloat SVD |
| Schur | `iterative_schur_refinement` | Schur form |
| LU | `verified_lu` | LU factorization |
| Cholesky | `verified_cholesky` | Cholesky factorization |
| QR | `verified_qr` | QR factorization |
| Polar | `verified_polar` | Polar decomposition |
| Takagi | `verified_takagi` | Takagi factorization |

BallArithmetic supports several extension packages that provide different performance characteristics:

| Extension | Package | Use Case |
|---|---|---|
| GenericLinearAlgebra | GenericLinearAlgebra.jl | Recommended - native BigFloat SVD, fastest and most accurate |
| MultiFloats | MultiFloats.jl | SIMD-accelerated multi-precision (Float64x2, Float64x4) |
| DoubleFloats | DoubleFloats.jl | Fast Double64 arithmetic (~106 bits) |

SVD Certification Methods

The SVD is fundamental for resolvent norm certification and condition number estimation. We provide multiple methods with different speed/accuracy trade-offs.
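The connection to resolvent norms is the identity ‖(zI − T)⁻¹‖₂ = 1/σ_min(zI − T): a rigorous lower bound on σ_min therefore yields a rigorous upper bound on the resolvent norm. A plain NumPy check of the floating-point identity (illustrative only; no rigorous enclosure here):

```python
import numpy as np

# For the spectral norm, ||(zI - T)^(-1)||_2 = 1 / sigma_min(zI - T),
# so a certified lower bound on sigma_min certifies an upper bound on
# the resolvent norm.
rng = np.random.default_rng(1)
T = rng.standard_normal((30, 30))
z = 2.0 + 1.0j
M = z * np.eye(30) - T
sigma_min = np.linalg.svd(M, compute_uv=False)[-1]   # smallest singular value
resolvent_norm = np.linalg.norm(np.linalg.inv(M), 2)  # spectral norm of inverse
```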

Method Comparison

All benchmarks were performed on the following system:

  • CPU: AMD Ryzen 5 5600 6-Core Processor
  • RAM: 128 GB
  • Julia: 1.11+
  • BigFloat precision: 256 bits

Small Matrix (100×100)

| Method | Time | Non-rigorous Residual | Rigorous σ_min Bound | Speedup |
|---|---|---|---|---|
| GLA (no refine) | 4.2s | 4×10⁻⁷⁴ | | 6.6× |
| D64 cascade (F64→D64→BF) | 11.7s | 1×10⁻¹² | | 2.4× |
| Minimal cascade (F64→MF3→BF) | 12.2s | 1×10⁻¹² | | 2.3× |
| Full cascade (F64→D64→MF3→MF4→BF) | 15.0s | 1×10⁻¹³ | | 1.8× |
| Pure Ogita (F64→BF×5) | 27.7s | 1×10⁻¹² | | 1.0× |

Key observations:

  • GLA achieves 60+ orders of magnitude better residual than cascade methods
  • GLA is 6.6× faster than pure Ogita refinement
  • All methods produce valid rigorous bounds for σ_min
  • Simpler cascades (D64 only) outperform complex multi-level cascades
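The cascade idea itself is ordinary mixed-precision iterative refinement: compute cheaply in low precision, evaluate the residual in higher precision, and correct. A toy Python illustration on a linear system (the SVD cascades apply the same principle to the factors U, Σ, V; the matrix and names here are made up for the sketch):

```python
from fractions import Fraction
import numpy as np

# One step of mixed-precision iterative refinement:
# solve in float64, evaluate the residual exactly, correct.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.linalg.solve(A, b)                      # low-precision solve

A_exact = [[Fraction(a) for a in row] for row in A]
b_exact = [Fraction(v) for v in b]
r = [b_exact[i] - sum(A_exact[i][j] * Fraction(x[j]) for j in range(2))
     for i in range(2)]                        # residual, evaluated exactly

dx = np.linalg.solve(A, np.array([float(v) for v in r]))
x_refined = x + dx                             # corrected solution
```

In the cascades above, "higher precision" is Double64, Float64x3/x4, or BigFloat rather than exact rationals, but the correction loop has the same shape.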

Medium Matrix (200×200)

| Method | Time | Non-rigorous Residual | Rigorous Bound | Speedup |
|---|---|---|---|---|
| Cascade (F64→D64→MF3→MF4→BF×2) | 196s | 3.3×10⁻¹² | | 2.1× |
| Pure Ogita (F64→BF×5) | 407s | 5.0×10⁻¹² | | 1.0× |

Large Matrix (500×500)

| Method | Time | Non-rigorous Residual | Rigorous Bound | Speedup |
|---|---|---|---|---|
| Cascade (F64→D64→MF3→MF4→BF×2) | 3234s | 1.4×10⁻¹¹ | | 1.9× |
| Pure Ogita (F64→BF×5) | 6279s | 1.6×10⁻¹¹ | | 1.0× |

Recommendations

  1. Use GenericLinearAlgebra when available:

     ```julia
     using BallArithmetic, GenericLinearAlgebra

     T_bf = Complex{BigFloat}.(T)
     z_bf = Complex{BigFloat}(z)
     result = ogita_svd_cascade_gla(T_bf, z_bf)

     # result.σ_min is a rigorous lower bound
     # result.residual_norm is the non-rigorous residual
     ```

  2. Without GLA, use the simple D64 cascade:

     ```julia
     using BallArithmetic, MultiFloats, DoubleFloats

     result = ogita_svd_cascade(T_bf, z_bf;
         f64_iters=1, d64_iters=1, mf3_iters=0, mf4_iters=0, bf_iters=2)
     ```

  3. For pure BigFloat (no extensions):

     ```julia
     using BallArithmetic

     result = ogita_svd_refine(A, U, Σ, V; max_iterations=5)
     ```

Rigorous vs Non-Rigorous Results

Each decomposition method returns:

  • Non-rigorous residual: The computed ‖A - UΣVᴴ‖_F using standard floating-point
  • Rigorous bound: A mathematically guaranteed upper bound on the true error

The rigorous bound accounts for:

  • Rounding errors in all computations
  • Potential instabilities in the algorithm
  • Accumulated error from multiple operations

For certification purposes, always use the rigorous bound.
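To make the distinction concrete, here is what the non-rigorous residual looks like in plain floating-point (NumPy sketch, not BallArithmetic code): the number below is itself contaminated by rounding, which is exactly what the rigorous bound additionally encloses.

```python
import numpy as np

# Non-rigorous residual of a floating-point SVD, as reported in the
# tables: ||A - U Σ Vᴴ||_F computed in ordinary float64 arithmetic.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
U, s, Vt = np.linalg.svd(A)
residual = np.linalg.norm(A - U @ np.diag(s) @ Vt)   # Frobenius norm
```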

Other Matrix Decompositions (100×100 Matrix)

Beyond SVD, BallArithmetic provides verified versions of standard matrix decompositions.

Standard (Float64 → BigFloat)

These use Float64 computation with BigFloat certification:

| Decomposition | Time | Residual Norm | Rigorous Bound |
|---|---|---|---|
| Cholesky (SPD) | 1.7s | 2.7×10⁻¹⁶ | |
| QR | 2.1s | 3.1×10⁻¹⁵ | |
| Polar | 3.3s | 4.8×10⁻¹⁶ | |
| LU | 5.6s | 1.3×10⁻¹⁴ | |
| Takagi | 10.3s | 1.7×10⁻¹⁵ | |
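The "Residual Norm" column measures quantities like ‖A − LLᵀ‖_F for Cholesky. A float64 illustration in NumPy (the package additionally certifies a rigorous bound on top of this number):

```python
import numpy as np

# Frobenius residual of a plain float64 Cholesky factorization.
rng = np.random.default_rng(2)
B = rng.standard_normal((100, 100))
A_spd = B @ B.T + 100 * np.eye(100)          # symmetric positive definite
L = np.linalg.cholesky(A_spd)                # lower triangular, A = L Lᵀ
residual = np.linalg.norm(A_spd - L @ L.T)   # Frobenius norm
```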

GLA-Based (Native BigFloat)

With GenericLinearAlgebra, we achieve residuals roughly 58 to 61 orders of magnitude better:

| Decomposition | Time | Residual Norm | Improvement |
|---|---|---|---|
| Cholesky (GLA) | 0.55s | 5.2×10⁻⁷⁴ | 10⁵⁸× better |
| LU (GLA) | 0.91s | 5.9×10⁻⁷⁵ | 10⁶¹× better |
| QR (GLA) | 0.91s | 5.4×10⁻⁷⁵ | 10⁶⁰× better |
| SVD (GLA) | 5.15s | 4.5×10⁻⁷⁴ | 10⁶⁰× better |

GLA functions:

```julia
using BallArithmetic, GenericLinearAlgebra

result_lu = verified_lu_gla(A)        # ~10⁻⁷⁵ residual
result_qr = verified_qr_gla(A)        # ~10⁻⁷⁵ residual
result_chol = verified_cholesky_gla(A_spd)  # ~10⁻⁷⁴ residual
(U, S, V, res) = verified_svd_gla(A)  # ~10⁻⁷⁴ residual
```

Usage examples:

```julia
using BallArithmetic
using LinearAlgebra

A = randn(100, 100) + 5I

# LU decomposition
result_lu = verified_lu(A)
# result_lu.L, result_lu.U are BallMatrix enclosures
# result_lu.residual_norm is the rigorous error bound

# QR decomposition
result_qr = verified_qr(A)
# result_qr.Q, result_qr.R are BallMatrix enclosures
# result_qr.orthogonality_defect bounds ‖QᴴQ - I‖

# Cholesky (for symmetric positive definite)
A_spd = A'A + 100I
result_chol = verified_cholesky(A_spd)
# result_chol.L is the lower Cholesky factor

# Polar decomposition A = UH
result_polar = verified_polar(A)
# result_polar.U (unitary), result_polar.H (Hermitian positive semidefinite)

# Takagi factorization (for complex symmetric)
A_symm = A + transpose(A)
result_takagi = verified_takagi(A_symm)
# A = U Σ Uᵀ (not Uᴴ!)
```

Iterative Refinement Methods

For decompositions not supported by GLA, we provide iterative refinement using extended precision arithmetic.

Float64 Iterative Refinement (5 iterations)

| Decomposition | Time | Residual | Orthogonality Defect |
|---|---|---|---|
| Cholesky | 0.97s | 8.9×10⁻¹⁷ | N/A |
| QR (CholQR2) | 0.12s | 6.8×10⁻¹⁶ | 2.8×10⁻¹⁵ |
| LU | 0.96s | 1.5×10⁻¹⁵ | N/A |
| Polar (Newton-Schulz) | 0.22s | 1.4×10⁻¹⁵ | 2.1×10⁻¹⁵ |
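CholQR2, the method named in the table, is Cholesky-based QR applied twice: one pass forms R from the Cholesky factor of AᵀA (which squares the condition number), and the second pass restores orthogonality to working precision. A minimal NumPy sketch, illustrative rather than the package's implementation:

```python
import numpy as np

def cholqr(A):
    # R is the upper-triangular Cholesky factor of AᵀA; Q = A R⁻¹.
    R = np.linalg.cholesky(A.T @ A).T
    Q = np.linalg.solve(R.T, A.T).T
    return Q, R

def cholqr2(A):
    # Apply CholQR twice; combine the triangular factors.
    Q1, R1 = cholqr(A)
    Q, R2 = cholqr(Q1)
    return Q, R2 @ R1          # A = Q (R2 R1)

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 20))
Q, R = cholqr2(A)
defect = np.linalg.norm(Q.T @ Q - np.eye(20))   # orthogonality defect
```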

Double64 Iterative Refinement (5 iterations)

Double64 provides ~106 bits of precision, dramatically improving orthogonality:

| Decomposition | Time | Residual | Orthogonality Defect |
|---|---|---|---|
| Cholesky | 0.90s | 7.2×10⁻¹⁸ | N/A |
| QR (CholQR2) | 1.69s | 7.1×10⁻¹⁶ | 2.1×10⁻³¹ |
| LU | 4.53s | 1.2×10⁻¹⁵ | N/A |
| Polar (Newton-Schulz) | 2.76s | 1.3×10⁻¹⁵ | 2.0×10⁻³¹ |

Key observation: Double64 achieves 16 orders of magnitude better orthogonality for QR and Polar decompositions, while residuals remain similar (limited by Float64 input precision).

Usage:

```julia
using BallArithmetic, DoubleFloats, LinearAlgebra

A = randn(100, 100) + 5I
F_qr = qr(A)

# Float64 refinement
result_f64 = refine_qr_cholqr2(A, Matrix(F_qr.Q), Matrix(F_qr.R); max_iterations=5)

# Double64 refinement (better orthogonality)
result_d64 = refine_qr_double64(A, Matrix(F_qr.Q), Matrix(F_qr.R);
    method=:cholqr2, max_iterations=5)

# Polar refinement
F_svd = svd(A)
Q0 = F_svd.U * F_svd.Vt
result_polar = refine_polar_double64(A, Q0; method=:newton_schulz, max_iterations=10)
```
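The Newton-Schulz iteration named by `method=:newton_schulz` is the quadratically convergent fixed-point scheme X ← X(3I − XᵀX)/2, which drives the singular values of X to 1 (and hence X to the orthogonal polar factor) when the iteration starts with ‖X‖₂ ≤ 1. A NumPy sketch of the bare iteration, illustrative rather than the package's refinement code:

```python
import numpy as np

# Newton-Schulz iteration for the orthogonal polar factor of A.
rng = np.random.default_rng(4)
A = rng.standard_normal((50, 50))
X = A / np.linalg.norm(A, 2)        # scale so all singular values are <= 1
for _ in range(40):
    X = X @ (3 * np.eye(50) - X.T @ X) / 2

U = X          # orthogonal polar factor
H = U.T @ A    # symmetric positive-definite factor, A = U H
```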

Schur Complement Bounds (Oishi 2023 / Rump-Oishi 2024)

For block-structured matrices, the Schur complement method provides efficient σ_min bounds:

```julia
result = rump_oishi_2024_sigma_min_bound(G; block_size=m)
```

This implements:

  • Oishi 2023: Base Schur complement algorithm
  • Rump-Oishi 2024: Improved ψ(N) formula that works for ‖N‖ ≥ 1

Key advantages:

  • Works on matrices too large for full SVD
  • Exploits block diagonal dominance structure
  • Automatic optimal block size selection
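The structure these bounds exploit is the standard block LDU factorization through the Schur complement S = D − CA⁻¹B: bounding σ_min(G) then reduces to quantities on the smaller blocks A and S instead of a full SVD of G. A NumPy check of the factorization identity (illustrative; this is not the Rump-Oishi ψ(N) machinery itself):

```python
import numpy as np

# For G = [[A, B], [C, D]] with A invertible and S = D - C A⁻¹ B:
#   G = [[I, 0], [C A⁻¹, I]] · [[A, 0], [0, S]] · [[I, A⁻¹ B], [0, I]]
rng = np.random.default_rng(5)
m = 4
A = rng.standard_normal((m, m)) + 5 * np.eye(m)   # well-conditioned block
B = rng.standard_normal((m, m))
C = rng.standard_normal((m, m))
D = rng.standard_normal((m, m)) + 5 * np.eye(m)
G = np.block([[A, B], [C, D]])

Ainv = np.linalg.inv(A)
S = D - C @ Ainv @ B                              # Schur complement
I, Z = np.eye(m), np.zeros((m, m))
L = np.block([[I, Z], [C @ Ainv, I]])
Dm = np.block([[A, Z], [Z, S]])
Uf = np.block([[I, Ainv @ B], [Z, I]])
err = np.linalg.norm(G - L @ Dm @ Uf)             # should be ~machine eps
```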

References

  • Ogita, T. & Aishima, K. (2020), "Iterative refinement for singular value decomposition based on matrix multiplication"
  • Rump, S.M. & Oishi, S. (2024), "A Note on Oishi's Lower Bound for the Smallest Singular Value of Linearized Galerkin Equations"
  • Oishi, S. (2023), "Lower bounds for the smallest singular values of generalized asymptotic diagonal dominant matrices"