Certified Matrix Decompositions

BallArithmetic.jl provides rigorous certification for several matrix decompositions. Each method computes both a numerical approximation and a rigorous error bound.

Overview

| Decomposition | Function | Description |
|---|---|---|
| SVD | `ogita_svd_refine` | Singular value decomposition |
| SVD Cascade | `ogita_svd_cascade` | Multi-precision SVD cascade |
| SVD (GLA) | `ogita_svd_cascade_gla` | Native BigFloat SVD |
| Schur | `iterative_schur_refinement` | Schur form |
| LU | `verified_lu` | LU factorization |
| Cholesky | `verified_cholesky` | Cholesky factorization |
| QR | `verified_qr` | QR factorization |
| Polar | `verified_polar` | Polar decomposition |
| Takagi | `verified_takagi` | Takagi factorization |

BallArithmetic supports several extension packages that provide different performance characteristics:

| Extension | Package | Use Case |
|---|---|---|
| GenericLinearAlgebra | GenericLinearAlgebra.jl | Recommended - native BigFloat SVD, fastest and most accurate |
| MultiFloats | MultiFloats.jl | SIMD-accelerated multi-precision (Float64x2, Float64x4) |
| DoubleFloats | DoubleFloats.jl | Fast Double64 arithmetic (~106 bits) |

SVD Certification Methods

The SVD is fundamental for resolvent norm certification and condition number estimation. We provide multiple methods with different speed/accuracy trade-offs.
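The connection to resolvent norms is the identity ‖(zI − T)⁻¹‖₂ = 1/σ_min(zI − T): a rigorous lower bound on σ_min therefore yields a rigorous upper bound on the resolvent norm. A plain NumPy check of the floating-point identity (illustrative only; no rigorous enclosure here):

```python
import numpy as np

# For the spectral norm, ||(zI - T)^(-1)||_2 = 1 / sigma_min(zI - T),
# so a certified lower bound on sigma_min certifies an upper bound on
# the resolvent norm.
rng = np.random.default_rng(1)
T = rng.standard_normal((30, 30))
z = 2.0 + 1.0j
M = z * np.eye(30) - T
sigma_min = np.linalg.svd(M, compute_uv=False)[-1]   # smallest singular value
resolvent_norm = np.linalg.norm(np.linalg.inv(M), 2)  # spectral norm of inverse
```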

Method Comparison

All benchmarks were performed on the following system:

  • CPU: AMD Ryzen 5 5600 6-Core Processor
  • RAM: 128 GB
  • Julia: 1.11+
  • BigFloat precision: 256 bits

Small Matrix (100×100)

| Method | Time | Non-rigorous Residual | Rigorous σ_min Bound | Speedup |
|---|---|---|---|---|
| GLA (no refine) | 4.2s | 4×10⁻⁷⁴ | | 6.6× |
| D64 cascade (F64→D64→BF) | 11.7s | 1×10⁻¹² | | 2.4× |
| Minimal cascade (F64→MF3→BF) | 12.2s | 1×10⁻¹² | | 2.3× |
| Full cascade (F64→D64→MF3→MF4→BF) | 15.0s | 1×10⁻¹³ | | 1.8× |
| Pure Ogita (F64→BF×5) | 27.7s | 1×10⁻¹² | | 1.0× |

Key observations:

  • GLA achieves 60+ orders of magnitude better residual than cascade methods
  • GLA is 6.6× faster than pure Ogita refinement
  • All methods produce valid rigorous bounds for σ_min
  • Simpler cascades (D64 only) outperform complex multi-level cascades
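The cascade idea itself is ordinary mixed-precision iterative refinement: compute cheaply in low precision, evaluate the residual in higher precision, and correct. A toy Python illustration on a linear system (the SVD cascades apply the same principle to the factors U, Σ, V; the matrix and names here are made up for the sketch):

```python
from fractions import Fraction
import numpy as np

# One step of mixed-precision iterative refinement:
# solve in float64, evaluate the residual exactly, correct.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.linalg.solve(A, b)                      # low-precision solve

A_exact = [[Fraction(a) for a in row] for row in A]
b_exact = [Fraction(v) for v in b]
r = [b_exact[i] - sum(A_exact[i][j] * Fraction(x[j]) for j in range(2))
     for i in range(2)]                        # residual, evaluated exactly

dx = np.linalg.solve(A, np.array([float(v) for v in r]))
x_refined = x + dx                             # corrected solution
```

In the cascades above, "higher precision" is Double64, Float64x3/x4, or BigFloat rather than exact rationals, but the correction loop has the same shape.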

Medium Matrix (200×200)

| Method | Time | Non-rigorous Residual | Rigorous Bound | Speedup |
|---|---|---|---|---|
| Cascade (F64→D64→MF3→MF4→BF×2) | 196s | 3.3×10⁻¹² | | 2.1× |
| Pure Ogita (F64→BF×5) | 407s | 5.0×10⁻¹² | | 1.0× |

Large Matrix (500×500)

| Method | Time | Non-rigorous Residual | Rigorous Bound | Speedup |
|---|---|---|---|---|
| Cascade (F64→D64→MF3→MF4→BF×2) | 3234s | 1.4×10⁻¹¹ | | 1.9× |
| Pure Ogita (F64→BF×5) | 6279s | 1.6×10⁻¹¹ | | 1.0× |

Recommendations

  1. Use GenericLinearAlgebra when available:

     ```julia
     using BallArithmetic, GenericLinearAlgebra

     T_bf = Complex{BigFloat}.(T)
     z_bf = Complex{BigFloat}(z)
     result = ogita_svd_cascade_gla(T_bf, z_bf)

     # result.σ_min is a rigorous lower bound
     # result.residual_norm is the non-rigorous residual
     ```

  2. Without GLA, use the simple D64 cascade:

     ```julia
     using BallArithmetic, MultiFloats, DoubleFloats

     result = ogita_svd_cascade(T_bf, z_bf;
         f64_iters=1, d64_iters=1, mf3_iters=0, mf4_iters=0, bf_iters=2)
     ```

  3. For pure BigFloat (no extensions):

     ```julia
     using BallArithmetic

     result = ogita_svd_refine(A, U, Σ, V; max_iterations=5)
     ```

Rigorous vs Non-Rigorous Results

Each decomposition method returns:

  • Non-rigorous residual: The computed ‖A - UΣVᴴ‖_F using standard floating-point
  • Rigorous bound: A mathematically guaranteed upper bound on the true error

The rigorous bound accounts for:

  • Rounding errors in all computations
  • Potential instabilities in the algorithm
  • Accumulated error from multiple operations

For certification purposes, always use the rigorous bound.
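To make the distinction concrete, here is what the non-rigorous residual looks like in plain floating-point (NumPy sketch, not BallArithmetic code): the number below is itself contaminated by rounding, which is exactly what the rigorous bound additionally encloses.

```python
import numpy as np

# Non-rigorous residual of a floating-point SVD, as reported in the
# tables: ||A - U Σ Vᴴ||_F computed in ordinary float64 arithmetic.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
U, s, Vt = np.linalg.svd(A)
residual = np.linalg.norm(A - U @ np.diag(s) @ Vt)   # Frobenius norm
```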

Other Matrix Decompositions (100×100 Matrix)

Beyond SVD, BallArithmetic provides verified versions of standard matrix decompositions.

Standard (Float64 → BigFloat)

These use Float64 computation with BigFloat certification:

| Decomposition | Time | Residual Norm | Rigorous Bound |
|---|---|---|---|
| Cholesky (SPD) | 1.7s | 2.7×10⁻¹⁶ | |
| QR | 2.1s | 3.1×10⁻¹⁵ | |
| Polar | 3.3s | 4.8×10⁻¹⁶ | |
| LU | 5.6s | 1.3×10⁻¹⁴ | |
| Takagi | 10.3s | 1.7×10⁻¹⁵ | |
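The "Residual Norm" column measures quantities like ‖A − LLᵀ‖_F for Cholesky. A float64 illustration in NumPy (the package additionally certifies a rigorous bound on top of this number):

```python
import numpy as np

# Frobenius residual of a plain float64 Cholesky factorization.
rng = np.random.default_rng(2)
B = rng.standard_normal((100, 100))
A_spd = B @ B.T + 100 * np.eye(100)          # symmetric positive definite
L = np.linalg.cholesky(A_spd)                # lower triangular, A = L Lᵀ
residual = np.linalg.norm(A_spd - L @ L.T)   # Frobenius norm
```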

GLA-Based (Native BigFloat)

With GenericLinearAlgebra, we achieve residuals roughly 58 to 61 orders of magnitude better:

| Decomposition | Time | Residual Norm | Improvement |
|---|---|---|---|
| Cholesky (GLA) | 0.55s | 5.2×10⁻⁷⁴ | 10⁵⁸× better |
| LU (GLA) | 0.91s | 5.9×10⁻⁷⁵ | 10⁶¹× better |
| QR (GLA) | 0.91s | 5.4×10⁻⁷⁵ | 10⁶⁰× better |
| SVD (GLA) | 5.15s | 4.5×10⁻⁷⁴ | 10⁶⁰× better |

GLA functions:

```julia
using BallArithmetic, GenericLinearAlgebra

result_lu = verified_lu_gla(A)        # ~10⁻⁷⁵ residual
result_qr = verified_qr_gla(A)        # ~10⁻⁷⁵ residual
result_chol = verified_cholesky_gla(A_spd)  # ~10⁻⁷⁴ residual
(U, S, V, res) = verified_svd_gla(A)  # ~10⁻⁷⁴ residual
```

Usage examples:

```julia
using BallArithmetic
using LinearAlgebra

A = randn(100, 100) + 5I

# LU decomposition
result_lu = verified_lu(A)
# result_lu.L, result_lu.U are BallMatrix enclosures
# result_lu.residual_norm is the rigorous error bound

# QR decomposition
result_qr = verified_qr(A)
# result_qr.Q, result_qr.R are BallMatrix enclosures
# result_qr.orthogonality_defect bounds ‖QᴴQ - I‖

# Cholesky (for symmetric positive definite)
A_spd = A'A + 100I
result_chol = verified_cholesky(A_spd)
# result_chol.L is the lower Cholesky factor

# Polar decomposition A = UH
result_polar = verified_polar(A)
# result_polar.U (unitary), result_polar.H (Hermitian positive semidefinite)

# Takagi factorization (for complex symmetric)
A_symm = A + transpose(A)
result_takagi = verified_takagi(A_symm)
# A = U Σ Uᵀ (not Uᴴ!)
```

Iterative Refinement Methods

For decompositions not supported by GLA, we provide iterative refinement using extended precision arithmetic.

Float64 Iterative Refinement (5 iterations)

| Decomposition | Time | Residual | Orthogonality Defect |
|---|---|---|---|
| Cholesky | 0.97s | 8.9×10⁻¹⁷ | N/A |
| QR (CholQR2) | 0.12s | 6.8×10⁻¹⁶ | 2.8×10⁻¹⁵ |
| LU | 0.96s | 1.5×10⁻¹⁵ | N/A |
| Polar (Newton-Schulz) | 0.22s | 1.4×10⁻¹⁵ | 2.1×10⁻¹⁵ |
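CholQR2, the method named in the table, is Cholesky-based QR applied twice: one pass forms R from the Cholesky factor of AᵀA (which squares the condition number), and the second pass restores orthogonality to working precision. A minimal NumPy sketch, illustrative rather than the package's implementation:

```python
import numpy as np

def cholqr(A):
    # R is the upper-triangular Cholesky factor of AᵀA; Q = A R⁻¹.
    R = np.linalg.cholesky(A.T @ A).T
    Q = np.linalg.solve(R.T, A.T).T
    return Q, R

def cholqr2(A):
    # Apply CholQR twice; combine the triangular factors.
    Q1, R1 = cholqr(A)
    Q, R2 = cholqr(Q1)
    return Q, R2 @ R1          # A = Q (R2 R1)

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 20))
Q, R = cholqr2(A)
defect = np.linalg.norm(Q.T @ Q - np.eye(20))   # orthogonality defect
```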

Double64 Iterative Refinement (5 iterations)

Double64 provides ~106 bits of precision, dramatically improving orthogonality:

| Decomposition | Time | Residual | Orthogonality Defect |
|---|---|---|---|
| Cholesky | 0.90s | 7.2×10⁻¹⁸ | N/A |
| QR (CholQR2) | 1.69s | 7.1×10⁻¹⁶ | 2.1×10⁻³¹ |
| LU | 4.53s | 1.2×10⁻¹⁵ | N/A |
| Polar (Newton-Schulz) | 2.76s | 1.3×10⁻¹⁵ | 2.0×10⁻³¹ |

Key observation: Double64 achieves 16 orders of magnitude better orthogonality for QR and Polar decompositions, while residuals remain similar (limited by Float64 input precision).

Usage:

```julia
using BallArithmetic, DoubleFloats, LinearAlgebra

A = randn(100, 100) + 5I
F_qr = qr(A)

# Float64 refinement
result_f64 = refine_qr_cholqr2(A, Matrix(F_qr.Q), Matrix(F_qr.R); max_iterations=5)

# Double64 refinement (better orthogonality)
result_d64 = refine_qr_double64(A, Matrix(F_qr.Q), Matrix(F_qr.R);
    method=:cholqr2, max_iterations=5)

# Polar refinement
F_svd = svd(A)
Q0 = F_svd.U * F_svd.Vt
result_polar = refine_polar_double64(A, Q0; method=:newton_schulz, max_iterations=10)
```
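The Newton-Schulz iteration named by `method=:newton_schulz` is the quadratically convergent fixed-point scheme X ← X(3I − XᵀX)/2, which drives the singular values of X to 1 (and hence X to the orthogonal polar factor) when the iteration starts with ‖X‖₂ ≤ 1. A NumPy sketch of the bare iteration, illustrative rather than the package's refinement code:

```python
import numpy as np

# Newton-Schulz iteration for the orthogonal polar factor of A.
rng = np.random.default_rng(4)
A = rng.standard_normal((50, 50))
X = A / np.linalg.norm(A, 2)        # scale so all singular values are <= 1
for _ in range(40):
    X = X @ (3 * np.eye(50) - X.T @ X) / 2

U = X          # orthogonal polar factor
H = U.T @ A    # symmetric positive-definite factor, A = U H
```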

Schur Complement Bounds (Oishi 2023 / Rump-Oishi 2024)

For block-structured matrices, the Schur complement method provides efficient σ_min bounds:

```julia
result = rump_oishi_2024_sigma_min_bound(G; block_size=m)
```

This implements:

  • Oishi 2023: Base Schur complement algorithm
  • Rump-Oishi 2024: Improved ψ(N) formula that works for ‖N‖ ≥ 1

Key advantages:

  • Works on matrices too large for full SVD
  • Exploits block diagonal dominance structure
  • Automatic optimal block size selection
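The structure these bounds exploit is the standard block LDU factorization through the Schur complement S = D − CA⁻¹B: bounding σ_min(G) then reduces to quantities on the smaller blocks A and S instead of a full SVD of G. A NumPy check of the factorization identity (illustrative; this is not the Rump-Oishi ψ(N) machinery itself):

```python
import numpy as np

# For G = [[A, B], [C, D]] with A invertible and S = D - C A⁻¹ B:
#   G = [[I, 0], [C A⁻¹, I]] · [[A, 0], [0, S]] · [[I, A⁻¹ B], [0, I]]
rng = np.random.default_rng(5)
m = 4
A = rng.standard_normal((m, m)) + 5 * np.eye(m)   # well-conditioned block
B = rng.standard_normal((m, m))
C = rng.standard_normal((m, m))
D = rng.standard_normal((m, m)) + 5 * np.eye(m)
G = np.block([[A, B], [C, D]])

Ainv = np.linalg.inv(A)
S = D - C @ Ainv @ B                              # Schur complement
I, Z = np.eye(m), np.zeros((m, m))
L = np.block([[I, Z], [C @ Ainv, I]])
Dm = np.block([[A, Z], [Z, S]])
Uf = np.block([[I, Ainv @ B], [Z, I]])
err = np.linalg.norm(G - L @ Dm @ Uf)             # should be ~machine eps
```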

References

  • Ogita, T. & Aishima, K. (2020), "Iterative refinement for singular value decomposition based on matrix multiplication"
  • Rump, S.M. & Oishi, S. (2024), "A Note on Oishi's Lower Bound for the Smallest Singular Value of Linearized Galerkin Equations"
  • Oishi, S. (2023), "Lower bounds for the smallest singular values of generalized asymptotic diagonal dominant matrices"