Some initialization stuff... Note: compiled with ipython commit ff7fc6, 1/23/14

In [29]:
from matplotlib import pyplot as plt
%matplotlib inline
import sys
sys.path.append('/home/luke/Research/Python/ctst/')
from tomography import * 

$$\newcommand{\norm}[1]{\lVert#1\rVert} \newcommand{\Rbb}[0]{\mathbb{R}} \newcommand{\Ht}[1]{\mathcal{T}_{\gamma}\left(#1\right)} \DeclareMathOperator*{\prox}{prox} \DeclareMathOperator{\Poi}{Poi} \DeclareMathOperator{\Ct}{\mathcal{\tilde{C}}} \DeclareMathOperator{\suchthat}{such\, that} \DeclareMathOperator{\st}{s.t.} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} \require{color} \newcommand{\blue}[1]{ {\color{blue}{#1}}} \newcommand{\abs}[1]{\left| #1 \right|} $$

Start of slides

Model-Based Iterative Tomographic Reconstruction
with Adaptive Sparsifying Transforms

Luke Pfister & Yoram Bresler

University of Illinois at Urbana-Champaign

Low-Dose Computed Tomography

Model-Based Reconstruction

Three Ingredients

  • System Model
  • Noise Model
  • Signal Model --- Our Focus

Tie together into an optimization problem

Penalized Weighted Least-Squares

$$\min_x \frac{1}{2} \norm{y - Ax}_W^2 + \lambda J(x)$$ $$\min_x \frac{1}{2} \norm{\blue{y - Ax}}_W^2 + \lambda J(x) $$ $$\min_x \frac{1}{2} \norm{y - Ax}^2_{\blue{W}} + \lambda J(x) $$
$$\min_x \frac{1}{2} \norm{y - Ax}_{W}^2 + \lambda \blue{J(x)} $$



  • System Model
    • $y\in\Rbb^M \ $: Log of CT data

    • $A \in \Rbb^{M \times N}: \ $ System Matrix

    • $x\in\Rbb^N: \ $ Image Estimate
  • Noise Model
    • $W = \mathrm{diag}\left\{w_i\right\}:\ \ \ $ Statistical weights
  • Signal Model
    • ${\small J(x):\Rbb^N\to\Rbb}: \ $ Regularizer
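
    As a concrete reference for the notation above, here is a minimal numpy sketch of the weighted data-fidelity term; the system matrix is written as a dense array purely for illustration (in practice $A$ is applied as a projector operator):

```python
import numpy as np

def wls_data_term(y, A, x, w):
    """Evaluate (1/2) * ||y - A x||_W^2 with W = diag(w)."""
    r = y - A @ x                  # residual in the projection (sinogram) domain
    return 0.5 * np.sum(w * r**2)  # statistically weighted squared norm
```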

    Our Contributions

    • Propose the use of adaptive sparsifying transform regularization

    • Use the SALSA algorithm to solve a constrained weighted least squares problem

      • Removes need to tune $\lambda$!

    Back to PWLS

    $$\min_x \frac{1}{2} \norm{y - Ax}_W^2 + \lambda J(x)$$

    • Problem: how should we choose $\lambda$?
      • Generalized Cross Validation, Discrepancy Principle, SURE, ...
      • Sweep over values and choose the 'best' reconstruction
      • Alternative approach: eliminate $\lambda$

    Constrained Least Squares (CLS) [Niu & Liu, 2012]

    $$ \begin{align*} \min_x & J(x) \\ \st &\norm{y - Ax}_2^2 < \epsilon \end{align*} $$

    • $\epsilon$ chosen according to Poisson noise model
    • Solve using a gradient projection method
    • Drawback: identical weight for all projections

    Constrained Weighted Least Squares (CWLS) [Choi, 2010]

    $$ \begin{align*} \min_x & J(x) \\ \st &\norm{y - Ax}_{\blue{W}}^2 < \epsilon \end{align*} $$

    • Tighter integration of the Poisson noise model
    • Per-projection weighting
    • For convex $J(x)$, equivalent to PWLS for a particular choice of $\lambda$

    Signal Models

    Signal Models

    • Many successful models exploit sparsity
      • TV, qGGMRF, Wavelets + $\ell_1$, ...
      • Better model $\implies$ better reconstruction
      • Data-adaptive sparse representations: sparse signal models adapted to a particular signal instance
        • Usually patch-based

    Patch-based Signal Models

    Sparse Signal Models

  • Synthesis sparsity
  • Transform sparsity

    Synthesis Sparsity

    • $x = D a, \ \ a$ is sparse
    • Dictionary learning: Given $\left\{x_j\right\}_{j=1}^P$, find $D$ and $\left\{a_j\right\}_{j=1}^P$
      • Applied to low-dose and few-view CT
      • Scales poorly with data size

    Analysis Sparsity

    • $x = q + e$, $\ \Omega q$ is sparse
      • $e$ captures perturbation in the signal domain
    • Analysis Operator Learning: Given $x$, find $q$ and $\Omega$
      • Also scales poorly with data size

    Signal Models

    "Noisy" Analysis Sparsity

    • $x = q + e$
    • $\Omega q$ is sparse
    • $e$ is small
    • Sparse coding remains NP-Hard

    Transform Sparsity

    • $\Phi x = z + e$, $\ \ z$ is sparse
    • $\Phi$: Sparsifying transform
    • $e$ captures deviation from model in transform domain
      • Allows for fast algorithms!
    • Transform learning: Given $\left\{x_j\right\}_{j=1}^P$, find $\Phi$ and $\left\{z_j\right\}_{j=1}^P$
      • Scales more gracefully with data size

    Problem Formulation

    Regularization with sparsifying transforms



    $$ {\small \begin{align*} J(x) &= \min_{z, \Phi} \sum_j \norm{\Phi E_j x - z_j}_2^2 + \gamma \norm{z_j}_1 + \alpha( \norm{\Phi}_F^2 - \log{\det \Phi}) \end{align*} } $$

    • $E_j$: Extracts the $j$-th $8 \times 8$ patch and removes its mean
    • First term: penalizes the distance between the transformed patch and its sparse code
    • Second term: encourages sparsity
    • Third term: encourages a full-rank, well-conditioned $\Phi$
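
    A minimal numpy sketch of evaluating this regularizer for a fixed $\Phi$ and fixed sparse codes $z_j$; the patch extraction (stride, boundary handling) is an illustrative assumption rather than the exact $E_j$ used here:

```python
import numpy as np

def transform_regularizer(img, Phi, Z, gamma, alpha, patch=8, stride=1):
    """Evaluate sum_j ||Phi E_j x - z_j||_2^2 + gamma ||z_j||_1
    + alpha (||Phi||_F^2 - log det Phi) for fixed Phi and codes Z (one column per patch).
    """
    cols = []
    H, W = img.shape
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            p = img[i:i + patch, j:j + patch].ravel()
            cols.append(p - p.mean())               # E_j: extract patch, remove its mean
    X = np.stack(cols, axis=1)                      # (patch*patch) x (number of patches)
    _, logdet = np.linalg.slogdet(Phi)              # log|det Phi|
    return (np.sum((Phi @ X - Z) ** 2)              # sparsification error
            + gamma * np.abs(Z).sum()               # sparsity penalty
            + alpha * (np.sum(Phi ** 2) - logdet))  # full-rank / conditioning penalty
```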

    Reconstruction Problem

    $$ {\small \begin{align*} \min_x &\left\{ \min_{z, \Phi} \sum_j \norm{\Phi E_j x - z_j}_2^2 + \gamma \norm{z_j}_1 + \alpha( \norm{\Phi}_F^2 - \log{\det \Phi}) \right\} \\ \st &\norm{y - Ax}_W^2 \leq \epsilon \end{align*} } $$

    Algorithm

    Regularizer Update

    $\Phi$ update

    $$ \begin{align*} \Phi^{k+1} &= \argmin_{\Phi} \sum_j \norm{\Phi E_j x - z_j}_2^2 + \alpha( \norm{\Phi}_F^2 - \log{\det \Phi}) \end{align*} $$

    • Closed form solution! [Ravishankar, 2012]
    • Requires three matrix products of size $k \times N$ by $N \times k$, and one SVD of a $k \times k$ matrix
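
    A numpy sketch of this closed-form update, following the expression given in [Ravishankar, 2012], with patches and sparse codes stored column-wise; the exact scaling conventions should be checked against that reference:

```python
import numpy as np
from scipy.linalg import cholesky, svd

def update_transform(X, Z, alpha):
    """Closed-form transform update (sketch, after [Ravishankar, 2012]).

    X : k x P matrix of mean-removed patches E_j x, one per column.
    Z : k x P matrix of sparse codes z_j.
    Minimizes ||Phi X - Z||_F^2 + alpha * (||Phi||_F^2 - log det Phi) over Phi.
    """
    k = X.shape[0]
    L = cholesky(X @ X.T + alpha * np.eye(k), lower=True)  # X X^T + alpha I = L L^T
    Linv = np.linalg.inv(L)
    Q, sig, Rt = svd(Linv @ X @ Z.T)                        # full SVD: Q diag(sig) Rt
    # Phi = 0.5 * R (Sigma + (Sigma^2 + 2 alpha I)^{1/2}) Q^T L^{-1}
    D = 0.5 * (sig + np.sqrt(sig**2 + 2.0 * alpha))
    return Rt.T @ np.diag(D) @ Q.T @ Linv
```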

    Regularizer Update

    $z_j$ update

    $$ \begin{align*} z_j^{k+1} &= \argmin_{z_j} \norm{\Phi E_j x - z_j}_2^2 + \gamma \norm{z_j}_1 \end{align*} $$

    • Closed-form solution using soft thresholding: $z_j^{k+1} = \Ht{\Phi E_j x}$

    $$ \begin{align*} \mathcal{T}_\lambda(a) = \begin{cases} 0, \quad & \abs{a} \leq \lambda \\ \left(1 - \frac{\lambda}{\abs{a}} \right) a, \quad &\text{otherwise.} \end{cases} \end{align*} $$
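
    In code, the soft-thresholding operator and the resulting $z_j$ update are one-liners (numpy sketch):

```python
import numpy as np

def soft_threshold(a, lam):
    """Elementwise soft thresholding T_lambda: zero out |a| <= lam, shrink the rest."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

# z_j update for all patches at once (columns of X are the mean-removed patches E_j x):
# Z = soft_threshold(Phi @ X, gamma)
```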


    Image Update


    $$ \begin{align*} \min_x \ & \sum_j \norm{\Phi E_j x - z_j}_2^2 \\ \st &\norm{y - Ax}_W^2 \leq \epsilon \end{align*} $$

    • Constrained least-squares problem in $x$
    • Approach: use the Split Augmented Lagrangian Shrinkage Algorithm (SALSA) to modify into a sequence of tractable optimization problems
    • Introduce auxiliary variables $v, \eta \in \Rbb^M$

    Image Update

    • Rewrite problem:

    $$ \begin{align*} \min_{x, v} \ & \sum_j \norm{\Phi E_j x - z_j}_2^2\\ \st &\norm{y - v}_W^2 \leq \epsilon\\ &v = Ax \end{align*} $$

    Image Update

    • Define indicator for constraint set:

    $$ \begin{align*} I_{\mathcal{C}}(v) = \begin{cases} 0, \quad & \norm{v - y}_W^2 \leq \epsilon \\ \infty, \quad & \text{otherwise. } \end{cases} \end{align*} $$

    • Introduce auxiliary variables $v, \eta \in \Rbb^M$
    • Write an equivalent problem:

    $$ \begin{align*} &\min_{x, v} J(x) + I_{\mathcal{C}}(v) \\ &\st \ v = Ax \end{align*} $$

    Solution using ADMM

    • Augmented Lagrangian:

    $$ {\small \begin{align*} \mathcal{L}(x, v, \eta) = &\sum_j \norm{\Phi E_j x - z_j}_2^2 + \frac{\mu}{2} \norm{v - Ax - \eta}_2^2 - \frac{\mu}{2} \norm{\eta}_2^2 \\ & \st \norm{v - y}_W^2 \leq \epsilon \end{align*} } $$

    • Alternate between
      • Minimization over $x$
      • Minimization over $v$
      • Maximization over $\eta$
    • At the $k$th iteration, we must solve:

    $$ \begin{align*} x^{k+1} &= \argmin_x \sum_j \norm{\Phi E_j x - z_j}_2^2 + \frac{\mu}{2}\norm{v^{k} - \eta^{k} - Ax}_2^2 \\ v^{k+1} &= \argmin_v I_\mathcal{C}(v) + \frac{\mu}{2}\norm{v - (\eta^k + Ax^{k+1})}_2^2 \phantom{\sum_j}\\ \eta^{k+1} &= \eta^{k} - v^{k+1} + Ax^{k+1} \end{align*} $$

    $x$-update

    • Solve: $$ {\small \left(\mu A^* A + \sum_j E_j^* \Phi^* \Phi E_j \right) x^{k+1} = \mu A^* (v^k - \eta^k ) + \sum_j E_j^* \Phi^* z_j } $$

    • Linear least squares in $x$

    • Hessian $H = \mu A^* A + \sum_j E_j^* \Phi^* \Phi E_j$ is approximately shift-invariant
    • Solve using the Preconditioned Conjugate Gradient (PCG) method with a circulant preconditioner (see the sketch below)
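
    A sketch of this solve using scipy's conjugate gradient; the names `A`, `At`, `apply_reg_hessian`, and `reg_rhs` are placeholders for the projector, back-projector, and patch-transform terms, and the circulant preconditioner is omitted for brevity:

```python
from scipy.sparse.linalg import LinearOperator, cg

def x_update_pcg(A, At, apply_reg_hessian, reg_rhs, v, eta, mu, n_pixels, x0=None):
    """Solve (mu A*A + sum_j E_j* Phi* Phi E_j) x = mu A*(v - eta) + sum_j E_j* Phi* z_j.

    A, At             : callables applying the system matrix and its adjoint
    apply_reg_hessian : callable applying sum_j E_j* Phi* Phi E_j to an image vector
    reg_rhs           : precomputed sum_j E_j* Phi* z_j
    All of these names are placeholders, not the authors' code.
    """
    def hess(u):
        return mu * At(A(u)) + apply_reg_hessian(u)

    H = LinearOperator((n_pixels, n_pixels), matvec=hess)
    rhs = mu * At(v - eta) + reg_rhs
    x, info = cg(H, rhs, x0=x0, maxiter=50)  # pass M=<circulant preconditioner> for PCG
    return x
```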

    $v$-update

    $$ \begin{align*} v^{k+1} = \argmin_v &\norm{v - (\eta^k + Ax^{k+1})}_2^2 \\ \st &\norm{v - y}_W^2 \leq \epsilon \end{align*} $$


    • Projection of $\eta^k + Ax^{k+1}$ onto ellipsoidal constraint set
    • Reformulate as a quadratic problem with an unweighted norm and solve using the FISTA proximal gradient algorithm
    • Cheap!
      • Each iteration requires multiplication by $W^{-\frac{1}{2}}$ and calculating an $\ell_2$ norm
    • Converges in 5-10 iterations

    $v$-update

    • Define an indicator for the unweighted norm ball:

    $$ \begin{align*} I_{\Ct}(v) = \begin{cases} 0, \quad & \norm{v}_2^2 \leq \epsilon \\ \infty, \quad & \text{otherwise} \end{cases} \end{align*} $$

    • Introduce $\zeta = W^{\frac{1}{2}}(v - y)$
    • Rewrite $v$-update as $$ \begin{equation*} \min_\zeta I_{\Ct}(\zeta) + \frac{\mu}{2} \norm{W^{-\frac{1}{2}} \zeta - ( \eta^k + A x^{k+1} - y)}_2^2 \end{equation*} $$

    • Solve using the FISTA proximal gradient algorithm

    • Cheap; requires only multiplication by $W^{-\frac{1}{2}}$ and calculating an $\ell_2$ norm
    • Converges in 5-10 iterations (see the sketch below)
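
    A FISTA sketch of this step, assuming a diagonal $W = \mathrm{diag}(w)$ and writing $b = \eta^k + Ax^{k+1} - y$; the caller recovers $v^{k+1} = y + W^{-\frac{1}{2}}\zeta$:

```python
import numpy as np

def v_update_fista(b, w, eps, mu, n_iter=10):
    """Solve min_zeta I{||zeta||_2^2 <= eps} + (mu/2)||W^{-1/2} zeta - b||_2^2 via FISTA.

    w holds the diagonal of W; b = eta^k + A x^{k+1} - y.  Returns zeta.
    """
    w_inv_sqrt = 1.0 / np.sqrt(w)
    step = 1.0 / (mu * np.max(w_inv_sqrt) ** 2)    # 1 / Lipschitz constant of the gradient
    radius = np.sqrt(eps)
    zeta = np.zeros_like(b)
    zeta_prev = zeta.copy()
    acc, t = zeta.copy(), 1.0
    for _ in range(n_iter):                        # typically 5-10 iterations suffice
        grad = mu * w_inv_sqrt * (w_inv_sqrt * acc - b)
        zeta = acc - step * grad
        nrm = np.linalg.norm(zeta)
        if nrm > radius:                           # proximal step: project onto the l2 ball
            zeta *= radius / nrm
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        acc = zeta + ((t - 1.0) / t_next) * (zeta - zeta_prev)
        zeta_prev, t = zeta.copy(), t_next
    return zeta
```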

    $\eta$-update

    • Gradient step

    $$ \eta^{k+1} = \eta^{k} - v^{k+1} + Ax^{k+1} $$

    Overall Algorithm

    • Regularizer Update
      • Update $\Phi$
      • Update $z_j$
    • Image Update
      • Update $x$
      • Update $v$
      • Update $\eta$
    • Repeat

    Convergence

    Experiments

    Experiments

    • Analytic projections from FORBILD head phantom

      • $350 \times 350$ pixels
      • $25.6$ cm FOV
    • GE Lightspeed fan-beam geometry

      • $888$ detectors
      • $984$ projections over $360^\circ$
      • Photon flux: $1.75 \times 10^6$
    • Initialize $\Phi$ to a separable 2D finite-differencing matrix

    Experiments

    Performance of formulation

    • Compare CLS & CWLS
    • Use sparsifying transform regularizer
    • 10 outer loop iterations followed by 150 image update iterations

    $$ \begin{align*} \tag{CLS} \min_x & J(x) \ \st \ \norm{y - Ax}_2^2 < \epsilon \\ \tag {CWLS} \min_x & J(x) \ \st \ \norm{y - Ax}_{\blue{W}}^2 < \epsilon \end{align*} $$

    Experiments

    Performance of regularizers

    • Test multiple regularizers using the CWLS framework
      • Total-variation (TV)
        • $J(x) = \norm{x}_{TV}$
        • Apply variable splitting to regularizer
      • Dictionary learning (DL):
        • $J(x) = \min_{D, a_j} \sum_j \norm{ E_j x - D a_j}_2^2 + \gamma \norm{a_j}_0$
        • Solve using orthogonal matching pursuit & K-SVD

    Experiments

    Time for one outer-loop iteration (seconds)

    Regularizer    $D/\Phi$ Update    $a/z$ Update    Image Update    Total
    TV                   0                 0              257.7       257.7
    DL                  54.5              20.3            249.1       323.9
    AST                  2.1               0.1            254.4       256.6

    Conclusions

    • Constrained weighted least-squares for model-based reconstruction
      • Eliminates the penalty parameter $\lambda$
      • Better performance than constrained least-squares for low-dose reconstructions


    • Adaptive sparsifying transform regularization
      • Outperforms synthesis dictionary learning regularization at the speed of total-variation regularization

    Thanks!

    Constrained Weighted Least Squares (CWLS)

    Choosing $\epsilon$

    • Quadratic approximation to negative log likelihood:

    $$ \sqrt{w_k} (y_k - [Ax]_k) \sim N(0, 1) $$

    $$ \implies \norm{y - Ax}_W^2 \sim \chi_M^2 $$

    • Mean = $M$, variance = $2M$
    • Set $\blue{\ \ \epsilon = c \cdot (M + 2 \sqrt{2M})}$
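
    In code, this choice of $\epsilon$ is a one-liner ($c$ is the tuning factor above):

```python
import numpy as np

def choose_epsilon(M, c=1.0):
    """epsilon = c * (M + 2*sqrt(2M)): chi^2_M mean plus two standard deviations, scaled by c."""
    return c * (M + 2.0 * np.sqrt(2.0 * M))
```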