I grew up in the D.C. metro area, before attending the University of Minnesota where I double majored in chemical engineering and chemistry. As an undergraduate I worked as a process engineering intern at Polar Semiconductor Inc. and conducted research in synthetic organic chemistry in the Noland Research Group. It was during my junior year that I was first exposed to scientific computing in a numerical methods course and also became interested in nuclear energy as a way to mitigate climate change.

These interests lead me to join the Computational Nuclear Engineering Research Group at the University of Wisconsin, where I earned my Ph.D. in nuclear engineering. As a graduate student, I had coursework in advanced mathematics, numerical methods, algorithms, and high-performance computing. I gained nuclear engineering analysis experience performing calculations in support of the SHINE Medical Technologies medical isotope facility. My software development skills matured as a leading contributor to the Python for Nuclear Engineering Toolkit and as an intern at ORNL working on the ADVANTG code. My dissertation research focused on optimizing Monte Carlo simulations for biological dose rate calculations for nuclear fusion reactors by approximating the nuclear transmutation process.

2021 conference paper

2021 conference presentation

YouTube video of presentation

A new algorithm has been developed for simulating the process of neutrons scaterring off of nuclei, in a way that is conducive to GPU execution.

The Singular Value Decomposition (SVD) matrix factorization technique is shown to reduce RAM requirements for radiation transport simulation parameters by an order of magnitude, as demonstrated on the Summit supercomputing with a realistic nuclear reactor problem.

Singular Value Decomposition (SVD) is a matrix factorization technique that has applications in data compression and categorization problems. For an $m \times n$ matrix $\mathbf{A}$ the SVD is $$ \mathbf{A}_{m \times n} = \mathbf{U}_{m \times m} \mathbf{\Sigma}_{m \times n} \mathbf{V}_{n \times n}^T, $$ where $\Sigma$ is a diagonal matrix. In this diagonal matrix, the entries decrease monotonically. As a result, an estimate of $A$ can be formed by truncating the $\mathbf{U}$, $\mathbf{\Sigma}$, and $\mathbf{V}$ matrices. Specifically, an estimate of the $a_{ij}$ entry in the $\mathbf{A}$ matrix can be obtained via: $$ a_{ij} \approx \sum_{k=1}^K u_{ik} \sigma_{kk} v_{jk}, $$ with $1 < K < n$. In other words, an estimate of the $\mathbf{A}$ matrix — referred to as a "reconstruction" — can be obtained by storing only the first $K$ rows in $\mathbf{U}$, the first $K$ entries along the diagonal of $\mathbf{\Sigma}$, and the first $K$ columns in $\mathbf{V}$, which results in a factor of $\Gamma$ memory savings, where $\Gamma$ is given by $$ \Gamma = \frac{mn}{mK + K + nK}. $$ For example, consider the following $1333 \times 1000$ pixel image of a cave waterfall in Jackson County, AL: This image can be represented by a $1333 \times 1000$ matrix, where each entry represents a pixel with a gray-scale value between 0 and 1. Three reconstructions of this image are shown below. The $K=100$ reconstruction provides a fairly accurate representation of the image and requires $\Gamma=5.71$ times less memory.

Monte Carlo (MC) radiation transport calculations are used to assess the safety and performance of nuclear reactors and other nuclear systems. The MC method uses a random walk technique to simulate the paths particles (i.e., neutrons or photons), take as they move through a nuclear system, from birth (via fission or an external source) to termination (through absorption reactions or by leaving the spatial boundaries of the system). Particle paths are recorded as they move through regions of interest, known as tallies. By simulating a large number of particles, statistical methods can be used to estimate radiological quantities such as particle flux, biological dose rate, and reactions rates within these tallies.

One significant downside of MC radiation transport is that when tallies are located in regions of low particle flux (i.e., regions that few particles pass through), the convergence rates of tallies may be prohibitively slow. To address this problem most MC codes employ a class of biased random sampling techniques known as MC variance reduction (VR). VR method allow the code to drastically increase the fraction of compute time spent simulating particles that are likely to reach the tally, without introducing systematic bias into the tally results.

Industry-standard VR methods require a low-order estimate of the adjoint flux distribution, which describes the "importance" of position/energy phase space regions to the final result. This information is supplied via "importance maps": 3D spatial meshes with each mesh cell tagged with energy-wise adjoint fluxes. These important maps require a significant amount of RAM, which can limit the fidelity of the simulation that can be achieved.

This work explores the novel application of SVD to compress importance maps for reduced memory requirements. In the example above, the $\mathbf{A}$ matrix represents the two spatial dimensions of the waterfall image. Here, the 3 spatial dimensions of the importance maps are collapsed such that the 2 dimensions of the $\mathbf{A}$ matrix represent the spatial mesh cells and discrete energy groups of the adjoint flux. This allows an $A$ matrix to be formulated as: $$ \mathbf{A} = \begin{bmatrix} \phi^{\dagger \, g = 0}_{c = 0} & \phi^{\dagger \, g = 1}_{c = 0} & \dots & \phi^{\dagger \, g = N_g - 1}_{c = 0} \\[0.3cm] \phi^{\dagger \, g = 0}_{c = 1} & \phi^{\dagger \, g = 1}_{c = 1} & \dots & \phi^{\dagger \, g = N_g - 1}_{c = 1} \\[0.3cm] \vdots & \vdots & \ddots & \vdots \\[0.3cm] \phi^{\dagger \, g = 0}_{c = N_{c} - 1} & \phi^{\dagger \, g = 1}_{c = N_{c} - 1} & \dots & \phi^{\dagger \, g = N_g - 1}_{c = N_{c} - 1} \\[0.3cm] \end{bmatrix}, $$ where $\phi^{\dagger}$ denotes the adjoint flux, $g$ denotes the energy group index, and $c$ denotes the spatial mesh cell index. Since adjoint flux distributions may vary by 10 orders of magnitude or more, and SVD in log-space was also considered, i.e., $$ \mathbf{A} = \begin{bmatrix} \log\left(\phi^{\dagger \, g = 0}_{c = 0}\right) & \log\left(\phi^{\dagger \, g = 1}_{c = 0}\right) & \dots & \log\left(\phi^{\dagger \, g = N_g - 1}_{c = 0}\right) \\[0.3cm] \log\left(\phi^{\dagger \, g = 0}_{c = 1}\right) & \log\left(\phi^{\dagger \, g = 1}_{c = 1}\right) & \dots & \log\left(\phi^{\dagger \, g = N_g - 1}_{c = 1}\right) \\[0.3cm] \vdots & \vdots & \ddots & \vdots \\[0.3cm] \log\left(\phi^{\dagger \, g = 0}_{c = N_{c} - 1}\right) & \log\left(\phi^{\dagger \, g = 1}_{c = N_{c} - 1}\right) & \dots & \log\left(\phi^{\dagger \, g = N_g - 1}_{c = N_{c} - 1}\right) \\[0.3cm] \end{bmatrix}. $$ By storing the truncated SVDs of these $\mathbf{A}$ matricies, rather than the full matrices themselves, a signicant amount of RAM can be saved. Values of the adjoint flux can then be reconstructed on-the-fly (i.e., during a simulation). Both formulations were implemented and performance tested in the Shift Monte Carlo radiation transport code.

The Westinghouse AP1000 is a Generation III+ nuclear reactor currently being
constructed at several sites in in the US and China. SVD compression of
importance maps was tested for the "excore dosimetry" problem shown below.
This problem consists of a quarter-core model with reflecting boundaries, with
a spherical neutron detector in the excore region (i.e., the region outside the
reactor core), on the $z$-midplane. The image on the left shows a $z$-midplane
slice of this geometry, with the detector shown in green. The image on the
right shows the neutron source distribution from fisison. The objective of the
problem is to transport particles from the fission source, through the
sheilding (shown in burgundy), to the detector, which serves as a tally.
The Denovo 3D discrete ordinates (S$_N$) code was used to obtain an adjoint flux
distribution (importance map) on a 44 $\times$ 44 $\times$ 49 spatial mesh with 28
energy groups. The contour plot below shows the adjoint flux for the
1.4227–1.8268 MeV energy group (i.e., fission neutrons). This plot shows that
fission neutrons experience more than 6 orders of magnitude of attenuation
between the source and detector, which indicates that VR will be necessary.
Because 28 energy groups were used, the $\mathbf{A}$ matrix has an $n$ of 28.
Shift was run using $K = [1, 28]$ with the SVD performed in both linear and log
space. Ten independent trials were run for each $K$. Each trial was run on the
Summit supercomputer on 264
nodes with 7 processes per node (due to memory constraints). The plot below
shows the speedups and corresponding standard deviations for these trials, where
the speedup is defined as the ratio of the convergence rate with VR, to the
convergence rate without VR.
This plot shows that the two best options are 1) use a rank $K=1$ approximation
with linear SVD, which provides a $\Gamma \approx 28*$ while incurring a
$\sim3\times$ performance penalty, or 2) use a $K=2$ approximation with a log
SVD, which provides a $\Gamma \approx 14$ with no performance penalty.
The results show that taking the SVD of the adjoint flux is an effective
method for reducing the memory requirements of importance maps without a
significant sacrifice in performance.

*
* Note for the nuclear engineering audience: this work using the CADIS method and these $\,\Gamma$ values refer only to weight windows. For the $K=1$ SVD in linear space, the biased source can also be compressed by a factor of 28, but for the $K=2$ SVD in log space, the biased source cannot be compressed. These detailed were omitted for brevity. A full explanation is found in the journal article .
*

Dissertation

2017 journal article

Stocastic radiation transport simulations are optimized for an important nuclear fusion analysis scenario in which neutrons generate radioisotopes in structural materials, resulting in a prolonged source of potentially harmful gamma radiation.

One of the challenges of designing and operating nuclear fusion reactors (such
as ITER) is the creation of radioactive
materials. For most viable nuclear fusion fuels, the process of fusing atomic
nuclei releases high-energy neutrons. These neutrons can penetrate through the
structural materials of the reactor and convert normally harmless materials
such as the iron in stainless steel or the hydrogen in water into radioactive
isotopes. Even after the reactor is switched off, these isotopes will persist
and emit high-energy photons (gamma rays) as they decay.
Some of these photons travel to areas where people may be taking measurements
or doing maintenance, so it is necessary to estimate the potential biological
dose rate. This is generally done using Monte Carlo radiation transport, which
involves simulating the paths that particles take as they move through the
reactor and interact with the materials.

From the standpoint of computer processing power, these calculations are not
possible without employing a class of techniques known as Monte Carlo variance
reduction. Variance reduction techniques are used to optimize Monte Carlo
transport by first finding an approximate solution to an adjoint formulation of
the problem (though not all techniques are explicitly formulated this way).
This is done with respect to a detector response function. In terms of a
standard optimization problem, the variance of the detector response function
can be thought of as an objective function that should be minimized. For decay
photons, defining this detector response function is straightforward, as I
demonstrated in earlier work [2016 journal
article]. For the neutrons that produce the decay photons, this process is
much more complicated.

As a component of my dissertation work I developed and implemented a technique
for automatically generating the neutron detector response function in order to
optimize Monte Carlo radiation transport for the aforementioned neutron problem.
The most difficult part of forming this function was approximating the process of "neutron activation"
in which material exposed to neutrons creates a vast network of radioactive
products. For example when $^{62}\mathrm{Ni}$ is exposed to neutrons, the following reactions, and many more, occur:
$$
{^{62}\mathrm{Ni}}
\xrightarrow{\textrm{(n, $\alpha$)}}
{^{59}\mathrm{Fe}}
\xrightarrow{\beta^-}
{^{59}\mathrm{Co}}
\begin{array}{ll}
\xrightarrow{\textrm{(n, $\gamma$)}}
{^{60}\mathrm{Co}} \\
\xrightarrow{\textrm{(n, $\gamma$)}}
{^{60\mathrm{m}}\mathrm{Co}}
\xrightarrow{\mathrm{IT}}
{^{60}\mathrm{Co}}
\end{array}
$$
In general, composition of materials before and after nuclear irradiation can be
related via the matrix exponential equation:
$$ \vec{N}(t) = e^{\mathbf{A}t} \vec{N}_0,$$
where $\vec{N}_0$ and $\vec{N}(t)$ are vectors describing the composition of a
material before and after neutron irradiation for time $t$. The $\mathbf{A}$ is
a matrix of reaction rates that depend on the rate in which neutrons pass
through a given area (the neutron flux). I developed a set of approximations
that can be used to simplify this matrix exponential such that it becomes a
linear operator with respect to neutron flux. The details of this process can
be found in my dissertation/journal article linked at the top of this page, but
just for fun I will show the final answer for the optimal neutron detector
response function ($q^+_n$) that results from the approximation of
$\mathbf{A}$:
$$
\begin{align}
q^+_n(E_n, E_p) = \int_{E_p} \text{d}{E_p} \phi^+_p(E_p)
\sum\limits_{c} \lambda_{c, i_c} b_{c, i_c}(E_p) N_{c,1}(0) \Bigg\lbrack \Big\lbrack &\sigma_{c, 1 \rightarrow 2}(E_n) \frac{t_{\text{irr}}^{i_c-1}}{(i_c-1)!} \prod\limits_{j=3}^{i_c} P_{c, j} \Big\rbrack e^{-d_{c, i_c} t_{\text{decay}}}\\
& + \sum_{j=2}^{i_c-1} \Big\lbrack \sigma_{c, 1 \rightarrow 2}(E_n) \frac{t_{\text{irr}}^{j-1}}{(j-1)!} \prod\limits_{k=3}^{j} P_{c, k} \Big\rbrack B_{c,i_c,j}(t_{\text{decay}}) \Bigg\rbrack.
\end{align}
$$

Here I'll show that my method, which is called Groupwise Transmutation Consistent Adjoint Driven Importance Sampling (GT-CADIS) improves the convergence rate of Monte Carlo neutron transport by a factor of $O(10^2)$ relative to the industry-standard method known as Forward-Weighted (FW)-CADIS. Consider a simple problem shown below consisting of a block of stainless steel (gray) with a U-shaped air-duct, a neutron source (blue) and a region (orange) where we'd like to estimate the biological dose rate from photons after neutron irradiation. Using FW-CADIS, neutron detector response function is shown below. Recall that the variance of this function can be thought of as the guess for the objective function. With GT-CADIS, the neutron detector response function has a much different shape: the intensity is much higher in the regions of stainless steel that border the air duct. Note that only the shape is important here, not the magnitude (or even units!). I generated variance reduction parameters using both detector response functions and tracked the rate of convergence of Monte Carlo simulations using each set. The image below shows the relative error in the neutron flux as a function of neutron transport processor time using GT-CADIS and FW-CADIS variance reduction parameters, as well as no variance reduction. Without variance reduction, the neutron flux converges fastest near the source. With FW-CADIS the neutron flux converges at a roughly uniform rate throughout the problem. With GT-CADIS, the neutron flux converges fastest in the area immediately surrounding the area of interest, as expected. In order to quantify how much faster the simulation converges with GT-CADIS, the Monte Carlo figure of merit was used, as defined by $$ {\text{figure of merit}} = \frac{1}{R^2t_{\text{proc}}}, $$ where R is the relative error of the photon flux in the region of interest (e.g. the orange sphere in the geometry image) and $t_{\text{proc}}$ is the neutron transport processor time. The figure of merit converges to a constant value during a simulation, as seen in the plot below. The ratio of figure of merits is the speedup. GT-CADIS was found to provide speedups of $200 \pm 100$ relative to FW-CADIS and $9 \pm 5 \times 10^5$ relative to simulations without variance reduction. In other words, GT-CADIS drastically reduces the required computational resources necessary for calculating these photon dose rates.

C++ source

Report

In grad school, one of my favorite courses was ME759: High-Performance Computing
for Applications in Engineering. The primary focus of this course was GPU
computing with CUDA, though we spent some time with MPI and OpenMP as well.
For my final project, I wrote a program to solve a system of linear equations
on the GPU. The problem I solved takes the form
$$ A_{n \times n} x_{n \times m} = b_{n \times m} $$
where $A$ is a diagonally-dominant banded matrix with some bandwidth $k$.
Problems of this form may arise when using numerical methods to solve ordinary
and partial differential equations. This equation can be solved in 3 steps: LU
decomposition, forward substitution, and backward substitution. For brevity,
I'll only discuss the first step here. More details can be found in the links at
the top of this page. LU decomposition decomposes the $A$ matrix into lower-
and upper-triangular matrices $L$ and $U$:
$$ A = LU. $$
The algorithm I chose for performing this decomposition is shown below.

```
```1 *for* i = 0 to n−2
2
3 *for* j = i+1 to n−1
4 A[j, i] = A[j, i]/A[i,i]
5 *end for*
6
7 *for* j = i+1 to n−1
8 *for* k = i+1 to n−1
9 A[j, k] = A[j , k] − A[j, i]∗A[i, k]
10 *end for*
11 *end for*
12
13 *end for*

This algorithm operates on the matrix in-place, such that resultant $L$ and $U$
matrices are produced within the lower and upper triangular portion of the
original $A$ matrix, which helps to minimize memory usage.

I adapted this algorithm to run efficiently on the GPU using CUDA. This
algorithm requires synchronization after the for-loop on line 3 and the set of
for-loops beginning on line 7. Though CUDA can impose synchronization between
threads within a block, there is no notion of block synchronization. As a
result, the outermost for-loop must occur on the CPU, and this for-loop must
call 2 separate CUDA kernels: one for the line 3 for-loop and one for the line
7 for-loops. The line 7 for-loops offered the best potential for
parallelization. No synchronization is required over $j$ or $k$, so large
portions of the $A$ matrix can be operated on by different threads
simultaneously. This is illustrated in the following diagram. The rectangles
represent the active region of the matrix for a given kernel call where the
order of kernel calls goes from light to dark. Within each rectangle,
execution is divided into blocks, and threads within these blocks perform the
operation appearing on line 9 simultaneously.
Further memory usage reductions were achieved by using the band storage
technique. This technique only stores the values within the band of the matrix. As a result, the active region of the matrix in my implementation proceeds according to the following diagram.
The aforementioned forward and backward substitution steps were also
implemented. The performance of the program was compared to the Intel Math Kernel Library (MKL)
on the CPU. The following plot shows the execution time necessarily to solve a
linear system with an $A$ matrix of size $n$, with bandwidths $k$ that are 10%,
50% and 90% of $n$. The results of this plot are for the case where $m=1$, but
scaling analysis was done over $m$ as well.
This plot shows that my CUDA implementation is marginally faster than MKL over a range of matrix dimensions and bandwidths.

- E. Biondo, T. Evans, G. Davidson, S. Hamilton,
“Singular Value Decomposition of Adjoint Flux Distributions for
Monte Carlo Variance Reduction,”
*Annals of Nuclear Energy*, Vol. 141, 2020. [link] - E. Biondo, G. Davidson, T. Pandya, S. Hamilton, T. Evans,
“Deterministically Estimated Fission Source Distributions for Monte Carlo
k-Eigenvalue Problems,”
*Annals of Nuclear Energy*, Vol. 119, pp. 7-22, 2018. [link] - E. Biondo, P. Wilson, “Transmutation Approximations for the
Application of Hybrid Monte Carlo/Deterministic Neutron Transport to Shutdown
Dose Rate Analysis,”
*Nuclear Science and Engineering*, Vol. 187, Issue 1, pp. 27-48, 2017. [link] - E. Biondo, A. Davis, P. Wilson, “Shutdown Dose Rate Analysis with
CAD Geometry, Cartesian/Tetrahedral Mesh, and Advanced Variance Reduction,”
*Fusion Engineering and Design*, Vol. 106, pp. 77–84, 2016. [link] - E. Biondo, P. Wilson, “Application of the Multi-Step CADIS Method
to Fusion Energy Systems Analysis,”
*International Conference on Mathematics & Computational Methods Applied to Nuclear Science & Engineering*, Jeju, South Korea, 2017.