I grew up in the D.C. metro area, before attending the University of Minnesota where I double majored in chemical engineering and chemistry. As an undergraduate I worked as a process engineering intern at Polar Semiconductor Inc. and conducted research in synthetic organic chemistry in the Noland Research Group. It was during my junior year that I was first exposed to scientific computing in a numerical methods course and also became interested in nuclear energy as a way to mitigate climate change.

These interests lead me to join the Computational Nuclear Engineering Research Group at the University of Wisconsin, where I earned my Ph.D. in nuclear engineering. As a graduate student, I had coursework in advanced mathematics, numerical methods, algorithms, and high-performance computing. I gained nuclear engineering analysis experience performing calculations in support of the SHINE Medical Technologies medical isotope facility. My software development skills matured as a leading contributor to the Python for Nuclear Engineering Toolkit and as an intern at ORNL working on the ADVANTG code. My dissertation research focused on optimizing Monte Carlo simulations for biological dose rate calculations for nuclear fusion reactors by approximating the nuclear transmutation process.

Singular Value Decomposition (SVD) is a matrix factorization method that has applications in data compression and categorization problems. For an $m \times n$ matrix $\mathbf{A}$ the SVD is $$\mathbf{A}_{m \times n} = \mathbf{U}_{m \times m} \mathbf{\Sigma}_{m \times n} \mathbf{V}_{n \times n}^T,$$ where $\Sigma$ is a diagonal matrix. In this diagonal matrix, the entries decrease monotonically. As a result, an estimate of $A$ can be formed by truncating the $\mathbf{U}$, $\mathbf{\Sigma}$, and $\mathbf{V}$ matrices. Specifically, an estimate of the $a_{ij}$ entry in the $\mathbf{A}$ matrix is $$a_{ij} \approx \sum_{k=1}^K u_{ik} \sigma_{kk} v_{jk}.$$ In other words, an estimate of the $\mathbf{A}$ matrix — referred to as a "reconstruction" — can be obtained by storing only the first $K$ rows in $\mathbf{U}$, the first $K$ entries along the diagonal of $\mathbf{\Sigma}$, and the first $K$ columns in $\mathbf{V}$, which may result in significant memory savings. For example, consider the following $211 \times 340$ pixel image of an eye. This image can be represented by a $211 \times 340$ matrix with entries representing the color of each pixel (gray-scale values between 0 and 1). Three reconstructions of this image are shown below. The $K=20$ reconstruction provides a fairly accurate representation of the image and requires an order of magnitude less storage.

Monte Carlo radiation transport calculations are used to assess the safety and
performance of nuclear reactors and other nuclear systems. These calculations
generally employ a class of techniques known as Monte Carlo variance reduction
in order to obtain results in reasonable CPU times. Variance reduction
techniques require "importance maps" to be stored in memory. These maps
estimate how important different regions of space (i.e. elements of a 3D mesh)
and energy are to the final solution. This requires a significant amount of
RAM, which can limit the fidelity of the simulation that can be achieved.

SVD can be used to compress the importance maps for reduced memory
requirements. In the example above, the $\mathbf{A}$ matrix represents the two
spatial dimensions of the eye image. Importance maps generally have 4
dimensions: $x$-, $y$-, and $z$-position, and energy. In this work, the 3
spatial dimensions are collapsed such that the the 2 dimensions of the
$\mathbf{A}$ matrix represent space and energy. A Python program was created in
order to decomposed these matrices using the SciPy
implementation of SVD.
While this work is still in progress, some preliminary analysis was done using a
Holtec HI-STORM MPC spent nuclear fuel storage cask with a simple photon source and photon detector, shown below. Note that this is a 2-D slice through a cylindrical fuel cask.
This problem used 1,922,000 spatial divisions (124$\times$124$\times$125) and
19 energy divisions. With this problem, an importance map was generated,
decomposed, and then reconstructed with all values of $K$ between 1 and 19 (the
full matrix). These reconstructed importance maps are shown below. Note these
are 2-D slices through the cylindrical fuel cask through the $z$-axis. Two
different energies are shown: $10-20$ Mega-electronvolts (MeV) and $100-200$
kilo-electronvolts (keV).
These plots qualitatively show that even $K = 1$ reconstructions match the
original importance map fairly well. The match is better at higher energies,
because at low energies the spatial shape of the importance map has a steeper
curvature. In order to quantify how compressing these importance maps affects
performance, the standard Monte Carlo figure of merit was used, as given by
$$ {\text{figure of merit}} = \frac{1}{R^2t_{\text{proc}}}. $$
Here, $t_{\text{proc}}$ is the CPU time and $R$ is the relative error of the
final answer (i.e. the photon flux in the detector). For each
set of reconstructed importance maps, variance reduction parameters were
generated and photon transport simulations were run. The figure of merit plotted
as a function of $K$ is shown below.
The line of best fit of this plot has a slope of 0.071, which indicates that
the performance of the Monte Carlo simulation is not a strong function of the
$K$ value used to reconstruct the importance map. The line of best fit
indicates that by using a $K=1$ reconstruction, the performance is only reduced
by a factor of 1.6, while reducing the memory requirements by a factor of 19.
This is a promising preliminary result and I will be publishing on more
detailed results shortly.
**This work was cleared for public release as part of the 2017 Oak Ridge Postdoctoral Association Symposium*

Dissertation

2017 journal article

One of the challenges of designing and operating nuclear fusion reactors (such
as ITER) is the creation of radioactive
materials. For most viable nuclear fusion fuels, the process of fusing atomic
nuclei releases high-energy neutrons. These neutrons can penetrate through the
structural materials of the reactor and convert normally harmless materials
such as the iron in stainless steel or the hydrogen in water into radioactive
isotopes. Even after the reactor is switched off, these isotopes will persist
and emit high-energy photons (gamma rays) as they decay.
Some of these photons travel to areas where people may be taking measurements
or doing maintenance, so it is necessary to estimate the potential biological
dose rate. This is generally done using Monte Carlo radiation transport, which
involves simulating the paths that particles take as they move through the
reactor and interact with the materials.

From the standpoint of computer processing power, these calculations are not
possible without employing a class of techniques known as Monte Carlo variance
reduction. Variance reduction techniques are used to optimize Monte Carlo
transport by first finding an approximate solution to an adjoint formulation of
the problem (though not all techniques are explicitly formulated this way). This is done with respect to a detector response function. In
terms of a standard optimization problem, the variance of the detector response
function can be thought of as an objective function that should be minimized. For decay photons,
defining this detector response function is straightforward, as I demonstrated
in earlier work [2016
journal article]. For the neutrons that produce the decay photons, this
process is much more complicated.

As a component of my dissertation work I developed and implemented a technique
for automatically generating the neutron detector response function in order to
optimize Monte Carlo radiation transport for the aforementioned neutron problem.
The most difficult part of forming this function was approximating the process of "neutron activation"
in which material exposed to neutrons creates a vast network of radioactive
products. For example when $^{62}\mathrm{Ni}$ is exposed to neutrons, the following reactions, and many more, occur:
$$
{^{62}\mathrm{Ni}}
\xrightarrow{\textrm{(n, $\alpha$)}}
{^{59}\mathrm{Fe}}
\xrightarrow{\beta^-}
{^{59}\mathrm{Co}}
\begin{array}{ll}
\xrightarrow{\textrm{(n, $\gamma$)}}
{^{60}\mathrm{Co}} \\
\xrightarrow{\textrm{(n, $\gamma$)}}
{^{60\mathrm{m}}\mathrm{Co}}
\xrightarrow{\mathrm{IT}}
{^{60}\mathrm{Co}}
\end{array}
$$
In general, composition of materials before and after nuclear irradiation can be
related via the matrix exponential equation:
$$ \vec{N}(t) = e^{\mathbf{A}t} \vec{N}_0,$$
where $\vec{N}_0$ and $\vec{N}(t)$ are vectors describing the composition of a
material before and after neutron irradiation for time $t$. The $\mathbf{A}$ is
a matrix of reaction rates that depend on the rate in which neutrons pass
through a given area (the neutron flux). I developed a set of approximations
that can be used to simplify this matrix exponential such that it becomes a
linear operator with respect to neutron flux. The details of this process can
be found in my dissertation/journal article linked at the top of this page, but
just for fun I will show the final answer for the optimal neutron detector
response function ($q^+_n$) that results from the approximation of
$\mathbf{A}$:
$$
\begin{align}
q^+_n(E_n, E_p) = \int_{E_p} \text{d}{E_p} \phi^+_p(E_p)
\sum\limits_{c} \lambda_{c, i_c} b_{c, i_c}(E_p) N_{c,1}(0) \Bigg\lbrack \Big\lbrack &\sigma_{c, 1 \rightarrow 2}(E_n) \frac{t_{\text{irr}}^{i_c-1}}{(i_c-1)!} \prod\limits_{j=3}^{i_c} P_{c, j} \Big\rbrack e^{-d_{c, i_c} t_{\text{decay}}}\\
& + \sum_{j=2}^{i_c-1} \Big\lbrack \sigma_{c, 1 \rightarrow 2}(E_n) \frac{t_{\text{irr}}^{j-1}}{(j-1)!} \prod\limits_{k=3}^{j} P_{c, k} \Big\rbrack B_{c,i_c,j}(t_{\text{decay}}) \Bigg\rbrack.
\end{align}
$$

Here I'll show that my method, which is called Groupwise Transmutation Consistent Adjoint Driven Importance Sampling (GT-CADIS) improves the convergence rate of Monte Carlo neutron transport by a factor of $O(10^2)$ relative to the industry-standard method known as Forward-Weighted (FW)-CADIS. Consider a simple problem shown below consisting of a block of stainless steel (gray) with a U-shaped air-duct, a neutron source (blue) and a region (orange) where we'd like to estimate the biological dose rate from photons after neutron irradiation. Using FW-CADIS, neutron detector response function is shown below. Recall that the variance of this function can be thought of as the guess for the objective function. With GT-CADIS, the neutron detector response function has a much different shape: the intensity is much higher in the regions of stainless steel that border the air duct. Note that only the shape is important here, not the magnitude (or even units!). I generated variance reduction parameters using both detector response functions and tracked the rate of convergence of Monte Carlo simulations using each set. The image below shows the relative error in the neutron flux as a function of neutron transport processor time using GT-CADIS and FW-CADIS variance reduction parameters, as well as no variance reduction. Without variance reduction, the neutron flux converges fastest near the source. With FW-CADIS the neutron flux converges at a roughly uniform rate throughout the problem. With GT-CADIS, the neutron flux converges fastest in the area immediately surrounding the area of interest, as expected. In order to quantify how much faster the simulation converges with GT-CADIS, the Monte Carlo figure of merit was used, as defined by $$ {\text{figure of merit}} = \frac{1}{R^2t_{\text{proc}}}, $$ where R is the relative error of the photon flux in the region of interest (e.g. the orange sphere in the geometry image) and $t_{\text{proc}}$ is the neutron transport processor time. The figure of merit converges to a constant value during a simulation, as seen in the plot below. The ratio of figure of merits is the speedup. GT-CADIS was found to provide speedups of $200 \pm 100$ relative to FW-CADIS and $9 \pm 5 \times 10^5$ relative to simulations without variance reduction. In other words, GT-CADIS drastically reduces the required computational resources necessary for calculating these photon dose rates.

C++ source

Unit tests (via Python wrapper)

Theory, sample calculations to match unit tests

User's guide

Nuclear systems analysis often involves using the Monte Carlo approach to
simulate the paths that particles (i.e. neutrons and high-energy photons) take
after they are emitted from a source. This source may be a reactor core,
particle accelerator, or a radioactive seed used in radiation oncology. When
using the Monte Carlo approach, the source must be represented as a probability
density function (PDF) that typically varies as a function of position and
energy (i.e. energy of the emitted particle). With-industry standard Monte
Carlo radiation transport codes such as MCNP5, users must specify these
PDFs through the combination of rudimentary 1-dimensional PDFs such has
histogram, linear, and point-wise distributions. There are many cases when the
PDF is too complex for this method to be practical, such as cases where
particles are emitted from nuclear fusion plasmas or irradiated structural
materials.

I have added code to the Python for
Nuclear Engineering toolkit for the creation of energy-dependent Cartesian-
or tetrahedral mesh-based PDFs. These PDF can be sampling using unbiased and
biased sampling techniques that allow for the use of Monte Carlo variance
reduction. Sampling is done using the Alias
method, which allows for O(1) scaling with respect to the number of
discrete bins in the PDF. My code is written in C++ and also has Python and
Fortran interfaces and can be compiled directly into existing Monte Carlo codes.
More information can be found in the links at the top of this page.

Here's a quick demonstration of how this code can be used to create extremely
complex particle sources. Consider this CAD model of a turbine rotor I found on
GrabCAD.
I created a conformal tetrahedral mesh of this model using CUBIT. I then iterated through tetrahedral
elements of the mesh and assigned a source strength to each element based on the
$x$, $y$, $z$ coordinates of the center of the element using this equation:
$$q(x, y, z) = sin\left(\frac{x^2 + y^2 + (z + 10)^2}{1200}\right) + 1.$$
This equation does not have any physical significance, but it serves as a good
example because it would not be possible to represent using industry-standard
Monte Carlo codes. The resulting particle source PDF is shown below.
I compiled my C++ routine directly into MCNP via the Fortran interface I
created. I was then able to use the PDF shown above as a particle source for a
photon transport simulation. Below is an image showing the photon flux
distribution.
Although this example was just for fun, I have used this code as a component of
a software system I created for estimating biological dose rates for nuclear
fusion reactors problems [2016
journal article].

C++ source

Report

In grad school, one of my favorite courses was ME759: High-Performance Computing
for Applications in Engineering. The primary focus of this course was GPU
computing with CUDA, though we spent some time with MPI and OpenMP as well.
For my final project, I wrote a program to solve a system of linear equations
on the GPU. The problem I solved takes the form
$$ A_{n \times n} x_{n \times m} = b_{n \times m} $$
where $A$ is a diagonally-dominant banded matrix with some bandwidth $k$.
Problems of this form may arise when using numerical methods to solve ordinary
and partial differential equations. This equation can be solved in 3 steps: LU
decomposition, forward substitution, and backward substitution. For brevity,
I'll only discuss the first step here. More details can be found in the links at
the top of this page. LU decomposition decomposes the $A$ matrix into lower-
and upper-triangular matrices $L$ and $U$:
$$ A = LU. $$
The algorithm I chose for performing this decomposition is shown below.

```
```1 *for* i = 0 to n−2
2
3 *for* j = i+1 to n−1
4 A[j, i] = A[j, i]/A[i,i]
5 *end for*
6
7 *for* j = i+1 to n−1
8 *for* k = i+1 to n−1
9 A[j, k] = A[j , k] − A[j, i]∗A[i, k]
10 *end for*
11 *end for*
12
13 *end for*

This algorithm operates on the matrix in-place, such that resultant $L$ and $U$
matrices are produced within the lower and upper triangular portion of the
original $A$ matrix, which helps to minimize memory usage.

I adapted this algorithm to run efficiently on the GPU using CUDA. This
algorithm requires synchronization after the for-loop on line 3 and the set of
for-loops beginning on line 7. Though CUDA can impose synchronization between
threads within a block, there is no notion of block synchronization. As a
result, the outermost for-loop must occur on the CPU, and this for-loop must
call 2 separate CUDA kernels: one for the line 3 for-loop and one for the line
7 for-loops. The line 7 for-loops offered the best potential for
parallelization. No synchronization is required over $j$ or $k$, so large
portions of the $A$ matrix can be operated on by different threads
simultaneously. This is illustrated in the following diagram. The rectangles
represent the active region of the matrix for a given kernel call where the
order of kernel calls goes from light to dark. Within each rectangle,
execution is divided into blocks, and threads within these blocks perform the
operation appearing on line 9 simultaneously.
Further memory usage reductions were achieved by using the band storage
technique. This technique only stores the values within the band of the matrix. As a result, the active region of the matrix in my implementation proceeds according to the following diagram.
The aforementioned forward and backward substitution steps were also
implemented. The performance of the program was compared to the Intel Math Kernel Library (MKL)
on the CPU. The following plot shows the execution time necessarily to solve a
linear system with an $A$ matrix of size $n$, with bandwidths $k$ that are 10%,
50% and 90% of $n$. The results of this plot are for the case where $m=1$, but
scaling analysis was done over $m$ as well.
This plot shows that my CUDA implementation is marginally faster than MKL over a range of matrix dimensions and bandwidths.

Python source

I have been researching my genealogy as a hobby for the past 5 years or so.
I've accrued information on hundreds of direct ancestors going back as far as
William the Conqueror (c. 1028 - 1087). There are tons of genealogy tools out
there, but none of them were entirely satisfactory to me. In the throes of
writing my dissertation I needed a fun coding project, so I threw this together
in a few Sunday mornings. This project, which I call Herald, is written in
Python. Herald reads text-based input files and creates
PDF output files via SVG. This is an ongoing project, so right now only direct
decent is supported (i.e. parent, grand-parent, ...,
[$N\times$great]-grand-parent).
I came up with a very simple recursive algorithm for generating the tree (which
is a balanced binary tree in the graph-theory sense).

```
```1 *function* `assign_index`(Person person, string index)
2 person.set_index(index)
3
4 *if* person.mother():
5 `assign_index`(person.mother(), person.index() +'1')
6 *end if*
7
8 *if* person.father():
9 `assign_index`(person.father, person.index() + '0')
10 *end if*
11
12 *end function*

This function is called from the root of the tree (e.g. the person whose
ancestors you'd like to plot). Each person in the tree is assigned an index
string. The length of the string is the generation they belongs to. For
example, `'010'`

denotes the father's mother's father
(great-grandfather) of the root person. The generation that this person belong
to is

. In typical family trees, the
left-most branch is the paternal line (i.e. '0000...0') and the right-most
branch is the maternal line (i.e. '1111...1'). When printing a family tree,
members of a given generation can be sorted from left to right by simply
converting index values to binary (e.g. *len*('101') = 3`'1101' = 13`

) and sorting
using the decimal representation. A portion of my family tree is shown below.

- E. Biondo, G. Davidson, T. Pandya, S. Hamilton, T. Evans,
“Deterministically Estimated Fission Source Distributions for Monte Carlo
k-Eigenvalue Problems,”
*Annals of Nuclear Energy*, Vol. 119, pp. 7-22, 2018. [link] - E. Biondo, P. Wilson, “Transmutation Approximations for the
Application of Hybrid Monte Carlo/Deterministic Neutron Transport to Shutdown
Dose Rate Analysis,”
*Nuclear Science and Engineering*, Vol. 187, Issue 1, pp. 27-48, 2017. [link] - E. Biondo, A. Davis, P. Wilson, “Shutdown Dose Rate Analysis with
CAD Geometry, Cartesian/Tetrahedral Mesh, and Advanced Variance Reduction,”
*Fusion Engineering and Design*, Vol. 106, pp. 77–84, 2016. [link] - E. Biondo, P. Wilson, “Application of the Multi-Step CADIS Method
to Fusion Energy Systems Analysis,”
*International Conference on Mathematics & Computational Methods Applied to Nuclear Science & Engineering*, Jeju, South Korea, 2017. - E. Biondo, A. Ibrahim, S. Mosher, R. Grove. “Accelerating Fusion
Reactor Neutronics Modeling by Automatic Coupling of Hybrid Monte
Carlo/Deterministic Transport on CAD Geometry.”
*Joint International Conference on Mathematics and Computation (M&C), Supercomputing in Nuclear Applications (SNA) and the Monte Carlo (MC) Method (ANS MC2015)*, Nashville, TN, 2015.