Research Projects
Introduction
Population dynamics is a branch of ecology that helps us understand the causes and effects of variation in the size of a population over time. It helps biologists and stakeholders manage renewable resources and protect biodiversity. Living organisms in an ecosystem interact with each other in a variety of ways, including predation, cooperation, and competition. A food chain is formed when these organisms feed on each other in a sequence. Food chain population dynamics have interesting applications to invasive species control, endangered species protection, and the maintenance of diversity and of a stable ecological community. We will use mathematical models to understand and explore questions about some of the complex interactions between living organisms in multilevel food chains.
The Project
This project will mainly focus on developing mathematical models to understand the dynamic relationships between predators and prey. A predator is an organism that feeds on another organism, called the prey. We will investigate the impacts of both cannibalism and refuges (protection areas) on the overall population dynamics. Cannibalism, the killing and partial consumption of individuals of the same species, is a common ecological interaction among species in a multilevel food chain. The project will begin by introducing students to the basics of population modeling, the analysis of a single-species model, and the writing of simple algorithms for numerical simulations. The goal is then to extend the analysis to two- and three-species models. We will also explore other ecological interactions and study their overall effects on the stability of the ecological community. We shall carry out a dynamical systems analysis of our models and support our theoretical results with numerical experiments.
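To give a flavor of the kind of model involved, here is a minimal sketch (not the project's actual model) of the classical Lotka-Volterra predator-prey equations, integrated with a simple Euler scheme; all parameter values are illustrative:

```python
# Hypothetical sketch: classical Lotka-Volterra predator-prey dynamics,
# integrated with a simple Euler scheme. Parameter values are illustrative.
def simulate(alpha=1.0, beta=0.5, delta=0.2, gamma=0.6,
             x0=2.0, y0=1.0, dt=0.001, steps=20000):
    """x = prey density, y = predator density."""
    x, y = x0, y0
    xs, ys = [x], [y]
    for _ in range(steps):
        dx = alpha * x - beta * x * y    # prey growth minus predation
        dy = delta * x * y - gamma * y   # predator growth minus death
        x, y = x + dt * dx, y + dt * dy
        xs.append(x)
        ys.append(y)
    return xs, ys

xs, ys = simulate()
# Both populations oscillate but stay positive for these parameters.
print(min(xs) > 0 and min(ys) > 0)
```

A typical first exercise is to vary one parameter (say, the predation rate beta) and observe how the oscillations change.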
Background
Interested students will be introduced to the relevant concepts in differential equations. Students should have taken at least Calculus I. Programming experience in MATLAB, Python, or Mathematica is desirable but not required; students will learn the basics on the job.
How to Apply
Introduction
The ongoing COVID-19 pandemic continues to present important challenges to the medical community, the academic community, and the public at large. Successfully curbing the spread of the SARS-CoV-2 virus involves complex, multidimensional issues, including an understanding of its basic reproduction number R0, the herd immunity threshold, vaccine effectiveness, vaccine hesitancy, mask mandates, demographics, data availability, and the accessibility of the right information at the right time, among others. Moreover, as public health officials target the virus, their policies will not be fully successful if they do not take into account all aspects of the health of the population, including mental health. Understanding these issues, investigating their relative merits, and examining how they interact in our society is the subject of a growing debate across academic disciplines, and it will be the subject of our summer research.
The Project
In this project, we will use statistical modeling to analyze the spread of the virus. More specifically, we will develop mathematical models, estimable using regression analysis, to understand the factors that drive the transmission of the virus and their possible effects on overall health and mental health. The explanatory variables will include social, economic, cultural, and historical factors, as well as the different strategies associated with the pandemic response. We will focus mainly on cross-country observational data, but we will be open to analyzing alternative data if available. We will then estimate causal relationships using two-stage least squares instrumental variable regression models and difference-in-differences methodologies when clear natural experiments are identifiable and available.
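To illustrate the two-stage least squares idea, here is a hedged sketch on synthetic data; the variable names and data-generating process are invented for illustration, and the actual project will use real cross-country data and proper statistical software:

```python
import numpy as np

# Hedged sketch of two-stage least squares (2SLS) on synthetic data.
# z is an instrument: it shifts x but affects y only through x.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                # instrument
u = rng.normal(size=n)                # unobserved confounder
x = z + u + rng.normal(size=n)        # endogenous regressor
y = 2.0 * x + u + rng.normal(size=n)  # true causal effect of x on y is 2

def ols(X, v):
    """Least-squares coefficients of v on the columns of X."""
    return np.linalg.lstsq(X, v, rcond=None)[0]

ones = np.ones(n)
Z = np.column_stack([ones, z])
# Stage 1: project the endogenous regressor onto the instrument.
x_hat = Z @ ols(Z, x)
# Stage 2: regress the outcome on the fitted values.
beta = ols(np.column_stack([ones, x_hat]), y)
print(round(beta[1], 2))  # close to the true effect of 2.0
```

A naive regression of y on x would be biased by the confounder u; the instrument recovers the causal effect.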
Prior to the statistical modeling, at least three research steps are crucial and often overlooked: 1) A careful review of the literature available up to June 2022. The students will need to critically read and summarize the relevant papers in this rapidly growing area and produce an annotated bibliography that we will use in our final paper. 2) A careful collection and interpretation of the available data, including sources, meaning, usefulness, strengths, and weaknesses. The students will need to create a replicable dataset according to the rules of Data in Brief, which will serve as a companion data article for our main research paper. 3) A beautiful, informative, and interactive visualization of the data that can be published in a journal or on a website. For this purpose, I expect to use R libraries such as ggplot2 and plotly, but other options are on the table. In general, the specific directions of the project may change in flexible ways as we go. After all, if we knew what we were doing, it would not be called research.
Background
Most importantly, students will need a passion for research, for the topic, and for seeking knowledge, along with a positive yet humble and respectful attitude towards pursuing the truth. In addition, the ideal candidate will have: 1) A solid understanding of statistics (an A in their basic stats courses), 2) A good understanding of multivariate regression analysis, and 3) Knowledge of or experience with R programming. If a student does not meet these requirements, a strong commitment to learning them, together with a passion for research and the topic, could be enough for this project. We will provide guidance when needed and foster teamwork, inclusivity, and complementarities.
How to Apply
Introduction
We start with a fairly innocuous question: How do we train a computer to recognize that a group of real-valued, sampled functions are all the same, up to a parameterization? By sampled, we mean that we've evaluated the function at a sequence of numbers in its domain. This is sometimes referred to as a one-dimensional time series. For instance, below are 100 samples of the function cos(2πt) over the interval [0, 4].
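Such a sample can be produced in a couple of lines; this sketch uses numpy, which is an assumption about tooling rather than part of the project description:

```python
import numpy as np

# 100 samples of cos(2*pi*t) on the interval [0, 4], as described above.
t = np.linspace(0, 4, 100)
f = np.cos(2 * np.pi * t)
print(f.shape)  # (100,)
```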
By parameterization, we mean that the function can be precomposed with a monotonically increasing function h(x) as f(h(x)). For instance, the animation below shows 400 samples of different parameterizations of the function
on the interval [0, 1], composed with different monotonic functions h(x).
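The specific function from the figure is not reproduced here, so this sketch of a reparameterization reuses cos(2πt), with h(t) = t² as one hypothetical monotone warp:

```python
import numpy as np

# Sketch of reparameterization: precompose f with a monotone h.
# f(t) = cos(2*pi*t) and h(t) = t**2 are illustrative choices;
# h is monotonically increasing on [0, 1].
t = np.linspace(0, 1, 400)
f = lambda s: np.cos(2 * np.pi * s)
h = lambda s: s ** 2
warped = f(h(t))  # the same curve, traced at a varying speed
# h fixes the endpoints 0 and 1, so the warped samples agree there:
print(np.allclose([warped[0], warped[-1]], [f(0.0), f(1.0)]))  # True
```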
Now stare at this picture of a bunch of time series for a moment, and see if you can sort them into different groups, or “clusters”: equivalence classes of time series, with reparameterization as the equivalence relation.
There are actually 4 distinct clusters, as shown below.
But how can we train a computer to discover that when all we’re given is groups of 400 numbers?
Approach #1: Dynamic Time Warping
One popular numerical approach to comparing functions is known as dynamic time warping^{[1]}. It explicitly solves for a discrete version of the optimal parameterization that best aligns two functions, known as a “warping path” or “time-ordered correspondence.” One can associate a total cost with this parameterization to measure dissimilarity.
The animation below shows an example of applying dynamic time warping to two time series which are the same up to a parameterization. As you can see, the parameterization matches parts of the functions that are doing the same thing, even when they occur at different times.
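The recurrence behind dynamic time warping can be sketched in a few lines; this is a minimal illustration using |a[i] − b[j]| as the local cost, which is one common choice rather than the only one:

```python
import math

def dtw(a, b):
    """Dynamic time warping cost between two 1-D sequences, using
    |a[i] - b[j]| as the local cost (one common choice)."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            D[i][j] = step + min(D[i - 1][j],      # a stalls
                                 D[i][j - 1],      # b stalls
                                 D[i - 1][j - 1])  # both advance
    return D[n][m]

# Two samplings of the same triangular bump at different speeds align
# perfectly, so the total warping cost is zero:
a = [0, 1, 2, 1, 0]
b = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
print(dtw(a, b))  # 0.0
```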
The problem with this approach, however, is that the cost associated with an optimal parameterization fails to be a metric, as it can violate the triangle inequality. In this project, we want to devise something that is a metric.
[1] Hiroaki Sakoe and Seibi Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE transactions on acoustics, speech, and signal processing, 26(1):43–49, 1978
Approach #2: Persistence Diagrams + Bottleneck Distance
Another tool we can use that is blind to parameterization comes from the field of topological data analysis and is known as a lower star filtration, an instance of what are known as “watershed methods.” We flow water from the bottom of the graph to the top and keep track of the pools that are created. The moment a pool is created is referred to as a birth event, and the moment it merges with another pool is known as a death event. Birth events happen when the water reaches a local min, and death events happen when the water reaches a local max. We can record these events in what's known as a persistence diagram, where each point corresponds to a pool of water that existed on its own for some amount of time; its death minus its birth is referred to as its persistence. Below is an example animation of the watershed process used to build a persistence diagram.
The animation below shows the persistence diagram for reparameterized time series.
The diagram stays numerically constant over all of the different warps, so it is automatically blind to different parameterizations. Adding noise does not significantly change the points of high persistence, though it does add many points with low persistence towards the “diagonal” where birth = death.
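The watershed process described above can be sketched as a short union-find computation; this is a simplified illustration for 1-D sequences, not the project's required implementation:

```python
def persistence_0d(f):
    """0-dimensional persistence of the sublevel sets of a sampled
    function f (a lower star filtration on a path graph).
    Returns sorted (birth, death) pairs; the global min never dies."""
    n = len(f)
    parent = list(range(n))

    def find(i):  # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs, alive, birth = [], [False] * n, {}
    for i in sorted(range(n), key=lambda i: f[i]):  # flood from the bottom
        alive[i], birth[i] = True, f[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and alive[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the younger (higher-birth) pool dies here.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                pairs.append((birth[young], f[i]))
                parent[young] = old
    pairs = [(b, d) for (b, d) in pairs if b < d]  # drop trivial pairs
    pairs.append((min(f), float("inf")))           # the essential class
    return sorted(pairs)

# Two pools (born at heights 0 and 1) separated by a peak at height 3:
print(persistence_0d([0, 3, 1]))  # [(0, inf), (1, 3)]
```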
However, we can devise a metric between persistence diagrams which is stable, in the sense that slight noise does not cause the metric to blow up. This metric is known as the bottleneck distance: we construct a perfect matching between a pair of persistence diagrams (allowing points to be matched to the diagonal) and report the length of the longest edge in this matching. For instance, here is a perfect matching between the diagrams of two of the time series in the above animation.
Notice how the maximum length edge is quite small, reflecting that these time series are close to each other even with noise and different parameterizations.
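For very small diagrams, the bottleneck distance can be brute-forced directly from its definition; this is only an illustration, since real implementations use much faster matching algorithms:

```python
from itertools import permutations

def bottleneck(A, B):
    """Brute-force bottleneck distance between two tiny persistence
    diagrams (lists of (birth, death) points with finite death).
    Only practical for a handful of points."""
    proj = lambda p: ((p[0] + p[1]) / 2,) * 2  # nearest diagonal point
    # Augment each diagram with the other's diagonal projections, and
    # mark which points are "virtual" diagonal points.
    A_aug = [(p, False) for p in A] + [(proj(q), True) for q in B]
    B_aug = [(q, False) for q in B] + [(proj(p), True) for p in A]

    def cost(pp, qq):
        (p, pd), (q, qd) = pp, qq
        if pd and qd:
            return 0.0  # diagonal-to-diagonal matches are free
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))  # L-infinity

    return min(
        max(cost(A_aug[i], B_aug[perm[i]]) for i in range(len(A_aug)))
        for perm in permutations(range(len(B_aug)))
    )

# Slightly perturbing a diagram moves the bottleneck distance only slightly:
print(round(bottleneck([(1, 4)], [(1.1, 3.9)]), 3))  # 0.1
```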
It seems like this is a great approach, so what's the downside? Unfortunately, it's blind to a class of transformations much larger than just reparameterizations. It can't tell a time series from its reflection, and it also gets mixed up on less obvious examples, such as the two below, which have identical persistence diagrams even though they're not simple reflections or reparameterizations of each other.
Approach #3: (Chiral) Merge Trees
We can devise a structure over our lower star filtrations that's stronger than a persistence diagram and which keeps track of a hierarchy of pairings that happens as connected components merge together. This structure is known as a merge tree, and it can be defined over topological spaces more general than 1D time series. The animation below shows a merge tree construction for the union of balls of increasing radius around a collection of points in 2D.
In the case of matching two time series up to a parameterization, though, the merge tree is defined over an interval with a left-right ordering, which induces a left-right ordering on the branches. This is referred to as a chiral merge tree^{[2]}, and it's the object we will study in this REU. In these trees, all leaf nodes correspond to local mins, and all internal nodes correspond to local maxes. Below is what the chiral merge trees look like for the original 4-cluster examples.
As you can see, the heights of the nodes and the topology of the trees are equivalent within each class of time series.
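A chiral merge tree for a time series can be sketched recursively, assuming the input has already been reduced to its alternating sequence of critical values; that assumption is made only to keep the illustration short:

```python
def chiral_merge_tree(crit):
    """Sketch of a chiral merge tree for a time series given by its
    alternating sequence of critical values [min, max, min, ..., min]
    (an illustrative simplification). Leaves are local mins; each
    internal node is the local max at which its left and right
    subtrees merge. The left/right order of children is preserved,
    which is what makes the tree chiral."""
    if len(crit) == 1:
        return crit[0]                     # a leaf: a local min
    maxima = crit[1::2]
    k = 2 * maxima.index(max(maxima)) + 1  # highest interior local max
    return (max(maxima),
            chiral_merge_tree(crit[:k]),
            chiral_merge_tree(crit[k + 1:]))

# Mins at heights 0, 1, 0.5; the last two merge at 2, then all merge at 3:
print(chiral_merge_tree([0, 3, 1, 2, 0.5]))  # (3, 0, (2, 1, 0.5))
```

Note that reflecting the series swaps left and right subtrees, so the nested-tuple representation changes, which is exactly the sensitivity persistence diagrams lack.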
These trees are more informative than persistence diagrams. Unlike persistence diagrams, they can tell apart time series that have been reflected. They can also detect more subtle differences that persistence diagrams are blind to: the examples we showed before with identical persistence diagrams are now distinct in their merge tree representations:
The problem with this representation, however, is that it’s very difficult to compare two merge trees, as we will articulate in the next section.
[2] Curry, Justin. “The fiber of the persistence map for functions on the interval.” Journal of Applied and Computational Topology 2.3 (2018): 301–321.
The Project: A Stable, Efficiently Computable, and Informative Metric between Chiral Merge Trees
This REU will be focused on devising a new metric between chiral merge trees which, as we've outlined above, can be applied to matching time series. We would like the metric to satisfy the following four properties:
- It is actually a metric; i.e., it satisfies all of the metric axioms: non-negativity, identity of indiscernibles, symmetry, and the triangle inequality
- It is stable; that is, small perturbations of the time series should lead to small perturbations in the metric
- It is informative; that is, it is lower bounded by the bottleneck distance between persistence diagrams, but it is also able to tell apart changes in time series that persistence diagrams are blind to
- It is efficiently computable; that is, there exists a polynomial-time algorithm to compute it
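One way to sanity-check a candidate metric is to test the axioms empirically on random inputs; passing such a check is evidence, not a proof. The harness below is a hypothetical helper, not part of the project description:

```python
import itertools
import random

def check_metric_properties(d, points, tol=1e-9):
    """Empirically test the metric axioms for a candidate distance d
    on a finite list of points."""
    for x in points:
        assert d(x, x) <= tol, "identity: d(x, x) should be 0"
    for x, y in itertools.combinations(points, 2):
        assert d(x, y) >= -tol, "non-negativity fails"
        assert abs(d(x, y) - d(y, x)) <= tol, "symmetry fails"
    for x, y, z in itertools.permutations(points, 3):
        assert d(x, z) <= d(x, y) + d(y, z) + tol, "triangle inequality fails"
    return True

# Sanity check with the L-infinity distance on random toy time series,
# which is a genuine metric and so should pass:
random.seed(0)
series = [tuple(random.random() for _ in range(10)) for _ in range(5)]
linf = lambda a, b: max(abs(u - v) for u, v in zip(a, b))
print(check_metric_properties(linf, series))  # True
```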
Surprisingly, a metric and an algorithm to compute it that satisfy all four of the above properties have eluded researchers so far. The table below shows a few example approaches and the properties they satisfy:
| Approach | Metric | Stable | Informative | Efficiently Computable |
| --- | --- | --- | --- | --- |
| Dynamic Time Warping | ❌ | ✔️ | ✔️ | ✔️ |
| Persistence Diagrams + Bottleneck Distance | ✔️ | ✔️ | ❌ | ✔️ |
| Interleaving Distance between (General) Merge Trees^{[3,4]} | ✔️ | ✔️ | ✔️ | ❌ |
| Integer Linear Programming Metric between (General) Merge Trees^{[5]} | ✔️ | ✔️ | ✔️ | ❌ |
| Merge Tree Edit Distance^{[6]} | ✔️ | ❌ | ✔️ | ✔️ |
| ??? (REU Goal for Chiral Merge Trees) | ✔️ | ✔️ | ✔️ | ✔️ |
In this REU, our goal will be to devise a metric on chiral merge trees that satisfies all four properties, to implement it in Python, and to test it on real time series data.
[3] Morozov, Dmitriy, Kenes Beketayev, and Gunther Weber. “Interleaving distance between merge trees.” Discrete and Computational Geometry 49.2245 (2013): 52.
[4] Agarwal, Pankaj K., et al. “Computing the Gromov-Hausdorff distance for metric trees.” ACM Transactions on Algorithms (TALG) 14.2 (2018): 1–20.
[5] Pegoraro, Matteo. “A Metric for TreeLike Topological Summaries.” arXiv preprint arXiv:2108.13108 (2021).
[6] Sridharamurthy, Raghavendra, et al. “Edit distance between merge trees.” IEEE Transactions on Visualization and Computer Graphics 26.3 (2018): 1518–1531.
Background of Student
Though this project is more on the math side, it will bring in a mix of math and CS skills, particularly via the focus on computability and implementations. An ideal student for this project would have the following preparation:
- Have taken at least one CS course, ideally including data structures and algorithm analysis
- Python/numpy knowledge is a plus, but we will go over what is needed at the beginning of the project
- Having taken at least one of modern geometry, topology, or abstract algebra is a plus; alternatively, or in addition, having taken theory of computation or theory of algorithms is a plus
- Have a willingness to work at the interface of CS and math and to explore mathematical ideas via coding examples
How to Apply
This REU program is supported by the National Science Foundation (NSF), grant number 1851948.