Ben Fulan worked on this project at Andrew Ferguson’s lab.

 Here’s  Ben’s final report.

This project arose out of the search for a vaccine to combat HIV and the obstacles faced by this search. While it is possible to create a vaccine that induces an immune response against the virus, the effectiveness of this approach is limited by the ability of the virus to mutate into strains which would no longer be targeted by the immune response. The goal of this project is to produce an immune targeting scheme that can do the most damage to the virus while taking into account its ability to mutate.

To this end, a measure of the fitness of the virus is needed. This takes the form of a fitness landscape, which gives the virus’s ability to replicate as a function of its amino acid sequence. (See figure below – the dashed line indicates a mutation path from a potentially immune-targeted global maximum to a local maximum.) In particular, this amino acid sequence can be described using a binary string, where 0 represents the natural (“wild type”) form of the amino acid and 1 represents a mutant. The fitness function can then be constructed from empirical data. This function assigns a fitness cost to each mutation and also contains terms to represent interactions between pairs of mutations, which can be either beneficial or harmful.

Once we have obtained the fitness function, we can add a term representing immune pressure that, for each amino acid, imposes a penalty for not mutating, effectively forcing the virus to mutate into a lower-fitness state. This modified function measures the viral fitness given a particular amino acid sequence; the next step is to produce a function that measures the overall fitness of the virus across all possible states. This function should give more weight to high-fitness states, since it is more likely that the virus will be found in such a state. One such function is
\sum_z \exp{−\beta E(z,b)},
where \(E\) is a version of the fitness function described above (in this case low energy corresponds to high fitness), \(z\) is the binary amino acid sequence, \(b\) is the sequence representing immune pressure, and the “inverse temperature” \(\beta\) represents the difficulty of mutation. Finally, we can cast the problem as an optimization problem as follows: minimize the overall fitness function subject to constraints on b (since only so much immune pressure can be applied). We then develop an algorithm to perform this optimization and obtain an optimum immune pressure sequence \(b\).


We can then check these results against empirical data; if the data match poorly, we can reconsider our choice of the overall fitness function.