I just won a teaching award!

I am pleased as can be this afternoon. I received an email telling me that I won an OxTALENT award: http://blogs.it.ox.ac.uk/oxtalent/oxtalent-competition-2014/

I think this is for my Introduction to Python mini-MOOC. Last academic year my teaching manager at the IT Learning Programme (http://www.oucs.ox.ac.uk/itlp/), for whom I was already teaching Perl, approached me about starting a second programming course. We discussed which programming language to teach and what format the course would take, and eventually we chose Python in an online, supported-learning format built around an interactive online course text.

I think my manager at ITLP put me forward for this award. Either way, I have won it and I will be receiving it later this month.

I am so happy about it.


Do I think that there are human races? No, and here is why:

In biology “race” has a very specific meaning: it is a synonym for “subspecies”. As with “theory”, the scientific meaning and the everyday meaning aren’t the same, so we have to state which definition we are using.

The definition I am using for race is two populations of the same species that can interbreed and produce fertile offspring but don’t due to isolation.

Dogs and wolves are a good example of this idea. They can produce fertile offspring but don’t breed together in the wild.

So under that definition we can superficially think that humans have races. But the admixture history of our species (Hellenthal et al., 2014; check out http://admixturemap.paintmychromosomes.com/ for a history of human population interbreeding, my PhD supervisor was the senior investigator on the project) shows that no population is historically pure and that large-scale genetic admixture between populations is a common event.

When we try to divide human populations genetically we run into another major problem. There are only a handful of common variant locations in the human genome that are found in only one population. One report that I saw said that there are only 4 mutations that can be found only in Europe or only in Ethiopia, and that all other common mutations are found to some degree in both populations. So there is no meaningful genetic delimiter to divide the populations.

As I think was mentioned, we also have the problem of genetic diversity. The out-of-Africa event only happened ~70 kya, which is a blip in evolutionary time, and only a handful of recognised selected-for traits have arisen since that split. We are not a very genetically diverse species, having gone through a major population bottleneck, at just about the same time as the diaspora, that reduced our species to approximately 10k people. As a result, according to one source I read, any two gorillas are more genetically diverse than any two humans. So if you take two gorilla siblings, they will have more genetic diversity than a Northern European and an African of the San people.

So speaking as a population geneticist I don’t accept the premise that there are human races. The last time there was a set of separate human races was when the Neanderthals still walked the Earth.

In regard to the social construct of race: I still don’t buy it. I completely accept the premises of ethnicity and culture, but if you can convert to a new culture then it cannot by definition be a hard line between groups (for example, converting to Judaism).



Though I don’t believe in race, I do know that racism is real and is a huge problem around the world. Please don’t confuse my scientific stance with the idea that because there are no races there is no hate in the world.

On the fear of rejection

About three years ago I was about to graduate from the University of Reading and I was looking to do a PhD because I love doing research. My institution at the time had accepted me as a PhD student, but there was no money to go with it, so I couldn’t afford to stay at Reading even though I had a great project that would have been for the social good. Instead I started to look at other universities in the area, figuring that at least one would be interested and maybe I could get funding for my degree. The University of Oxford was within the geographic area that I was able to travel to for my studies, but at first I gave it a pass, because why would Oxford want me? I looked at the programmes and found them fascinating but very intimidating, so I looked at other universities around me until I figured “wait, why am I saying no for them?”

Sure, it was going to cost me £50 to apply, but I was at a point in my life where I was able to apply to one of the top universities in the world, and if I didn’t at least try I would regret it. I knew I wasn’t going to get in but I wanted to say I had at least tried. So I put together the application form, got my letters of reference, submitted my application and waited for my rejection letter.

Eventually a letter arrived from Oxford and I opened it expecting to read “Dear Mr Aid, We regret to inform you that we aren’t interested”, but instead it said that they wanted to interview me. This floored me more than the rejection would have. So I sent back a letter saying “yes, I will come” and went to my interview. I sucked badly at the interview: I was nervous and I messed up on very simple questions because I couldn’t think straight. Not only that, but I was being interviewed by a Professor of Biochemistry who was also a biophysicist and the head of one of the national science societies, and I was intimidated.

So I went home to wait for the “thank you but no” letter that was sure to come. Instead my letter said “thank you for coming, we would like to invite you to a second interview”. I, of course, accepted the invitation, and this time I was interviewed by the head of the department and a subject specialist. I was more prepared and didn’t make so many simple mistakes. I still left thinking that I was going to be rejected, but I knew that this time I had given the interview my best and that couldn’t be taken from me.

Again I was waiting for my rejection letter, but it didn’t come. I was offered a place on the programme and I was offered a student stipend. If I had let my fear stop me from applying in the first place I wouldn’t be doing my PhD now. So don’t let fear stop you from doing something. Make them say no, because you never know when they will say “yes”.

“Yes” is a beautiful thing and you will never hear it unless you take a risk.

Need a better way to work

I’m working on annotating my results at the moment, and that means that I have to take each region, enter it into a website, search the website for the information that I want and then copy it by hand into my spreadsheet. This is unacceptable; I am a computer scientist. There has to be a better way.

What I have is:

1) a .csv file that contains the regions that I am interested in.

2) a file that contains data about these regions (but a single mutation can have 4-6 entries)

3) a website that I can manually look up to find this data.

If only I had a wheelbarrow, that would be something.

What I need to do is get the data for the areas that I want and then write a new program to take that data, process it and spit out the tables that I want. But first I must ask myself what do I want from these tables?

What I want is:

1) Chromosome and population of the sample

2) Start and end of the region

3) Nearest gene and, if there is no overlap, the distance to it

4) The number of non-synonymous mutations (those that change the expressed protein)

5) The number of non-coding functional mutations in the region

6) Whether the hit region overlaps one or more genes

7) Gene function information

This seems like a bigger task than it is. I should be able to do this fairly easily, but I am having a mental block on getting started. It is usually about this time that I start asking for advice on how you get past the wall of starting terror.
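One way past the starting wall is to pick a single item from the wish list and sketch it. The following is a minimal Python sketch, not the program I ended up with, of items 3 and 6: finding which genes a hit region overlaps, and otherwise the nearest gene and its distance. The `Interval` class, the `gap` and `annotate` functions, the gene names and all coordinates are made up for illustration, and it assumes inclusive start and end positions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Interval:
    """A named stretch of a chromosome, with inclusive start and end."""
    name: str
    chrom: str
    start: int
    end: int


def gap(region: Interval, gene: Interval) -> int:
    """Return 0 if the two intervals overlap, else the distance between them."""
    if gene.end < region.start:
        return region.start - gene.end
    if gene.start > region.end:
        return gene.start - region.end
    return 0


def annotate(region: Interval, genes: List[Interval]) -> dict:
    """Report overlapping genes, the nearest gene, and its distance."""
    # Only genes on the same chromosome can overlap or be "near".
    candidates = [g for g in genes if g.chrom == region.chrom]
    nearest: Optional[Interval] = min(
        candidates, key=lambda g: gap(region, g), default=None
    )
    return {
        "region": region.name,
        "overlapping": [g.name for g in candidates if gap(region, g) == 0],
        "nearest": nearest.name if nearest else None,
        "distance": gap(region, nearest) if nearest else None,
    }


# Made-up coordinates and gene names, purely for illustration.
genes = [
    Interval("GENE_A", "2", 1500, 2500),
    Interval("GENE_B", "2", 5000, 6000),
    Interval("GENE_C", "3", 1000, 2000),
]
hits = [Interval("hit1", "2", 1000, 2000), Interval("hit2", "2", 3000, 3500)]
for hit in hits:
    print(annotate(hit, genes))
```

This scans every gene for every region, which is fine for a few hundred hits. With a genome-wide gene list it would probably be worth sorting the genes by start position first.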

Getting ready for my Transfer of Status and the trouble with MPJ-Express and rJava

I am coming to the end of my time as a Probationer Research Student and I have about 2.5 months to put together a 15-30 page document in order to convince the Dept. that I should be allowed to become a D.Phil student properly and continue my research. I am hopeful about the process but I am very stressed out about it. The basic idea is to see if I can produce original research and if I can communicate what I find. I know I can do both. I started doing original research as part of my undergraduate degree and I don’t really see a problem with striking out on my own. I have written enough papers and essays that I am confident that after a number of drafts I will have a document that will pass muster.

But that doesn’t stop the nerves. I am going to be tested, and I will have to answer questions about my research in front of a panel for an hour or two. I need to remember that I can do this and that I am a capable researcher.

It doesn’t help that today I have had a series of code failures to deal with. I have moved to a new set of servers operated by the Oxford Supercomputing Centre. This means that I have access to up to 128 cores at a time, but I have had to teach myself how to code for supercomputers and I am regretting not taking that class as part of my undergraduate degree. I have been teaching myself the Message Passing Interface (MPI) in the form of MPJ-Express so that I can continue to code in Java and just pass my .jar files to the server for execution.

My main task with MPJ-Express is to parallelise Niall Cardin’s treesim. Treesim is the program that I use to create trees from HapMap and 1000 Genomes data, which I then analyse to determine where positive selection is occurring. It has taken me about 2 weeks to get this up and running. MPJ-Express itself wasn’t too hard; I managed to get the wrapper up and running fairly quickly, but not perfectly. It was controlling treesim from Java that proved to be the big problem. I was trying to use Java’s Runtime.getRuntime().exec() or ProcessBuilder to launch and manage the single-threaded treesim process, but I had to replace that with the Apache Commons Exec library (btw THANK YOU ASF for being there!).

Currently my MPJ-Express code launches all the daemons and then the head node sends a series of non-blocking commands to all the cores. Each core then takes its command and runs an instance of treesim. I would like to change it so that when a core is free it polls the head node for the next available command, but I am just happy that it is working at this point in time.
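The "poll for the next command when free" idea is a standard work-queue pattern. As a rough illustration only, here is a sketch of it in Python using threads and a shared queue in place of MPJ-Express and real cores, so it shows the scheduling idea and not the message passing. The function names, the command strings and `run_command` (a stand-in for launching treesim) are all hypothetical.

```python
import queue
import threading


def run_command(command):
    """Hypothetical stand-in for launching one treesim run."""
    return f"done: {command}"


def worker(worker_id, tasks, results):
    """Keep pulling the next command until the queue is empty."""
    while True:
        try:
            command = tasks.get_nowait()
        except queue.Empty:
            return  # nothing left to do, so this worker stops
        results.put((worker_id, run_command(command)))


def run_all(commands, n_workers):
    """Play the head node: queue every command, then let workers pull."""
    tasks = queue.Queue()
    for command in commands:
        tasks.put(command)
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(i, tasks, results))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    finished = []
    while not results.empty():
        finished.append(results.get())
    return finished


# Placeholder command names; real ones would be treesim invocations.
for worker_id, outcome in run_all([f"region_{i}" for i in range(6)], 3):
    print(worker_id, outcome)
```

The benefit over handing every core a fixed batch up front is that a core that finishes a short job picks up more work straight away instead of sitting idle while a slow job runs elsewhere.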

I have also been learning rJava to deal with the output of another Java application that I have written. rJava is a bit of a nightmare because I don’t think enough people blog about it, so it took me a long time to find the simple command that fixed my problem. I was having problems translating a Java matrix into an R matrix, but the following code sorted out the problem.

This is the signature of the Java method:


public double[][] getData()


I tested that within a Java environment and it was working fine. But when I moved it to an R environment it wouldn’t translate. The following code is what was needed to fix the problem of moving a 2D array from Java to R in rJava. Your .class file needs to be in the R/library/rJava/java directory of your install, or you need to add its location with .jaddClassPath() in R.


library("rJava") # loads the rJava library

.jinit(parameters = "-Xmx10240m") # starts the JVM with -Xmx10240m, since I needed 10GB of memory for my process

s <- .jarray("string", "args") # creates a String[]

javaobj <- .jnew("NameOfClass", s) # creates the object by calling its constructor with the String[]

array <- .jcall(javaobj, "[[D", "getData") # calls getData(); "[[D" is the JNI signature for double[][]

array <- sapply(array, .jevalArray) # this is what I was missing: it turns each double[] into an R numeric vector


After sapply I could use the matrix normally.

In other news, I ran across one of my students and he told me how he used my class and some of my advice to replace a program he was using with a better one he wrote himself. It felt very good. I have been in touch with ITLP at Oxford and they want me to run the Perl class again next term but they also want me to put together a new distance learning Python class. I am very excited about this.

Positive Selection in Homo sapiens


A look at my academic poster from last week showing some of the selective pressure on humans.

My first academic poster

I just wanted to share my first poster that I produced last week. If you have any questions please leave them in comments and I will do my best to answer them promptly. Just click on the thumbnail to see it at full size.

Stats Dept Presentation - Trinity 2013


I had a chance to explain this to a high schooler so I thought I would share it here.

Hi ,

I would be happy to help you understand what I do.

My research project is all about finding which mutations are beneficial to humans. The method that I am using is to look at the response to natural selection in human populations.

So what that means is that someone with a beneficial mutation should be in a better position to have children than someone who doesn’t have the mutation. One mutation that I mention in this poster is lactose tolerance in adults. In ages past, the ability to digest milk as an adult gave some adults an advantage over others in that they had an additional food source. Because they had more sources of food they could provide for more children, so lactose tolerance spread through the populations of humans that raised cattle, sheep, and goats, if it appeared there. Europeans are by far the most lactose tolerant population in the world, but there are some other populations that have evolved adult lactose tolerance. In hunter-gatherer populations, however, the mutation provides no benefit and thus does not spread through the population.

So to look at this poster I want to start with the ancestral graph under the methods. This is a look at one possible history of 120 people. On the left we have the modern people, and as we move right we go back in time, finding common ancestors until at the very right we reach the Most Recent Common Ancestor (MRCA) of them all. In an area of the genome where there is little selection we would expect each lineage to combine at about the same rate and have about the same number of descendants. On this image, however, we can see that the lines coloured red have a great number of descendants, and we interpret that as evidence of natural selection. (Remember, part of the definition of natural selection is that you have more children than other people.)

So on the left under results, the first two images show that my method works. The negative control shows that my method doesn’t give us false positives (where we think there is selection but there really isn’t), and the positive control shows that I can detect the mutations for lactose tolerance in the population that I am studying. The second graph is a much higher resolution scan of the area around LCT (the lactase gene, responsible for lactose tolerance), showing that I can use a different data set with more information. The last image shows an area of selection that (I believe) no one else has discovered yet, that of MYT1L, which is associated with the development of the central nervous system.

So the important points for my method are that it is low in false positives (though it is prone to false negatives, so when I say there isn’t enough evidence of selection I could be wrong), it reproduces previously known information, and it discovers new areas that are under selective pressure.

An Update

I have been chugging along since the new year. My switch to veganism is going fairly well, though I find a good number of online vegans to be over the top and annoying. I sometimes wonder if that is how people feel about me and the subjects I am passionate about, though I do hope that I am closer to the scientific consensus than the vegans I am talking about.

The dog we adopted just before Christmas has settled in very quickly and has become an integral part of the family. We have since adopted two more dogs, bringing the total number of animals in the house to “too many”.

On the work front I am now starting to work with the latest release of the 1000 Genomes project meaning that I am now working with a more complete dataset than I was before. I have received a program to build new trees from Niall Cardin and I am in the process of building new trees for the CEU (People in Utah of Central/Northern European Ancestry) population and I will then see how the results compare to the HapMap results I already have.

I have also started to teach a class on the basics of Perl. You can check out the videos on YouTube at https://www.youtube.com/playlist?list=PLvAKnI6MaY1Yl0mH6iO13MvdAsgew6mB2 and get the course materials for free at http://www.stats.ox.ac.uk/~aid/perl or at http://portfolio.it.ox.ac.uk/resource/course-pack/programming-perl-introduction

Fun with computers

I am proud of my geekiness. I would rather spend a day building a computer or playing a game than go watch the game at the pub. So it always pleases me when I make my computer do something new; that was kind of the point of going back to school to study computer science. At the moment I am working on a program that takes an ancestral genetic tree and the mutations in that area of the chromosome and maps the mutations to the different branches of the tree. I got my first set of results today:

This is a selection from the tree built at chromosome 2:13632000 in the HapMap phase II CEU population of 120 phased chromosomes. The first number is the branch number, followed by the series of SNP mutations that enter the population at that branch:

98, rs16832011
99, rs3769013, rs3769012, rs730005, rs2322813, rs3769008, rs12373779, rs9636213, rs3754689, rs4988201, rs3087343, rs1435577, rs3769001, rs4988163, rs7561565, rs7581814, rs12472293, rs2839740
101, rs11884924, rs16832067, rs3816088, rs2304369, rs4988232, rs4988191, rs4988189, rs4988186, rs4988185, rs4988173, rs4988172

The idea is I can take this information and correlate it to my other program that generates the probability that each branch is a locus of selective pressure and then show which SNPs are probable candidates for causing selection in our species.
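To make that correlation step concrete, here is a minimal Python sketch, assuming the branch output is in the comma-separated form shown above. The function names are hypothetical, the per-branch probabilities and the threshold are invented for illustration, and the real values would come from my other program.

```python
def parse_branch_line(line):
    """Split 'branch, rs1, rs2, ...' into (branch number, list of SNP ids)."""
    fields = [field.strip() for field in line.split(",")]
    return int(fields[0]), fields[1:]


def candidate_snps(lines, branch_probability, threshold):
    """List (snp, branch, probability) for branches at or above the threshold."""
    candidates = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        branch, snps = parse_branch_line(line)
        # Branches with no probability on record are treated as 0.0.
        probability = branch_probability.get(branch, 0.0)
        if probability >= threshold:
            candidates.extend((snp, branch, probability) for snp in snps)
    # Most probable branches first; ties keep their input order.
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# SNP ids are taken from the output above; the probabilities are made up.
lines = [
    "98, rs16832011",
    "101, rs11884924, rs16832067",
]
made_up_probabilities = {98: 0.10, 101: 0.90}
for snp, branch, probability in candidate_snps(lines, made_up_probabilities, 0.5):
    print(snp, branch, probability)
```

Every SNP on a branch inherits the same probability here, so this only narrows down the candidates. It cannot say which SNP on a branch is the one under selection.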

Teaching and talking about my research

I find it odd that I am writing about how much I am enjoying teaching, given how much I hated school when I was a student. I didn’t enjoy school until college/university.

Yesterday, I was demonstrating for the evolution day of the Molecular Genetics and Cell Biology module for the new year of PhD students. It was really nice to feel like I knew what I was talking about, given that I took this course last year. It is incredible what a difference a year makes in the DTC programme. I was able to explain the different ideas in evolutionary science and suggest papers and ideas to help the new students understand the topics, and it was an incredible feeling knowing that I was able to help people understand.

At the moment I am working on three tasks from my list of TODO items. First, I am preparing a talk on my summer projects for Monday’s inter-DTC seminar series. I need to entertain other PhD students for about 20 minutes before I get shot down with intelligent questions.

Second, I am preparing for my first viva on my research proposal. I need to give a 10-minute talk on what I plan to do for the next 3 years and then have two academics grill me on the proposal. I’m nervous about this but I need to put that to the side and get it done.

Finally, I am working on changing some of the material for next week’s statistics work for the new PhD students. It is nice to be able to have a hand in setting up the practical and assessment work for the module.

After next week, I don’t think I am doing any teaching for a while, so I will be able to focus on my research for a bit.
