Getting ready for my Transfer of Status and the trouble with MPJ-Express and rJava

I am coming to the end of my time as a Probationer Research Student and I have about 2.5 months to put together a 15-30 page document to convince the Dept. that I should be allowed to become a D.Phil student properly and continue my research. I am hopeful about the process but I am very stressed out about it. The basic idea is to see if I can produce original research and if I can communicate what I find. I know I can do both. I started doing original research as part of my undergraduate degree and I don’t really see a problem with striking out on my own. I have written enough papers and essays that I am confident that after a number of drafts I will have a document that will pass muster.

But that doesn’t stop the nerves. I am going to be tested, and I will have to answer questions about my research in front of a panel for an hour or two. I need to remember that I can do this and that I am a capable researcher.

It doesn’t help that today I have had a series of code failures to deal with. I have moved to a new set of servers operated by the Oxford Supercomputing Centre. This means that I have access to up to 128 cores at a time, but I have had to teach myself how to code for supercomputers, and I am regretting not taking that class as part of my undergraduate degree. I have been teaching myself the Message Passing Interface (MPI) in the form of MPJ-Express so that I can continue to code in Java and just pass my .jar files to the server for execution. My main task with MPJ-Express is to parallelise Niall Cardin’s treesim. Treesim is the program that I use to create trees from HapMap and 1000 Genomes data, which I then analyse to determine where positive selection is occurring. It has taken me about 2 weeks to get this up and running. MPJ-Express itself wasn’t too hard; I managed to get the wrapper up and running fairly quickly, though not perfectly. It was controlling treesim from Java that proved to be the big problem. I was trying to use Java’s Runtime.exec() or ProcessBuilder to launch and manage the single-threaded treesim process, but I had to replace that with the Apache Commons Exec library (btw THANK YOU ASF for being there!).
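The post does not say why the plain ProcessBuilder approach failed. One common pitfall when managing a child process from Java is leaving its output pipes unread, which can stall the child once the pipe buffer fills. The sketch below shows a standard-library launcher that merges stderr into stdout and drains it before waiting. The class name ExternalRun, the Result type and the "java -version" stand-in command are all hypothetical; the real treesim command line is not given in the post.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs one external command and collects its output. */
final class ExternalRun {

    /** Exit code plus every line the process printed. */
    static final class Result {
        final int exitCode;
        final List<String> lines;

        Result(int exitCode, List<String> lines) {
            this.exitCode = exitCode;
            this.lines = lines;
        }
    }

    static Result run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Merge stderr into stdout so that only one pipe has to be drained.
        pb.redirectErrorStream(true);
        Process p = pb.start();

        List<String> lines = new ArrayList<>();
        // Read until the child closes its output; an unread pipe can block the child.
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        // The timeout only covers the wait after the output has ended.
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            throw new IOException("timed out: " + command);
        }
        return new Result(p.exitValue(), lines);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in command: the JVM that is running this program, asked for its version.
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        Result r = run(Arrays.asList(javaBin, "-version"), 60);
        System.out.println("exit code " + r.exitCode + ", "
                + r.lines.size() + " line(s) of output");
    }
}
```

Apache Commons Exec wraps the same concerns (stream pumping, timeouts via a watchdog) behind its own API, which may be part of why it was the easier route here.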

Currently my MPJ-Express code launches all the daemons and then the head node sends a series of non-blocking commands to all the cores. Each core then takes its command and runs an instance of treesim, but I would like to change it so that when a core is free it polls the head node for the next available command. But I am just happy that it is working at this point in time.
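The polling idea can be illustrated without MPI. The sketch below is a thread-based analogy, not MPJ-Express code: a shared queue plays the head node, and each worker thread plays a core that asks for the next command whenever it is free. The class name PullScheduler and the command strings are hypothetical, and the spot where treesim would be launched is only a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical thread-based analogy of "free workers pull the next command". */
final class PullScheduler {

    /**
     * Runs every command on one of `workers` threads and returns a map from
     * command to the id of the worker that handled it. Commands are assumed
     * to be unique strings.
     */
    static Map<String, Integer> runAll(List<String> commands, int workers)
            throws InterruptedException {
        // The queue stands in for the head node's list of pending commands.
        ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>(commands);
        Map<String, Integer> handledBy = new ConcurrentHashMap<>();

        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            final int id = w;
            Thread t = new Thread(() -> {
                String cmd;
                // poll() hands out each command exactly once and returns null when none remain.
                while ((cmd = pending.poll()) != null) {
                    // Placeholder: a real worker would launch treesim with cmd here.
                    handledBy.put(cmd, id);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // wait until every worker has run out of commands
        }
        return handledBy;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> commands = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            commands.add("run-" + i); // hypothetical command names
        }
        Map<String, Integer> handledBy = runAll(commands, 3);
        System.out.println(handledBy.size() + " commands completed");
    }
}
```

The benefit over handing out all commands up front is load balancing: a worker that draws short jobs simply comes back for more, so no core sits idle while others still hold a backlog. In an MPI setting the same pattern would need a request message from the worker and a reply from the head node.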

I have also been learning rJava to deal with the output of another Java application that I have written. rJava is a bit of a nightmare because I don’t think enough people blog about it, so it took me a long time to find a simple command that fixed my problem. I was having problems translating a Java matrix into an R matrix, but the following code sorted out the problem.

This is the signature of the Java method:

 

public double[][] getData()

 

I tested that within a Java environment and it was working fine. But when I moved it to an R environment it wouldn’t translate. The following code is what was needed to fix the problem of moving a 2D array from Java to R in rJava. Your .class file needs to be in the R/library/rJava/java directory of your install, or you need to add its location with .jaddClassPath() in R.

 

library("rJava") # loads the rJava library

.jinit(parameters="-Xmx10240m") # starts the JVM with the parameter -Xmx10240m since I needed 10GB of memory for my process

s <- .jarray("string", "args") # creates a String[]

javaobj <- .jnew("NameOfClass", s) # creates the object by calling the constructor that takes a String[] (this is not main)

array <- .jcall(javaobj, "[[D", "getData") # calls getData(); "[[D" is the JNI signature for double[][]

array <- sapply(array, .jevalArray) # this is what I was missing: it converts each Java row into an R numeric vector

 

After sapply I could use the matrix normally.
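For reference, the Java side that the R calls above expect could look like the sketch below. This is a hypothetical stand-in, not my actual class: the fixed 2 x 3 contents are invented purely for illustration, and only the String[] constructor and the getData() signature come from the code above.

```java
/**
 * Hypothetical stand-in for the class loaded from R as "NameOfClass".
 * Declared without `public` so the snippet compiles in any file; in a real
 * project it would normally be a public class in NameOfClass.java.
 */
final class NameOfClass {

    private final double[][] data;

    /** Matches the R call .jnew("NameOfClass", s), where s is a String[]. */
    public NameOfClass(String[] args) {
        // Illustrative contents only: 2 rows, 3 columns, value = 10 * row + column.
        data = new double[2][3];
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < data[i].length; j++) {
                data[i][j] = 10.0 * i + j;
            }
        }
    }

    /** Matches the R call .jcall(javaobj, "[[D", "getData"). */
    public double[][] getData() {
        return data;
    }

    public static void main(String[] args) {
        double[][] m = new NameOfClass(args).getData();
        System.out.println(m.length + " rows, " + m[0].length + " columns");
    }
}
```

One thing to watch for, if I read sapply's simplification rules correctly: each Java row comes back as one column of the R matrix, so a 2 x 3 Java array like this one would arrive in R as 3 x 2, and t() would restore the original orientation.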

In other news, I ran across one of my students and he told me how he used my class and some of my advice to replace a program he was using with a better one he wrote himself. It felt very good. I have been in touch with ITLP at Oxford and they want me to run the Perl class again next term but they also want me to put together a new distance learning Python class. I am very excited about this.

Calling Java code from R

This blog helped me solve an issue that I was having with rJava.

Darren Wilkinson's research blog

Introduction

In the previous post I looked at some simple methods for calling C code from R using a simple Gibbs sampler as the motivating example. In this post we will look again at the same Gibbs sampler, but now implemented in Java, and look at a couple of options for calling that code from an R session.

Stand-alone Java code

Below is some Java code for implementing the bivariate Gibbs sampler discussed previously. It relies on Parallel COLT, which must be installed and in the Java CLASSPATH in order to follow the examples.

It can be compiled and run stand-alone from an OS shell with the following commands:

As discussed in the previous post, it is possible to call any command-line program from inside an R session using the system() command. A small wrapper function for conveniently running this code from within R can be written as follows.

