Using JavaGD in Java applications

Goal: Create a Java application with a JFrame that contains a JGDPanel from JavaGD and plot something via JRI.

Solution: The following code is a derivative of JGR code snippets released under the GPL-2:

import java.awt.Component;
import java.awt.event.WindowEvent;
import java.awt.event.WindowListener;

import javax.swing.JFrame;

import org.rosuda.JRI.Rengine;
import org.rosuda.javaGD.GDInterface;
import org.rosuda.javaGD.JGDBufferedPanel;

public class JavaGDExample extends GDInterface implements WindowListener {

    JFrame f;
   
    public void gdOpen(double w, double h) {
        if (f!=null) gdClose();
        f = new JFrame("JavaGD Example");
        f.addWindowListener(this);      
        c = new JGDBufferedPanel(w, h); // 'c' is the GDContainer field inherited from GDInterface
        f.getContentPane().add((Component) c);
        f.pack();
        f.setVisible(true);
    }
   
    public void gdClose() {
        super.gdClose();
        if (f!=null) {
            c=null;
            f.removeAll();
            f.dispose();
            f=null;
        }
    }

    public static void main(String[] args) {
        Rengine engine = new Rengine(new String[] {"--vanilla"}, true, null);      
        engine.eval(".setenv <- if (exists("Sys.setenv")) Sys.setenv else Sys.putenv");
        engine.eval(".setenv("JAVAGD_CLASS_NAME"="JavaGDExample")");
        engine.eval("library(JavaGD)");
        engine.eval("JavaGD()");
        engine.eval("plot(rnorm(100))");       
    }
   
    /** listener response to "Close" - effectively invokes <code>dev.off()</code> on the device */
    public void windowClosing(WindowEvent e) {
        if (c!=null) executeDevOff();
    }
    public void windowClosed(WindowEvent e) {}
    public void windowOpened(WindowEvent e) {}
    public void windowIconified(WindowEvent e) {}
    public void windowDeiconified(WindowEvent e) {}
    public void windowActivated(WindowEvent e) {}
    public void windowDeactivated(WindowEvent e) {}
}

Et voilà:

[Screenshot: the JavaGDExample frame showing the R plot]

Of course you have to make sure that the JRI native library is in a directory listed in java.library.path, for example on my machine via:

java -Djava.library.path=/usr/local/lib/R/site-library/rJava/jri/ JavaGDExample

You also have to set the environment variable R_HOME to the correct path, for example:

R_HOME=/usr/lib64/R/

And of course you need the two jars javaGD.jar and JRIEngine.jar in your CLASSPATH.
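
A complete launch command then looks roughly like this (the JAR locations are placeholders; adjust all paths to your system):

R_HOME=/usr/lib64/R/ java -cp .:/path/to/javaGD.jar:/path/to/JRIEngine.jar \
    -Djava.library.path=/usr/local/lib/R/site-library/rJava/jri/ JavaGDExample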

Linux USB driver for the Arexx TL-500 – Part II

Last time we started communicating with the TL-500. This time we want to find out the meaning of the data rows from the IN endpoint.

Therefore we start the USB Sniffer SnoopyPro again and observe the data flow under Windows while the logger is working. We then export the USB log as XML and grep for “000a”. Now we can compare the raw data from the USB device with the recorded measurements. The first surprise is that in some cases SnoopyPro seems to miss incoming data events (about 5% of the total events). (The other possibility is that the Windows software interpolates data points when you export them – but I doubt this, since that would be rather stupid, wouldn’t it?)

By using only one sensor at a time, one can tell which data rows belong to which sensor. For my sensors, hexadecimal digits 5-8 of the raw data contain the following values:

8818: 72 22
8908: cc 22
17882: da 45
17882RH: db 45
18438: 06 48
18438RH: 07 48

After swapping the two halves, the temperature sensor ids translate directly: 0x2272=8818, 0x22CC=8908, 0x45DA=17882 and 0x4806=18438. Also, if the last bit of the first half is 0, we have a temperature sensor, otherwise an RH sensor. This would imply that all temperature sensors have even ids and that each RH sensor has the id of its temperature sensor incremented by 1. Is this always the case?
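
As a small sketch of this decoding in R (the raw string is just the beginning of one of the data rows shown below; only hexadecimal digits 5-8 are used here):

raw  <- "000a72220c17"
# swap the two bytes of digits 5-8 and convert from hexadecimal: "2272" -> 8818
id   <- strtoi(paste0(substr(raw, 7, 8), substr(raw, 5, 6)), base = 16L)
# even id -> temperature sensor, odd id -> RH sensor
type <- if (id %% 2 == 0) "temperature" else "humidity (RH)"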

Just by looking at the raw data rows while exposing the sensors to different conditions, I suspect that hexadecimal digits 9-14 contain the temperature and RH data.

Let’s look at a sample data set: tl500.csv. We load it into R and start investigating:

data <- read.table("tl500.csv", header=TRUE)

# convert a hexadecimal string like "0c17" into an integer
hex2int <- Vectorize(function(x) {
  v <- 0
  for (i in 1:nchar(x)) {
    w <- ifelse(substr(x,i,i)==c(as.character(0:9), letters[1:6]), 0:15, NA)
    w <- ifelse(!all(is.na(w)), w <- w[!is.na(w)], NA)
    v <- v + w*16**(nchar(x)-i)
  }
  return(v)
})

data$Sensor <- substr(data$Rawdata,5,8)
data$RawValue <- hex2int(substr(data$Rawdata,9,12))
plot(data$RawValue, data$Value)
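
Side note: base R can do the hexadecimal conversion directly with strtoi(), which is a simpler drop-in replacement for the hand-rolled hex2int above:

data$RawValue <- strtoi(substr(data$Rawdata, 9, 12), base = 16L)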

There are obviously subgroups, and within these groups a linear relationship exists between the 5th and 6th byte (hexadecimal digits 9-12) of the raw data and the measurements. Adding the 7th byte does not improve the linear correlation. (Weee! No floating point data!) Further investigation shows that we can calculate the measurements from RH sensors as (0.6 + Rawdata * 0.03328) %RH, from temperature sensors TSN-TH70E as (-39.58 + Rawdata * 0.01) °C and from temperature sensors TL-3TSN as (Rawdata * 0.0078) °C. These coefficients are based on the following linear regressions. With more data they will change slightly, but given the accuracy, and that you need to calibrate your sensors either way, it is a sufficient translation of uncalibrated sensor data.

rhdata <- data[data$Sensor %in% c("db45", "0748"),]
t1data <- data[data$Sensor %in% c("da45", "0648"),]
t2data <- data[data$Sensor %in% c("7222", "cc22"),]

lm(Value~RawValue,data=rhdata)
lm(Value~RawValue,data=t1data)
lm(Value~RawValue,data=t2data)
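
Reading the fitted coefficients off directly (they should roughly match the numbers quoted above and will shift slightly as more data comes in):

coef(lm(Value~RawValue, data=rhdata))   # roughly  0.6   and 0.03328
coef(lm(Value~RawValue, data=t1data))   # roughly -39.58 and 0.01
coef(lm(Value~RawValue, data=t2data))   # roughly  0     and 0.0078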

Let’s update the C program from the last time with the new results:

static int get_sensor(unsigned char data[64]) {
    return (data[3])*(256)+(data[2]);
}

static int get_value(unsigned char data[64]) {
    int value = 0;
    // We use a loop since it is not clear whether we will also want to use data[6] someday.
    for (int i=4; i<=5; i++) {
        value += data[i]*pow(16,2*(5-i));
    }
    return value;
}

/**
 * We assume that all TSN-TH70E sensors have ids bigger than 10000.
 * If the id is odd, we have a humidity sensor.
 * Does anyone have more information about this?
 */

static double get_measurement(unsigned char data[64]) {
    double value = get_value(data);
    int sensor = get_sensor(data);
    if (sensor < 10000) {
        return value * 0.0078;
    } else {
        if (sensor%2==0) {
           return -39.58 + value * 0.01;
        } else {
           return 0.6 + value * 0.03328;
        }
    }
}

/** Same assumption as above: ids greater than 10000 with an odd id indicate a humidity sensor. */

const char* get_unit(unsigned char data[64]) {
    int sensor = get_sensor(data);
    if (sensor > 10000 && sensor%2!=0) {
        return "%RH";
    } else {
        return "°C";
    }
}

If we now add the following line to the data reading loop, we get a useful result:

printf("From sensor %d we get a raw value %d. We guess this means %3.2f %s.\n",
       get_sensor(dataUp), get_value(dataUp), get_measurement(dataUp), get_unit(dataUp));

The output then looks like this:

Found Arexx TL-500.
Data: 00 0a 72 22 0c 17 05 00 00 00 15 00 ff 04 ed 1e 06 00 00 00 ff ff ff ff ff ff ff ff ff ff ff 0b ff af 05 00 00 09 07 48 04 ec c9 05 00 00 09 72 22 0b ff df 05 00 00 00 06 48 48 ed 31 2e 3f 09
From sensor 8818 we get a raw value 3095. We guess this means 24.14 °C.
Data: 00 0a 07 48 04 f5 17 00 00 00 06 00 ff 04 ed 1e 06 00 00 00 ff ff ff ff ff ff ff ff ff ff ff 0b ff af 05 00 00 09 07 48 04 ec c9 05 00 00 09 72 22 0b ff df 05 00 00 00 06 48 48 ed 31 2e 3f 09
From sensor 18439 we get a raw value 1269. We guess this means 42.83 %RH.
Data: 00 0a 72 22 0c 0f 39 00 00 00 0c 00 ff 04 ed 1e 06 00 00 00 ff ff ff ff ff ff ff ff ff ff ff 0b ff af 05 00 00 09 07 48 04 ec c9 05 00 00 09 72 22 0b ff df 05 00 00 00 06 48 48 ed 31 2e 3f 09
From sensor 8818 we get a raw value 3087. We guess this means 24.08 °C.
Data: 00 0a 06 48 18 dd 51 00 00 00 06 00 ff 04 ed 1e 06 00 00 00 ff ff ff ff ff ff ff ff ff ff ff 0b ff af 05 00 00 09 07 48 04 ec c9 05 00 00 09 72 22 0b ff df 05 00 00 00 06 48 48 ed 31 2e 3f 09
From sensor 18438 we get a raw value 6365. We guess this means 24.07 °C.

You can download the updated C file here. Now we can create our first applications for Linux.

P.S.: The connection quality is in the 21st and 22nd hexadecimal digits, and further blog posts will follow…

Using libusb to write a Linux USB driver for the Arexx TL-500 – Part I

The Arexx TL-500 is a well-priced temperature and humidity logging system.

TL-500

But with only twelve direct requests for a Linux USB driver, Arexx recently decided to abandon their plans to create a Linux driver for this device (I think the demand is much higher, but only 12 people asked for Linux support directly). Since my TL-500 is connected to a Linux server and I don’t want to have Windows running in a virtual machine all the time, I decided to write a Linux driver for the TL-500 myself.

Let’s connect the TL-500 to a Linux box and take a look at the device and its endpoints:

kornel@kornel-desktop:~$ lsusb -v
[...]
Bus 003 Device 002: ID 0451:3211 Texas Instruments, Inc.
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               1.10
  bDeviceClass          255 Vendor Specific Class
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0         8
  idVendor           0x0451 Texas Instruments, Inc.
  idProduct          0x3211
  bcdDevice            1.00
  iManufacturer           0
  iProduct                0
  iSerial                 0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           32
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0x80
      (Bus Powered)
    MaxPower              100mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass      0
      bInterfaceProtocol      0
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x01  EP 1 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               0
Device Status:     0x0000
  (Bus Powered)

Aside from the control endpoint there are two bulk endpoints. (If you are not that familiar with USB, you may want to read a few pages of chapter 13, “USB Drivers”, from the book Linux Device Drivers, which is available from O’Reilly and also freely available as a PDF under a Creative Commons license.) The endpoint with address 0x01 transfers data from the computer to the device (OUT endpoint – this direction is called “down”) and the endpoint with address 0x81 transfers data from the device to the computer (IN endpoint – this direction is called “up”).

But how do we get the logging data from the device? Since Arexx neither answered my emails nor responded to the forum posts, I used the Windows USB Sniffer SnoopyPro for unscrambling the communication.

Once you have installed and started SnoopyPro, first open “View -> USB Devices“, then select “File -> Unpack drivers“ and after that “File -> Install Service“ from the menu of the USB Devices window. Right-click on the appropriate device (the prior lsusb told us the ID 0451:3211) and select “Install and Restart“.

Start the USB Sniffer for the TL-500

An internal window “USBLog1” should open, counting the recorded packets. If you have recorded enough, stop the recording and take a look at the output:

Watching the Windows driver communicating with the device

After repeating this process a few times, the following pattern emerges:

  • The computer sends data starting with 0x04.
  • After that it sends data starting with 0x03. The TL-500 responds either with data that starts with 0x0000 or with data that actually contains logging data. This repeats ad infinitum.
  • Occasionally the loop hangs and the computer resends data starting with 0x03.

Fortunately, with the libusb library, writing a USB device driver is nowadays no big problem. (If you don’t want to use libusb but would rather get involved with kernel programming, take a look at the tutorial Writing a Linux kernel driver for an unknown USB device by Matthias Vallentin.) So let’s start with libusb and its example file (under the LGPL) and implement what we have discovered so far:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <libusb.h>

uint16_t VENDOR = 1105; /* =0x0451 */
uint16_t PRODUCT = 12817; /* =0x3211 */

libusb_device_handle* find_tl500(libusb_device **devs) {
    libusb_device *dev;
    libusb_device_handle *handle = NULL;
    int i = 0;

    printf("Trying to find Arexx logging system.\n");
    while ((dev = devs[i++]) != NULL) {
        struct libusb_device_descriptor desc;
        int r = libusb_get_device_descriptor(dev, &desc);
        if (r < 0) {
            fprintf(stderr, "failed to get device descriptor\n");
            return NULL;
        }

        printf("%04x:%04x (bus %d, device %d)\n",
            desc.idVendor, desc.idProduct,
            libusb_get_bus_number(dev), libusb_get_device_address(dev));

        if (desc.idVendor == VENDOR && desc.idProduct == PRODUCT) {
            printf("Found Arexx TL-500.\n");
            // libusb_open fills in the handle pointer on success
            int usb_open = libusb_open(dev, &handle);
            if (usb_open == 0) {
                printf("libusb_open successful.\n");
                return handle;
            }
            fprintf(stderr, "libusb_open failed. Error code %d.\n", usb_open);
            return NULL;
        }
    }

    return NULL;
}

int main() {
    libusb_device **devs;
    int r;
    ssize_t cnt;

    r = libusb_init(NULL);
    if (r < 0)
        return r;

    cnt = libusb_get_device_list(NULL, &devs);
    if (cnt < 0)
        return (int) cnt;

    libusb_device_handle *handle = find_tl500(devs);

    if (handle==NULL) {
        fprintf(stderr, "No logging system found.\n");
        libusb_free_device_list(devs, 1);
        libusb_exit(NULL);
        return -1;
    }

    unsigned char ENDPOINT_DOWN = 0x1;
    unsigned char ENDPOINT_UP = 0x81;
    int actual_length;
    unsigned char dataUp[64];
    unsigned char dataDown[64];
    for(int j=0;j<64;j++) { dataDown[j] = 0; }
    dataDown[0] = 4;
    libusb_bulk_transfer(handle, ENDPOINT_DOWN, dataDown, sizeof(dataDown), &actual_length, 0);
    dataDown[0] = 3;
    for(int i=0; i<10000; i++) {
        r = libusb_bulk_transfer(handle, ENDPOINT_DOWN, dataDown,
                                 sizeof(dataDown), &actual_length, 0);
        r = libusb_bulk_transfer(handle, ENDPOINT_UP, dataUp,
                                 sizeof(dataUp), &actual_length, 1000);
        if (r == 0 && actual_length == sizeof(dataUp)) {
            if (dataUp[0]!=0 || dataUp[1]!=0) {
                printf("Data: ");
                for(int j=0;j<64;j++) {
                    printf("%02x ",dataUp[j]);
                }
                printf("\n");
            }
            sleep(1);
        } else {
            printf("Something went wrong (r == %i, actual_length == %i , sizeof(dataUp) == %lu ).\n",
                   r, actual_length, sizeof(dataUp));
        }
    }

    libusb_free_device_list(devs, 1);

    libusb_exit(NULL);

    return 0;
}

If you want to test it, download this file with the code above and compile it (make sure you have the libusb development files installed – e.g. under Ubuntu the package libusb-1.0-0-dev):

g++ -I/usr/include/libusb-1.0/ -c tl500.cpp -o tl500.o
g++ -o tl500 tl500.o -lusb-1.0

If you run the executable tl500, it should find the TL-500 and request data from it:

Trying to find Arexx logging system.
1d6b:0002 (bus 1, device 1)
1d6b:0002 (bus 2, device 1)
1d6b:0001 (bus 3, device 1)
1d6b:0001 (bus 4, device 1)
0451:3211 (bus 4, device 3)
Found Arexx TL-500.
Data: 00 0a 07 48 05 b4 04 00 00 00 0b 00 ff 01 00 00 00 01 09 02 19 00 01 01 00 80 32 09 04 00 00 01 ff 00 00 00 07 05 01 02 40 00 00 36 21 00 00 f1 b5 ad 76 af 7d c1 f2 47 6a 7f bf ed 31 2e 3f 09
Data: 00 0a cc 22 0a e7 1e 00 00 00 07 00 ff 01 00 00 00 01 09 02 19 00 01 01 00 80 32 09 04 00 00 01 ff 00 00 00 07 05 01 02 40 00 00 36 21 00 00 f1 b5 ad 76 af 7d c1 f2 47 6a 7f bf ed 31 2e 3f 09
Data: 00 0a 06 48 17 fe 2b 00 00 00 14 00 ff 01 00 00 00 01 09 02 19 00 01 01 00 80 32 09 04 00 00 01 ff 00 00 00 07 05 01 02 40 00 00 36 21 00 00 f1 b5 ad 76 af 7d c1 f2 47 6a 7f bf ed 31 2e 3f 09
Data: 00 0a db 45 05 67 3b 00 00 00 0c 0a 72 22 0b 07 3b 00 00 00 0a 00 01 01 00 80 32 09 04 00 00 01 ff 00 00 00 07 05 01 02 40 00 00 36 21 00 00 f1 b5 ad 76 af 7d c1 f2 47 6a 7f bf ed 31 2e 3f 09
Data: 00 0a cc 22 0a e7 50 00 00 00 07 00 ff 22 0b 07 3b 00 00 00 0a 00 01 01 00 80 32 09 04 00 00 01 ff 00 00 00 07 05 01 02 40 00 00 36 21 00 00 f1 b5 ad 76 af 7d c1 f2 47 6a 7f bf ed 31 2e 3f 09

How to decipher these data rows to meaningful logging data will be a blog post on its own.

Because it is autumn

autumn

library(colorspace)
branchvar <- 1
tree <- function(x, y, branch) {
  lines(x,y,col=hex(RGB(0.1,0.4,0)))
  wadd <- 0.7
  if(branch>0) {
    alpha <- atan2((y[2]-y[1]),(x[2]-x[1]))
    len <- sqrt((y[2]-y[1])^2+(x[2]-x[1])^2)*0.6
    tree(c(x[2],x[2]+abs(rnorm(1,1,branchvar))*len*cos(alpha)),
          c(y[2],y[2]+abs(rnorm(1,1,branchvar))*len*sin(alpha)),
          branch-1)
    tree(c(x[2],x[2]+len*cos(alpha+wadd)),
          c(y[2],y[2]+len*sin(alpha+wadd)),branch-1)
    tree(c(x[2],x[2]+len*cos(alpha-wadd)),
          c(y[2],y[2]+len*sin(alpha-wadd)),branch-1)    
  } else {
    points(x[2],y[2],pch=16,
             col=hex(RGB(rbeta(1,2.6,0.5),rbeta(1,0.5,3),0)))
  }
}

plot.new()
tree(c(0,0.20),c(0,0.20),7)

You need the colorspace package.

Shortcuts in R under Unix from the readline library

Under Unix you can use in R the advanced command editing and command history features that the GNU Readline Library provides.

Both Emacs and vi editing modes are available, and the Emacs-like keybindings are installed by default. Here are some Emacs keybindings that also work in R (from the documentation of the GNU Readline Library):

Copy and Paste:
Ctrl-u Cut from the cursor to the beginning of the line.
Ctrl-k Cut from the cursor to the end of the line.
Ctrl-w Cut from the cursor to the start of the word.
Ctrl-y Pastes text from the clipboard.
Alt-y Rotate the kill-ring, and paste the new top. You can only do this if the prior command was Ctrl-y or Alt-y.

Moving:
Ctrl-a Move to the start of the line.
Ctrl-e Move to the end of the line.
Alt-b Move back one word.
Alt-f Move forward one word.
Ctrl-b Move back one character.
Ctrl-f Move forward one character.

Undo:
Ctrl-_ Undo the last changes.
Alt-r Undo all changes to the line.

Miscellaneous:
Ctrl-l Clear the screen leaving the current line at the top of the screen.

History:
Ctrl-r Incremental reverse search of history.
Alt-p Non-incremental reverse search of history.

If you want to use the vi mode, just press Ctrl+Alt+j and you can use the usual vi modes and commands (for example, take a look at this vi-editing-mode cheat sheet).

If you want to start in vi mode by default, put the following line into your ~/.inputrc (which of course will also affect your bash etc.; see the note below):

set editing-mode vi
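
If you want vi mode only for R and not for bash, readline also supports application-specific sections in ~/.inputrc. A sketch (this assumes that R registers itself with readline under the application name "R", which you may want to verify on your system):

$if R
  set editing-mode vi
$endif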

OpenOffice advanced fill down

When people come to me for statistical advice, they often bring Excel files containing data like the following, where values are only written down when they change. But for further analysis it is often required that each row is completely filled with the correct values. There is a basic “fill down” option in OpenOffice (CTRL+D), but to my knowledge there is unfortunately no extension for this case.

[Screenshot: filldown example – values are only written down when they change]

Years ago I wrote the following lines in Visual Basic for a small company I was visiting that even employed a student assistant to fill in these missing values in enormous, automatically generated Excel files (I did this in just a few minutes and of course free of charge – that’s why it is so basic; I just couldn’t watch people doing such dreary tasks):

Sub MakroAdvancedFillDown()
  For Each cell In Range("h1:h13215")
    If cell.Value = " " Or cell.Value = "" Then
      cell.Value = old
    End If
    old = cell.Value
  Next cell
End Sub

And since last week someone came into my office with a really large table and the same problem, I took an hour to learn StarOffice Basic and to write the following macro:

SUB AdvancedFillDown
  oSel = ThisComponent.getCurrentSelection()
  IF NOT oSel.supportsService("com.sun.star.sheet.SheetCellRange") THEN
      MsgBox "Error: Please select some cells for this macro."
      EXIT SUB
  END IF

  oAdr = oSel.rangeAddress
  oSheet = ThisComponent.CurrentController.ActiveSheet
     
  FOR j = oAdr.startColumn TO oAdr.EndColumn
    FOR i = oAdr.startRow TO oAdr.EndRow
      oCell=oSheet.getCellByPosition(j,i)
      IF oCell.string = "" THEN
        oSource = oSheet.getCellRangeByPosition(j,i-1,j,i-1)
        oSourceAdr = oSource.getRangeAddress
        oSheet.copyRange(oCell.getCellAddress,oSourceAdr)
      END IF
    NEXT
  NEXT
END SUB

Now I can deal with this problem with just one click.

Random Correlation Matrices

Some time ago one of my colleagues, who has a new method for evaluating the cumulative distribution function of a multivariate normal distribution, wanted to compare the speed of his method with that of randomized quasi-Monte Carlo methods. While we were going to lunch, he asked me how to generate random correlation matrices, because the speed of his method depends strongly on the correlation matrix and he wanted to have some sort of average.

But what is a random correlation matrix?

Let’s first give a characterization of correlation matrices.

It is well known that for a matrix $C:=(c_{i,j})_{1\leq i,j\leq n}\in\mathbb{R}^{n\times n}$ there exist (multivariate normally distributed) random variables $X,Y$ with \[\text{Cor}(X,Y):=\left(\frac{\text{Cov}(X_i,Y_j)}{\sqrt{\text{Var}(X_i)\text{Var}(Y_j)}}\right)_{1\leq i,j\leq n}=C\] if and only if

  1. $-1\leq c_{i,j}\leq 1$ for all $i,j\in\{1,\ldots,n\}$,
  2. $c_{i,i}=1$ for all $i\in\{1,\ldots,n\}$,
  3. $C$ is symmetric (therefore all eigenvalues $\lambda_1,\ldots,\lambda_n$ of $C$ are real)
  4. and all eigenvalues of $C$ are greater than or equal to zero (a small R check of these conditions follows below).
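
These four conditions can be checked numerically in R, for example like this (a small sketch; the helper name and the tolerance tol are my own choices, not part of the characterization):

is.correlation.matrix <- function(C, tol = 1e-12) {
  all(abs(C) <= 1 + tol) &&                           # condition 1
    all(abs(diag(C) - 1) <= tol) &&                   # condition 2
    isSymmetric(unname(C)) &&                         # condition 3
    min(eigen(C, symmetric = TRUE)$values) >= -tol    # condition 4
}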

But what is the right notion of randomness for these matrices?

For example, let’s look at orthogonal matrices: in many numerical applications we need random orthogonal matrices that are uniformly distributed in terms of the Haar measure (see http://en.wikipedia.org/wiki/Orthogonal_matrix#Randomization).

Unfortunately in our case there is no clear, natural notion of randomness. 🙁

Method 1 – Trial and Error: We generate a matrix fulfilling conditions 1, 2 and 3 of the characterization (such matrices are called pseudo correlation matrices) by drawing independent pseudo-random numbers uniformly distributed between -1 and 1 for the entries $c_{i,j}=c_{j,i}$, $1\leq i<j\leq n$, and setting the diagonal entries to one.

If this random symmetric matrix is positive semidefinite (i.e. all eigenvalues of $C$ are greater than or equal to zero), then we have the desired result. Otherwise we try again. Here is the corresponding R code:

random.pseudo.correlation.matrix <- function(n) {
  a <- diag(n)
  for(i in 1:(n-1)) {
    for(j in (i+1):n) {
      a[i,j] <- a[j,i] <- runif(1, -1, 1)
    }
  }
  return(a)
}

random.correlation.matrix.try.and.error <- function(n) {
  repeat {
    a <- random.pseudo.correlation.matrix(n)
    if (min(eigen(a)$values) >= 0) return(a)
  }
}
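
A quick check that the result really satisfies the characterization above:

C <- random.correlation.matrix.try.and.error(5)
range(diag(C))           # the diagonal entries are all 1
min(eigen(C)$values)     # nonnegative by construction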

This approach is only reasonable for very small dimensions (try it with $n=6,7,8$).

Method 2 – Lift the Diagonal:

We denote by $I$ the identity matrix. If $C$ has the eigenvalues $\lambda_1\leq\ldots\leq\lambda_n$, then $C+a\cdot I$ has the eigenvalues $\lambda_1+a\leq\ldots\leq\lambda_n+a$, since $x$ is a solution of $\det(C-x\cdot I)=0$ if and only if $x+a$ is a solution of $\det(C+a\cdot I-x\cdot I)=\det(C-(x-a)\cdot I)=0$.

So we start again with a pseudo correlation matrix $C$, but instead of retrying when $C$ has negative eigenvalues, we lift the diagonal by $|\lambda_1|$ and obtain $C+|\lambda_1|\cdot I$, which is always positive semidefinite. After dividing by $1+|\lambda_1|$ we have a correlation matrix which is “some kind of random”. 😉

Unfortunately the diagonal is accentuated and the smallest eigenvalue is always zero. We could avoid the second problem by adding $|\lambda_1|+b$, where $b$ is some positive random number, but the first problem remains.

make.positive.semi.definite <- function(a, offset=0) {
  (a + (diag(dim(a)[1]) * (abs(min(eigen(a)$values)) + offset))) /
    (1 + (abs(min(eigen(a)$values)) + offset))
}

random.correlation.matrix.lift.diagonal <- function(n, offset=0) {
  a <- random.pseudo.correlation.matrix(n)
  make.positive.semi.definite(a, offset)
}
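
Again a quick check (with offset = 0 the smallest eigenvalue is, up to rounding, exactly zero, as discussed above):

C <- random.correlation.matrix.lift.diagonal(10)
diag(C)                  # all ones
min(eigen(C)$values)     # numerically zero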

Method 3 – Gramian matrix – my favorite: Holmes [1] discusses two principal methods for generating random correlation matrices.

One of them is to generate $n$ independent pseudo-random vectors $t_1,\ldots,t_n$ distributed uniformly on the unit sphere $S^{n-1}$ in $\mathbb{R}^n$ and to use the Gram matrix $T^tT$, where $T:=(t_1,\ldots,t_n)$ has $t_i$ as its $i$-th column and $T^t$ is the transpose of $T$.

To create the $t_i$ in R, we load the package mvtnorm, generate $\tau_i\sim\mathcal{N}(0,I)$ and set $t_i:=\tau_i/\|\tau_i\|$:

random.correlation.matrix.sphere <- function(n) {
  require("mvtnorm")
  t <- rmvnorm(n, mean=rep(0,n))
  for (i in 1:n) {
    t[i,] <- t[i,] / sqrt(sum(t[i,]^2))
  }
  t %*% t(t)
}
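
And once more a quick sanity check of the construction:

C <- random.correlation.matrix.sphere(5)
diag(C)                  # all ones, since the t_i are unit vectors
min(eigen(C)$values)     # nonnegative, since C is a Gram matrix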

Conclusion: There are further methods (e.g. generating a random spectrum first and then constructing a correlation matrix from it), which are not all easy to implement. But like the three methods given above, they are all unsatisfactory in some way, because we don’t really know how random correlation matrices should be distributed.

For my colleague, an average of the calculation time only makes sense when he knows which kinds of correlation matrices occur in the applications. He decided to describe and compare the different cases individually.

But does it perhaps make sense to use random correlation matrices as test cases, or are the special cases more important? For example, random correlation matrices generated with methods 1 and 3 are singular only with probability zero.

Any critique, comments, suggestions or questions are welcome!

And for the next time: given a correlation matrix C, how do we generate tuples of pseudo-random numbers following a given multivariate distribution with correlation matrix C?

Literature:

[1] Holmes, R. B. 1991. On random correlation matrices. SIAM J. Matrix Anal. Appl., Vol. 12, No. 2: 239-272.

Executable application launcher

Under Linux we create launchers for our applications according to the Freedesktop Desktop Entry Specification. With recent versions of GNOME or KDE we get the following launcher:

Untrusted application launcher

When the launcher is clicked, the user is asked whether the application should be trusted.

Why? Because recently the desktop environment developers became aware of the following problem:

Some browsers like Firefox save files to the desktop. If someone provides a link to a file with a .desktop extension, it will appear as a launcher on the desktop. Without any problem it could look like an arbitrary launcher, e.g. for Firefox (provided the corresponding image like /usr/share/pixmaps/firefox-3.0.png can be guessed correctly), but do something completely different and possibly malicious.

Therefore, in recent versions of GNOME and KDE, desktop launchers need to have execute permissions (which are not preserved for downloaded files). Then they look and act normally again:

Executable application launcher

So don’t forget to give your desktop launchers execute permissions! 🙂
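
For example, a minimal launcher file according to the Desktop Entry Specification might look like this (the name, Exec and Icon values are just placeholders):

[Desktop Entry]
Type=Application
Name=My Application
Exec=/usr/bin/myapplication
Icon=/usr/share/pixmaps/myapplication.png

and it gets its execute permission with:

chmod +x ~/Desktop/myapplication.desktop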