Jochen's Blog

By Jochen Voss


I am happy to announce the release of fast-dm version 30.2. This release fixes some memory leaks found by Valgrind. You can download the program from the Fast-dm homepage.


A new paper, written together with my PhD student Mark Webster and my colleague Stuart Barber, was published today!

This paper took a long time to be published, but I think it is worth the wait.


I am very happy to announce that a very old (nearly forgotten?) article has finally appeared. We chose the journal because we had heard the rumour that they have a fast turnaround, but this plan didn't work out: the article was submitted on 23 September 2011 and accepted on 21 February 2012. But then came a two-year wait, and the article has only appeared just now:


Do you want to do a PhD on Bayesian Uncertainty Quantification with me? Are you good with maths and computers? As part of the new, NERC-funded doctoral training centre, I am advertising a PhD position. Details can be found on the DTP web page; the deadline for applications is 24 January 2014, and the PhD starts in October 2014.


I am very happy to report that after a long struggle we have finally finished (and submitted) a new paper today:

The paper studies the convergence of ABC estimates to the correct value. In particular,

  • we prove that the method converges under very weak conditions,
  • we show that the tolerance parameter δ should be chosen proportional to n^(-1/4), where n is the number of accepted ABC samples (a toy sketch of this scaling rule is given below), and
  • we determine the speed of convergence under optimal choice of δ.

Hopefully, these results prove useful for others, too.
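To make the scaling rule concrete, here is a minimal, purely illustrative Python sketch of ABC rejection sampling for a toy model (standard normal prior, normal observation with known variance). The toy model, the constant c and the function name abc_posterior_mean are my own assumptions for illustration; they are not taken from the paper.

import random

def abc_posterior_mean(y_obs, n, c=1.0):
    """Toy ABC rejection sampler: prior theta ~ N(0, 1), data y | theta ~ N(theta, 1).
    Proposals are accepted while |y_sim - y_obs| < delta, with delta = c * n**(-1/4),
    until n samples have been accepted; the estimate is the mean of the accepted thetas."""
    delta = c * n ** (-0.25)
    accepted = []
    while len(accepted) < n:
        theta = random.gauss(0, 1)       # draw a parameter from the prior
        y_sim = random.gauss(theta, 1)   # simulate data for this parameter
        if abs(y_sim - y_obs) < delta:
            accepted.append(theta)
    return sum(accepted) / n

# for this toy model the exact posterior mean is y_obs / 2
print(abc_posterior_mean(1.0, n=1000))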


I just tried to find out what the default random number generator of the Go programming language is. The source code for this RNG can be found in the file rng.go; the only reference in the source code is the cryptic comment "algorithm by DP Mitchell and JA Reeds". Does anybody have a reference for this, or can at least identify DP Mitchell or JA Reeds? Some things I tried:

  • I found one reference which claims that the Go random number generator is the same as used in Plan 9. The participants of this discussion seem not to know the origin of the method used.
  • While searching for Mitchell in the context of random number generation, I found references to the following article: D. Mitchell, Nonlinear Key Generators, Cryptologia, vol. 14, pp. 350-354, 1990. I hoped that this article might allow me to find an affiliation for D. Mitchell, but unfortunately it seems to have been removed(!) from the journal. The table of contents shows a gap between pages 349 and 355.
  • A Google search for mitchell reeds random number generator shows no obvious good candidates.

Any hints about where the algorithm comes from would be most welcome.


Hooray!!!

I am very happy to report that my book has finally appeared. While the book has officially been out for a few days already, it only very recently appeared on Amazon. You can get your copy by clicking on the thingy below (if it shows; otherwise by clicking here).

The book is written as a textbook for postgraduate students and covers basic topics in the area of Monte Carlo methods. Some more information can be found on the publisher's web page.

This is quite exciting for me. It's a book!!! I have a paper copy: you can touch it and all, it's really real!


Today I got an email from a Dr. R. K. Dixit (HON), bringing good news:

I came across to your research paper titled "Upper and Lower Bounds in Exponential Tauberian Theorems" and feel that your research is having a very good impact. With a view to begin a long-term fruitful association with you, I invite you to submit your upcoming research articles / papers for publication in Global Journal of Science Frontier Research (GJSFR), an international double blind peer reviewed research journal.

It seems that I could be an expert in Frontier Research!!! The email refers to the following paper:

  • : Upper and Lower Bounds in Exponential Tauberian Theorems. Tbilisi Mathematical Journal, vol. 2, pp. 41–50, 2009.
    link, arXiv:0908.0642, more…

While this is a nice paper (it got much better during the refereeing process; had I known this in advance, I would have submitted it to a better-known journal), it is still a relief to hear that the paper has a very good impact. Google currently doesn't know of any citations of my paper.

As it happens, the publisher of GJSFR, Global Journals Inc. (US), is also listed on Beall's list of potential, possible, or probable predatory scholarly open-access publishers.


A while ago, the computer game Infinity Blade 2 was given away for free as part of some advertisement campaign and I got myself a copy at the time.

One of the features in the game is a "gem forge", where one can put in three gems and, after a few hours, get a different gem in return. Gems have both financial value (gems can be sold) and practical value (gems can be used to enhance fighting skills and equipment). I searched for advice about how to best use the gem forge, but found only vague rumours and lists that I found incomprehensible.

To understand the gem forge better, I started taking notes about which output I got for which combination of inputs. The procedure is a bit slow, so I have only 16 data points so far, but I got a few results:

  • I found two cases where the information displayed for the input gems was identical, but which resulted in different output gems. Thus, either randomness is involved, or the gems have some hidden state which is not displayed in the gem screen.
  • For all 16 cases I have data for, the time to forge the new gem (in hours of wall-clock time) can be found by taking the price of the most expensive input gem, dividing by 15000, adding 0.25, and rounding to the nearest 30 minutes. (I found this rule using linear regression; a small sketch of it is given below this list.) Let's see whether this rule successfully predicts the forging time for future gems.
  • In all cases I observed, the sale price of the output gem was higher than the sale price of the most expensive of the input gems.
  • The correlation between the sale prices of the output gem and of the most expensive input gem is higher than the correlation between output price and the sum of the input prices.
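Here is a small Python sketch of the forging-time rule from the second bullet point above; the function name and the example prices are made up for illustration, and the rule itself is only an empirical guess based on my 16 data points.

def predicted_forge_hours(input_gem_prices):
    # empirical rule: price of the most expensive input gem / 15000 + 0.25,
    # rounded to the nearest 30 minutes (i.e. to the nearest 0.5 hours)
    raw = max(input_gem_prices) / 15000.0 + 0.25
    return round(raw * 2) / 2.0

# hypothetical example: input gems worth 4000, 12000 and 52000
print(predicted_forge_hours([4000, 12000, 52000]))   # -> 3.5 hours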

I haven't looked at other properties (shape, colour, function, etc.) of the gems yet.

If anybody knows more about the gem forge, or if there is a good reference available on the internet, I'd be happy to learn about this.


Over the last few days I finished an implementation of the Fortuna random number generator in the Go programming language. The result can be found on the new Fortuna home page and on github.com.

This is very different from the random number generators I usually deal with: Normally I am interested in random number generators for use in Monte Carlo simulations; this is, for example, the topic of my upcoming book (click the image below for more information). In contrast, the Fortuna random number generator is made for use in cryptographic algorithms; for example it could be used to generate session tokens for a web app.


I have just published a small package for the excellent Go programming language: The new package, trace, implements a simple tracing framework. Code using this framework can emit diagnostic messages using the trace.T() function. In normal operation, calls to trace.T() have no effect and are very fast. When tracing is enabled (in order to track down a problem or to explore the inner workings of a program), listeners can record/display the trace messages.

Details can be found on the trace package home page. Comments are very welcome!


I am very happy to report that I have finished writing my first-ever book! The title is An Introduction to Statistical Computing and the book will be published by Wiley. This book is a textbook, mostly for postgraduate students, and it covers random number generation, Monte Carlo methods, and a few more advanced topics.


Today I submitted a new paper. Preparing this took forever, but it's finally out:




The Police and Crime Commissioner elections will take place on Thursday. The question is whom to vote for.

Some thoughts:

  • To me it seems that the persons standing for election (mostly local councillors) may not be the right people to supervise the police. Wouldn't this be a much better job for the Independent Police Complaints Commission or the chair of the police authority or similar? So I wish there were a "none of the above" option at the end of the list of candidates.
  • I will be able to choose between The Labour Party Candidate, The Conservative Party Candidate, Independent and Liberal Democrats. I feel a bit uneasy about how this seems to go against the principle of separation of powers.

The candidates for Leeds are the following (summarised from the choose my PCC web page):

  • Mark Burns-Williamson (labour).
    Until recently, he was the chair of the West Yorkshire police authority, which probably makes him qualified for the job as a PCC. He optimistically writes that he will stand against the £100m Government cuts of around 2000 police officers and staff in West Yorkshire, but I would be surprised if he succeeded in this.
  • Geraldine Mary Carter (conservative).
    Geraldine has served on West Yorkshire Police Authority, whatever this means. She writes there are communities that the police feel unable to tackle crime in because they fear the consequences, and I wish she would tell us which communities she means by this (is this about gang crime? Muslims? Corrupt politicians?). Also, she writes victims should be able to decide the community punishment of offenders, possibly indicating that she favours revenge over rehabilitation? She will recruit 850 new Community Special Constables, but doesn't say how this will be funded. She uses all the conservative buzz-words from zero-tolerance to unnecessary red tape.
  • Cedric Mark Christie (independent). Cedric seems to be a retired police officer. He writes I will chase the corrupt and incompetent out of office and I know the difference between right and wrong. I never allowed negative traits to influence my behaviour whilst serving you. He seems VERY focussed on bullying.
  • Andrew Clive Glover Marchington (libdem).
    Among the four candidates, Andrew seems to have the least connection to the police. He uses the phrases reduce red tape, payback sentencing, but also reduce reoffending. He seems to have a stronger focus on domestic violence than the other candidates do.


For (my own) future reference, here are example commands to create a 3D (surface) plot in R:

library("rgl")
open3d(windowRect=c(50,50,800,800))

x <- seq(-10, 10, length=20)
y <- seq(-10, 10, length=20)
z <- outer(x,y, function(x,y) dnorm(x, 2, 3)*dnorm(y, 3, 7))

palette <- colorRampPalette(c("blue", "green", "yellow", "red"))
col.table <- palette(256)
col.ind <- cut(z, 256)
persp3d(x, y, z, col=col.table[col.ind])

These commands open the plot in a new window. The windowRect parameter in the call to open3d determines the initial position and size of the window (many thanks to Jem Corcoran for pointing this out). The output can be rotated and scaled using the mouse.

[a 3D plot, generated with R]


Following up on my recent blog entry about generating publication quality figures with R, here are some more hints, this time about setting the figure margins.

An R figure consists of several regions. From the outside in, these are as follows:

  • The device region. This region contains the figure region, surrounded by the outer margin. The device region may also contain text (for titles etc.). The size of this region is determined by the width and height arguments of the pdf command.
  • The figure region contains the plot region, surrounded by the inner margin. The axis tick marks and labels are part of the figure region. The size of the figure region is determined by the size of the device region and the width of the outer margin.
  • The plot region contains the plot of the data, not including any labels and tick marks. The size of the plot region is normally determined by the size of the figure region and the width of the inner margin.

[figure regions in R]

figure 1. The different regions of a figure generated in R

These different regions are illustrated in figure 1, above. The figure was created using the following R code:

par(oma=c(2, 2, 2, 2), mar=c(4, 4, 2, 2))

set.seed(1)
plot(rnorm(10), xlab="x label", ylab="y label")

box("outer", col="red", lwd=2)
mtext("device region", side=3, line=1, col="red", outer=TRUE)

box("inner", col="green")
mtext("figure region", side=3, line=1, col="green")

box("plot", col="blue")
text(5, 0, "plot region", col="blue")

A plot for use in a publication will normally need to make efficient use of the available space. For this purpose, the default margins used by R are too generous. This is illustrated in figure 2.

[default plot margins for R plots]

figure 2. A simple R plot, using the default margin settings. The red box was added to show the total area covered by the figure. There are unnecessarily large margins on the top and the right of the figure.

To maximise the area available for displaying information, unnecessary white space around the plot should be avoided. To achieve this, the following suggestions may be useful:

  • Outer margins. A figure in a publication will normally have a legend containing all accompanying information, so no title etc. will be required for the figure itself. Thus, the outer margin size will normally be 0:
    par(oma=c(0, 0, 0, 0))
    
  • Inner margins. The inner margins should be as small as possible, but still large enough to contain the axis labels and tick marks. For the plot in figure 2, we can use the following margins:
    par(mai=c(0.85, 0.85, 0.1, 0.1))
    

    Here we use mai (margin in inches) instead of mar (margin in lines), to have better control over the size of the margins. The four numbers give the widths of the margins at the bottom, left, top and right, in this order. Since the plot has no labels on the top and on the right, the last two numbers can be chosen close to 0.

A version of the above plot, with the margins adjusted as shown above, is displayed in figure 3:

[R plot with adjusted margins]

figure 3. The plot from figure 2, but with the margin sizes adjusted to maximise the area available for displaying information. This figure will take up the same amount of space as figure 2 in the enclosing document, but provides much more space for displaying the plot data.



I am happy to notice that one of my papers has been published, after a bit of a wait:

If you are interested in the numerical solution of stochastic partial differential equations, this may be of interest. Comments about the article are very welcome.


For one of my programming experiments, I want to compute timestamps as the number of seconds since the beginning of 1970. The resulting value will be used for communication between a web browser and my web server. My aim is to find a method which gives very similar values for the timestamp in JavaScript code running in the web browser and in a Python program running on my web server.

To solve this problem, I need a method which does not depend on the time zone the web user is in, and which does not break when daylight saving time (DST) starts or ends. One solution is to perform the computation of the timestamp in Coordinated Universal Time (UTC, a successor of Greenwich Mean Time). For reference, this post describes what I found.

In the web browser, I am using the following JavaScript code:

  now = new Date()
  timestamp = now.getTime() / 1000

Here, the getTime method does all the work; since JavaScript timestamps use milliseconds since the beginning of 1970, we have to divide by 1000 to convert to seconds. According to the ECMAScript specification (also governing JavaScript), the resulting value should be independent of time zone and DST. For testing, I changed the clock of my computer to a different time zone; everything still worked: after restarting the browser, now reflects the new time zone, but timestamp is only changed by the few seconds I needed to restart the browser.

The code on the web server is written in Python. After a lot of experimenting, I've found two working combinations. The more straightforward way of computing a timestamp is as follows.

  import datetime
  import time

  now = datetime.datetime.now()
  timestamp = time.mktime(now.timetuple()) + now.microsecond * 1e-6

Despite the fact that there is no explicit conversion to UTC in this code, the computed timestamp gives the number of seconds since the start of 1970 in UTC. This depends on time.mktime's ability to magically know the current time zone of the computer this code is running on. The same computation can also be done using times in UTC directly.

  import datetime
  import calendar

  now = datetime.datetime.utcnow()
  timestamp = calendar.timegm(now.utctimetuple()) + now.microsecond * 1e-6

This time, the code depends on the utcnow function knowing the current timezone. An ugly point of the second solution is that it requires the slightly obscure calendar module from the Python standard library. Again, I tested this code by changing the time zone of the computer's clock and all worked as expected.
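As a quick sanity check (this snippet is just my own test and not part of the server code), the two Python methods should produce essentially the same value:

  import calendar
  import datetime
  import time

  local_now = datetime.datetime.now()
  ts_local = time.mktime(local_now.timetuple()) + local_now.microsecond * 1e-6

  utc_now = datetime.datetime.utcnow()
  ts_utc = calendar.timegm(utc_now.utctimetuple()) + utc_now.microsecond * 1e-6

  # both are seconds since 1970-01-01 00:00 UTC; the difference should only be
  # the tiny amount of time elapsed between the two now() calls
  print(abs(ts_local - ts_utc))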


I just learned a new trick: the "texdoc" program automatically finds and displays information for LaTeX packages. On my shiny Mac, using the MacTeX distribution, I can type commands like the following:

texdoc geometry
texdoc tikz
texdoc jvlisting

This shows the documentation for the corresponding package as a PDF file.


I am currently in the process of writing a book, and I plan to produce several figures for this project using the statistical computing package R. While R is generally quite good at creating scientific graphics, some adjustments help to create high-quality images for inclusion in another document. This blog entry gives some simple hints about the required steps.

  • Including figures in LaTeX: My aim is to create figures which can be included in a LaTeX document as follows:
    \documentclass{article}
    \usepackage{graphicx}
    \begin{document}
    
    Some useful information is shown in figure~\ref{figure1}.
    
    \begin{figure}
      \begin{center}
        \includegraphics{figure1}
      \end{center}
      \caption{\label{figure1}This is a figure.  It shows useful information.}
    \end{figure}
    
    \end{document}
    
  • PDF output: The figures need to be generated in one of the image formats supported by LaTeX. To get high-quality output it is best to avoid raster-based formats like JPG or PNG. Since I use pdfLaTeX, the most convenient file format for my figures is PDF. R can generate a PDF file directly from a script, using the pdf() command. The required calls are as follows:
    pdf("figure1.pdf", width=4.6, height=3)
    # ... commands to generate the plot ...
    invisible(dev.off())
    

    The options width and height give the outer dimensions of the plot (including labels and margins) in inches. The dev.off() command at the end is required to make sure all output is written to the PDF file. I enclose dev.off inside invisible() in order to suppress an annoying message about the "null device" being printed to the screen.

  • Reproducibility: for scientific use, it is important that the plot can easily be recreated at a later time. For this reason, I find it useful to create a separate R source file for every plot. For example, the file figure1.R might contain the R code required to generate figure1.pdf. On Linux and MacOS we can use a shebang line to turn the R script into a stand-alone program. This requires two steps: first, we have to add a line like the following at the start of the R script:
    #! /usr/bin/env Rscript
    pdf("figure1.pdf", width=4.6, height=3)
    # ... commands to generate the plot ...
    invisible(dev.off())
    

    And secondly, we need to make the script executable by using (in a terminal window) a command like chmod +x figure1.R. After these steps, figure1.R can simply be executed to regenerate figure1.pdf. This is particularly useful in Makefiles, to automatically regenerate the figure every time the R script is changed.


I am happy to announce release 0.6.2 of Wisent, the Python parser generator. This release fixes a few minor issues which have been found in the previous release of Wisent. The source code of Wisent 0.6.2 can be downloaded from the Wisent homepage or from github.com.

[cave painting of a wisent]


I am happy to announce release 0.2 of the jvqplot data plotting program. One of the main features of jvqplot is that the plot is automatically updated whenever the data in the input file changes.

Changes in this release:

  • Empty lines in the input file can now be used to separate datasets. This makes it possible to create plots with disconnected components, as in the picture below.
  • The program source code has been cleaned up and split into several files.
  • Some minor (mostly cosmetic) issues with the previous release have been fixed.

The new jvqplot release can be downloaded from the jvqplot homepage.

[jvqplot demo with squares]


This weekend, I finished a new program: JvJsDoc, a program to extract documentation from JavaScript source code and to present the collected information in a set of HTML pages. It is meant to be used together with the Google Closure Library and the Google Closure Compiler. As an example: JvJsDoc can convert the JavaScript source file goog/disposable/disposable.js into the HTML documentation in goog/Disposable.html.

More information can be found on the JvJsDoc homepage, and you can download the program here:

jvjsdoc version 0.5, 2011-12-11

First public release.


After taking much longer than I intended to, today I finally submitted a new paper (about a year later than planned). You can have a look at the preprint here:


jvlisting is a LaTeX package which provides an alternative to LaTeX's built-in verbatim environment. The new release, version 0.5, fixes some bugs and cleans up the TeX source code a bit.

I hope you find this package useful. Bug reports and comments are very welcome!


Last week, Kanti Mardia and I submitted a new paper:

I am particularly curious about how the referees will react to my nice figures on pages 6 and 7.


I have typed up my lecture notes for the statistical computing module I am teaching this term. The notes are available online:

  • cite{Vo11}

Comments are very welcome!


I just started playing with Google+, not sure yet whether this will be useful or interesting to me.


Just for reference, here is a recipe about how to watch your DVDs on an Apple iPad. I have read that some of the required steps may be illegal to perform when you are in the USA (or an American citizen?); don't follow these instructions if you are not allowed to!

The instructions provided here use MPlayer/MEncoder on Ubuntu Linux to convert the DVD content into a file of the required format, but many other options are available.

  • Use lsdvd (contained in the Ubuntu package lsdvd) to find out which track of the DVD contains the actual movie. Normally this will be the longest track; lsdvd helpfully lists the index of this track at the bottom of its output.
    lsdvd
    
  • Use MPlayer (Ubuntu package mplayer) to verify that this is indeed the track you want:
    TRACKNO=1
    mplayer dvd://$TRACKNO
    

    You have to replace the value of TRACKNO with the track number found in the previous step.

  • Use mplayer to determine any cropping required:
    mplayer dvd://$TRACKNO -vf cropdetect
    

    This should print an option of the form -vf crop=720:430:0:48 to the console.

  • Use MEncoder to convert the data on the DVD to something an iPad will play. For the method described here, you need a version of mencoder with faac support included. One such version can be found in the Medibuntu repository.

    The call to mencoder requires about a million arguments. The following is inspired by a post at Mike McCandless' blog.

    CROP=720:430:0:48
    
    X264OPTS=crf=28:vbv_maxrate=1500:nocabac:global_header\
    :frameref=3:threads=auto:bframes=0:subq=6:mixed-refs=0\
    :weightb=0:8x8dct=1:me=umh:partitions=all:qp_step=4\
    :qcomp=0.7:trellis=1:direct_pred=auto
    
    FAACOPTS=br=160:mpeg=4:object=2:raw
    
    mencoder dvd://$TRACKNO \
      -of lavf -lavfopts format=mp4 \
      -vf crop=$CROP,pp=lb \
      -ovc x264 -x264encopts $X264OPTS \
      -oac faac -faacopts $FAACOPTS -channels 2 -srate 48000 \
      -o out.mp4
    

    You should replace the value of CROP with the value found in the previous step. This command can take a considerable amount of time to complete (comparable to the playing time of the DVD track), and should produce a big file out.mp4 (about 225MB per hour of playing time).

  • Copy the resulting file to a computer running Apple iTunes and add the file to your iTunes library. Finally, sync with your iPad.


A while ago I finished my first home-made LaTeX package. The new package is called "jvlisting" and provides a replacement for LaTeX's verbatim environment.

This package provides the LaTeX environment listing, an alternative to the built-in verbatim environment. The listing environment is specially tailored for including listings of computer program source code in documents. The main advantages over the original verbatim environment are that

  • The listing environment automatically fixes leading whitespace so that the environment and program listing can be indented with the rest of the document source, and
  • listing environments may easily be customised and extended.

You can download the package and its documentation from my webserver or from the jvlisting page at the Comprehensive TeX Archive Network (CTAN). I'd be happy if you could give my package a try. Comments about jvlisting are very welcome!


Recently I spent some time understanding how one can execute a macro repeatedly, once for every line of text in a LaTeX environment. Since the solution is a bit tricky and I found it difficult to find answers on the web, here is a summary of what I learned.

Step 1. In order to prevent input lines from being concatenated by TeX before we get access to them, we can use the \obeylines macro. This makes it possible to define a macro which matches everything up to the end of the line:

\def\doline#1{line found: `#1'\par}
{\obeylines
\gdef\getline#1
  {\doline{#1}}}

{\obeylines
\getline This is the \textbf{first} line.
This is the second line.}

Note that \obeylines must be in effect both while we define the \getline macro and while scanning the text. The outer pairs of enclosing brackets are there to contain the effect of \obeylines. To make \getline visible outside these brackets we use \gdef to define the macro in the global namespace. The output of the above TeX code is

line found: ‘This is the first line.’
This is the second line.

Step 2. We would like to have the \getline macro called for every line instead of just for the first one. This can be achieved by putting a \getline (without arguments) at the end of the \getline replacement text. The only complication is that we somehow need to stop this procedure once the end of the region of interest has been reached:

\def\marker{END}
{\obeylines
\gdef\getlines#1
  {\def\text{#1}%
  \ifx\text\marker \let\next\empty
    \else \doline{#1}\let\next\getlines \fi
  \next}}

{\obeylines
\getlines This is the \textbf{first} line.
This is the second line.
END
}

The \ifx command is used to test whether the matched line is the same as \marker. Until we find the marker, we put a new call to \getlines at the end of the \getlines replacement text, thereby looping over all lines until the marker is found. For this to work, the END needs to occur on a line on its own; therefore we have to move the closing bracket down one line. The TeX code above leads to the following output.

line found: ‘This is the first line.’
line found: ‘This is the second line.’

Step 3. We are now ready to pack the commands from above into the definition of a LaTeX environment:

\def\marker{\end{dolines}}
{\obeylines
\gdef\getlines#1
  {\def\text{#1}%
  \ifx\text\marker \let\next\text
    \else \doline{#1}\let\next\getlines \fi
  \next}}
\newenvironment{dolines}{\begingroup\obeylines\getlines}%
  {\endgroup}

\begin{dolines}
This is the \textbf{first} line.
This is the second line.
\end{dolines}

Here we had to change the definition of \getlines in order to include the \end{dolines} in the replacement text when the \ifx is true; otherwise it would have been swallowed by \dolines. The resulting output is, a bit surprisingly, as follows:

line found: ‘’
line found: ‘This is the first line.’
line found: ‘This is the second line.’

The extra empty line at the beginning is caused by the \getlines macro matching the (empty) text between \begin{dolines} and the end of the line. The \end{dolines} must be on a line on its own for this environment to work.

Step 4. If we want to prevent processing of the TeX commands inside the dolines environment, we can do so by changing the category codes of special characters like \ to 12 (other). A complication with this plan is that \ifx also compares category codes. Therefore, when we try to detect the end of our environment while special characters are switched off, we need to define \marker as the string \end{dolines}, but with the category codes of all special characters (the backslash and the curly brackets) set to 12. This can be done using the following, contorted sequence of commands:

\begingroup
\catcode`|=0 \catcode`[=1 \catcode`]=2
\catcode`\{=12 \catcode`\}=12 \catcode`\\=12
|gdef|marker[\end{dolines}]
|endgroup

Two more changes are required to make this work: first, we need to disable all special characters at the beginning of the environment:

\let\do=\@makeother\dospecials

If you want to use this command outside a style file, you will need to turn @ into a letter by bracketing your definitions with

\makeatletter
...
\makeatother

Secondly, inside \getlines, the expansion of \text still has the category codes as found in the input. Therefore, if the input was read with special characters switched off, we cannot write \let\next\text to get the closing \end{dolines} (since the category codes of the backslash and the curly brackets would be wrong). Instead, we need to use a definition like \def\next{\end{dolines}}.

Example. Using the techniques explained above, we can define a replacement for the LaTeX comment environment (contained in the verbatim package) as follows:

\documentclass{article}

\makeatletter

\begingroup
\catcode`|=0 \catcode`[=1 \catcode`]=2
\catcode`\{=12 \catcode`\}=12 \catcode`\\=12
|gdef|marker[\end{comment}]
|endgroup

{\obeylines
\gdef\getlines#1
  {\def\text{#1}%
  \ifx\text\marker \def\next{\end{comment}}
    \else \let\next\getlines \fi
  \next}}
\newenvironment{comment}{\begingroup
    \let\do=\@makeother\dospecials \obeylines\getlines}%
  {\endgroup}

\makeatother

\begin{document}

\begin{comment}
  Everything here will be ignored.
  \Invalid LaTeX code and incomplete constructs like
  \begin{itemize}
  without the closing end are no problem.
\end{comment}

\end{document}


Funny coincidence: Yesterday I learned that the following paper which we first submitted in 2010 has been accepted:

And today I received an email notice that a paper we submitted in 2009(!) is finally going into print now:


We are currently advertising a new lectureship in the statistics department of the University of Leeds. Thus, if you are a postdoc and want to become a colleague of mine, have a look at the job advert. The closing date for applications is 31st March 2011.


I am happy to announce the release of Jvqplot version 0.1.

Jvqplot is a data plotting program, resembling a simplified version of gnuplot. It is very simple to use, but it can only plot simple data files. The format and scaling of the plot is automatically chosen and cannot be changed. One of the main features of jvqplot is that the plot is automatically updated whenever the data in the input file changes. More information can be found on the Jvqplot homepage.

jvqplot version 0.2, 2012-04-09

empty lines in data files separate data sets, clean up the program sources

jvqplot version 0.1, 2010-11-29 (experimental version)

first public release

This is the first public release of jvqplot. Any feedback about the program would be most welcome.

[jvqplot screenshot]


Since I don't watch TV programmes and also don't own a TV, I do not need a TV licence. The TV licensing people seem to find this as difficult to believe as their German counterparts did, so they keep hassling me about this.

Their latest letter seems to let me declare online that I don't need a licence. Unfortunately, the web form quickly forces me to choose one of the following possibilities:

  • Only watch pre-recorded programmes (DVDs/Videos)
  • Only watch non live programmes via a Computer (e.g. Catch up services on iPlayer)
  • Only watch non live programmes via a TV (e.g. Catch up services on iPlayer)
  • Only use TV for games / consoles
  • Covered by another licence
  • I have no equipment

The last option seems like a possible choice for me, except for the fact that my computer probably qualifies as equipment. So which option am I meant to choose????


I just discovered the (1991) article What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg. Very useful reading!


I am happy to announce the release of Wisent version 0.6.1. Wisent is a parser generator for Python. The new release fixes a problem (introduced in version 0.6) where comments and multi-line strings in grammar files were broken. You can download the program from the wisent homepage.


I am slightly embarrassed by the fact that I was caught out by the birthday paradox yesterday. The encounter went as follows:

While testing a new random number generator by analysing a list of 1000 generated standard normally distributed random numbers, I discovered that the list contained one of the numbers twice!!! This is suspicious, because this event has probability 0 in theory. After an (unsuccessful) hunt for bugs in my program, I finally found the following explanation.

The program prints the numbers using a C command like

printf("%f\n", normal(0, 1));

By default, the %f format string outputs numbers with six digits after the decimal point:

-0.641062
1.116142
1.417036
0.337435
-0.310383
...

Most of the numbers will lie between -2 and 2, i.e. they are concentrated in a set of about 4 million possible values. A quick check reveals that 1000 independent uniform draws out of a set of this size contain a repeated number with a probability of more than 10%, and for the normal distribution the probability will be even higher because the numbers are more concentrated around 0. Thus, seeing a number twice is something which will actually happen from time to time and is no indication that the program is malfunctioning!
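For reference, here is the quick check in Python; the figure of 4 million values is the rough estimate from above, and the calculation assumes uniform draws rather than normal ones.

import math

N = 4 * 10**6   # roughly the number of distinct six-decimal values between -2 and 2
n = 1000        # number of generated random numbers

# probability that at least one value occurs twice among n uniform draws from N values
log_p_distinct = sum(math.log(1 - i / float(N)) for i in range(n))
print(1 - math.exp(log_p_distinct))   # approximately 0.117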


Just now, my (not very fancy) phone showed the following corrupted message to me:

[phone screen]

I'm a bit at a loss as to what is going on. If you have any idea, I would be happy to receive hints via email. Do I need to worry?


The program DSSP is used to determine the secondary structure of a protein, taking the three dimensional coordinates of its atoms as the input. Bio3d is a library for the statistical software package R which makes it easier to analyse protein structure; as part of this, bio3d contains an interface to DSSP (in the function dssp). Since I had a bit of trouble using this interface, here are some hints.

  1. Get the DSSP program from the DSSP webpage. For academic use, the program is free of charge, but you need to fill in a license agreement.
  2. Rename the executable to dssp (mine was called DSSP_MAC.EXE after I had downloaded it).
  3. Move the file to the directory where it will live. For my system (and for the examples below) I chose /usr/local/bin/. The name of this directory is not allowed to contain white space.
  4. Call the dssp function with this directory as the exepath argument. The value of the exepath argument must end in / (on Linux/Unix/MacOS) or \ (on Microsoft Windows). Also note that every \ in an R string constant needs to be doubled, e.g. on Microsoft Windows you will probably need to write something like exepath="C:\\path\\to\\file\\".

Example. In R one can now use the following commands.

library("bio3d")
pdb <- read.pdb("12as")
x <- dssp(pdb, exepath="/usr/local/bin/")

Then the secondary structure information of the protein 12AS can be accessed as follows:

> x$helix
$start
  1   2   3   4   5   6   7   8   9   1   2
  5  76 130 170 182 258 277 297 320 271 310

$end
  1   2   3   4   5   6   7   8   9   1   2
 27  83 155 176 193 268 283 305 325 275 312

$length
 1  2  3  4  5  6  7  8  9  1  2
23  8 26  7 12 11  7  9  6  5  3

$chain
 [1] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"

This tells us that the first helix reaches from residue 5 to residue 27 (both inclusive).


I learned about the different kinds of grains long ago, when I was in primary school. Since then I have somehow managed to confuse myself about which grain is which. Here's to unconfusion! (All information and pictures are from Wikipedia; follow the links for details.)

  • wheat / Weizen (Triticum aestivum, Triticum durum, etc.) [wheat]
    Used for bread, flour, couscous, beer, biofuel, …
    Needs good soil and climate; several species of wheat are used; main source of vegetable protein in human food.

  • barley / Gerste (Hordeum vulgare) [barley]
    Used for beer, whisky, animal feeding, …
    One of the first cultivated crops (together with einkorn and emmer).

  • oat / Hafer (Avena sativa) [oat]
    Used for animal feeding (e.g. horses), oat flakes, porridge, …
    Can be grown in regions with relatively cold, wet summers.

  • rye / Roggen (Secale cereale) [rye]
    Used for bread, animal feeding, …
    Can grow on relatively poor soil and in colder climates.


Over the last few days I have finished my first ever browser-based game: a puzzle, based on the classical fifteen puzzle. You can play the game here and download the source code from the program's homepage.

[webpuzzle screenshot]


[Jochen on the Cam]


Today, Martin and I finally finished our paper "Approximations to the Stochastic Burgers Equation". A preprint can be downloaded below, comments are very welcome.


It is of course well known that water fluoridation is a Communist conspiracy to sap and impurify all of our precious bodily fluids. Still, I am surprised to learn that one of the political parties running in the UK's upcoming general elections uses their opposition to water fluoridation as a campaign topic.

Can you guess which party this is? (for the answer click here)


Only one day after we submitted the revised version of our paper Sampling Conditioned Hypoelliptic Diffusions (see my previous blog post), the article was accepted! It will appear in the Annals of Applied Probability.


In order to address the referee's comments, we have prepared an updated version of the following article:


I gave a talk at the Maths2010 conference in Edinburgh. Since I had only 15 minutes, the talk is nearly free of content, but in compensation there are many nice pictures. I'm particularly fond of the torus on page 8 (which was surprisingly difficult to generate).

[3D torus]


This is a (some weeks old) photo of all twelve of me on Ilkley Moor.

[apostels]


[scrabble]


Yesterday I learned how to speed up the vector operation y := y + αx by using SSE2 instructions.

The function I was trying to speed up was as follows:

static void
daxpy0(int n, double a, const double *x, double *y)
{
  int i;
  for (i=0; i<n; ++i) {
    y[i] += a * x[i];
  }
}

Using SSE2 instructions, operating on two double precision numbers at a time, this can be written as follows:

#include <emmintrin.h>

static void
daxpy1(int n, double a, const double *x, double *y)
{
  __m128d aa, xx, yy;
  int i;

  aa = _mm_set1_pd(a);       /* store a in both registers */
  for (i=0; i+1<n; i+=2) {
    xx = _mm_load_pd(x+i);   /* load two elements from x ... */
    xx = _mm_mul_pd(aa, xx); /* ... and multiply both by a */
    yy = _mm_load_pd(y+i);   /* load two elements from y ... */
    yy = _mm_add_pd(yy, xx); /* ... and add */
    _mm_store_pd(y+i, yy);   /* write back both elements of y */
  }
  if (i < n) {
    y[i] += a * x[i];        /* handle last element if n is odd */
  }
}

This code requires that the vectors x and y are 16-byte aligned (otherwise a segmentation fault will occur). This assumption holds, for example, for memory blocks allocated by malloc on 64-bit Linux and MacOS X systems. Also, obviously, this only works on CPUs which support the SSE2 instruction set. A description of the SSE2 instructions used here can be found in the Intrinsics Reference of the Intel C++ Compiler Documentation (it also seems to apply to the GNU C compiler; direct link here).

For comparison, I also tried to use the daxpy function provided by BLAS:

static void
daxpy2(int n, double a, const double *x, double *y)
{
  extern void daxpy_(int *Np, double *DAp,
                     const double *X, int *INCXp,
                     double *Y, int *INCYp);
  int inc = 1;
  daxpy_(&n, &a, x, &inc, y, &inc);
}

Details about this approach are described on my page about linear algebra packages on Linux.

Results. I tried the three functions using two vectors of length 2000. The following table gives the execution time for a million calls to daxpy (on a newish quad-core Linux machine):

method   function   time [s]
direct   daxpy0     1.52
SSE2     daxpy1     0.91
BLAS     daxpy2     2.47

As can be seen, the SSE2 code in daxpy1 is fastest: compared to the naive implementation daxpy0 it takes 40% off the execution time! For some reason, the BLAS version seems to be very slow; the results on my MacOS machine are similar. Currently I have no explanation for this effect.


Recently I learned how to tunnel http traffic (e.g. web surfing) over an ssh connection. The effect of this is that you can browse the web on one computer A, say, but for the web servers you are visiting it will look like your requests originate from a different host B. You need to be able to log into host B via ssh for this to work.

There are several situations where such tunneling is useful:

  • If your own computer A is on a network which filters, changes, blocks or monitors your http traffic, you can use an ssh tunnel to work around the resulting restrictions.
  • If you are on an unencrypted wireless network, you can use tunnelling to prevent your neighbours from being able to read all of your web traffic.
  • If you work from home, you can use tunnelling to access online journals (which usually work only from university addresses), university internal web pages, etc. as if you were directly sitting at your work machine B.

Setting up a tunnel is done in two steps:

  1. Create a tunnel using the following command:
    ssh -D 8080 -f -q -N login@host
    

    You will need to replace login and host with your login details. The machine you type this command on is machine A in the description above, and host is machine B. This command will start an ssh process which will run in the background and will act as a tunnel to forward the web traffic.

  2. Set up your web browser to use a SOCKS web proxy on host localhost and port 8080. Both SOCKS4 and SOCKS5 should work. This will tell the web browser to connect to the local end of the ssh tunnel.


Over the weekend I finished a helper which makes it easy to distribute a number of programs over the available CPUs on a multi-core system. You can find the program on the Parallel homepage.


I just found somebody's blog post stating 8 Reasons Normal People Should Juggle and I agree very much with his reasons.


Yesterday, I completed version 0.6 of my Python parser generator Wisent. The new version allows UTF-8 encoded grammar files to be used (originally, only ASCII characters could be used in grammar files), adds a CSS parser as an additional example, and fixes some minor bugs. You can download the program from the Wisent homepage.


Yesterday I finished a revised version of my article about exponential Tauberian theorems. The main change is that, as it transpired, my proof could be trivially modified to give a more general statement. Many thanks to the (anonymous) referee for pointing this out.

  • : Upper and Lower Bounds in Exponential Tauberian Theorems. Tbilisi Mathematical Journal, vol. 2, pp. 41–50, 2009.
    link, arXiv:0908.0642, more…


Today, one of our papers was accepted:

  • , and : Separating Response-Execution Bias from Decision Bias: Arguments for an Additional Parameter in Ratcliff's Diffusion Model. British Journal of Mathematical and Statistical Psychology, vol. 63, no. 3, pp. 539–555, 2010.
    DOI:10.1348/000711009X477581, preprint:pdf, more…


Just in case this is useful for anybody: here is an implementation of the HSV to RGB colorspace conversion in Python:

from __future__ import division

def hsv2rgb(h, s, v):
    hi = int(h//60 % 6)
    f = h/60 - h//60
    p = v * (1-s)
    q = v * (1-f*s)
    t = v * (1-(1-f)*s)
    return [ (v,t,p), (q,v,p), (p,v,t), (p,q,v), (t,p,v), (v,p,q) ][hi]

Update. In the meantime I learned that the Python standard library has a built-in version of this function in the colorsys module.
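For a quick check of the function above against the standard library version (note that colorsys expects the hue as a fraction of a full turn, i.e. in [0, 1], whereas hsv2rgb above takes it in degrees), one can try:

import colorsys

print(hsv2rgb(30, 1, 1))                      # (1, 0.5, 0), an orange tone
print(colorsys.hsv_to_rgb(30 / 360.0, 1, 1))  # same result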


Today, I finally submitted the SPDE-paper I wrote with Martin Hairer and Andrew Stuart. This paper took a long time to complete (and it changed from an applied maths paper into a pure maths paper in the process). You can download a pre-print here; comments are very welcome:


A while ago, Andreas and I submitted another paper. Here is a pre-print:

    , and : Separating Response-Execution Bias from Decision Bias: Arguments for an Additional Parameter in Ratcliff's Diffusion Model. British Journal of Mathematical and Statistical Psychology, vol. 63, no. 3, pp. 539–555, 2010.
    DOI:10.1348/000711009X477581, preprint:pdf, more…


I just found out that Amazon sells Uranium ore. Not sure whether I should get some …


Over the week-end I completed the work on release 0.5 of my Python parser generator Wisent. You can now download the program from the Wisent homepage. The main news is the addition of a users' manual.


Today I discovered by chance the quirksmode.org page. Somebody there has spent lots and lots of time to work out the differences between different web-browsers, in particular in CSS and JavaScript support. Very useful!!!


Some days ago I found this strange story about two men trying to smuggle bonds worth 134 billion US dollars across the Italian-Swiss border. Or maybe it was fake bonds? Everything about this case seems quite mysterious to me.

Media coverage is strangely spotty, but summaries are published by the online editions of the Times and the German newspaper Die Welt (which I don't normally read). Of course there are also lots of conspiracy theories.

Update: It transpires that the bonds were fakes.


A sentence from the BBC web page:

Although passengers survived a landing on the Hudson River in New York in January — it is rarely successful, especially in the middle of an ocean the size of the Atlantic.


Recently I submitted a new paper:

: Upper and Lower Bounds in Exponential Tauberian Theorems. Tbilisi Mathematical Journal, vol. 2, pp. 41–50, 2009.
link, arXiv:0908.0642, more…

This is another leftover bit from my PhD thesis. Comments are most welcome!


I added a few photos from New York to my Photo album:

[a road in Manhattan] [Manhattan] [Union Square, New York] [houses in New York] [Times Square] [central park] [nice figures made from rubbish]


Since 1st April 2009 I'm a lecturer at the statistics department of the University of Leeds.


Recently we submitted a book chapter for the upcoming Oxford Handbook of Nonlinear Filtering:

, and : Signal Processing Problems on Function Space: Bayesian Formulation, Stochastic PDEs and Effective MCMC Methods. Pages 833–873 in The Oxford Handbook of Nonlinear Filtering, Dan Crisan and Boris Rozovsky (editors), Oxford University Press, 2011.
link, preprint:pdf, more…


I like to use the matplotlib Python library to create figures for use in my articles. Unfortunately, the matplotlib default settings seem to be optimised for screen-viewing, so creating pictures for inclusion in a scientific paper needs some fiddling. Here is an example script which sets up things for such a figure.

#! /usr/bin/env python

import matplotlib
matplotlib.use("PS")
from pylab import *

rc("font", family="serif")
rc("font", size=10)

width = 4.5
height = 1.4
rc("figure.subplot", left=(22/72.27)/width)
rc("figure.subplot", right=(width-10/72.27)/width)
rc("figure.subplot", bottom=(14/72.27)/height)
rc("figure.subplot", top=(height-7/72.27)/height)
figure(figsize=(width, height))

x = loadtxt("x.dat")
plot(x[:,0], x[:,1], "k-")

savefig("figure.eps")

The script performs the following steps:

  • The matplotlib.use statement tells matplotlib to use one of the non-interactive backends. Without this statement, the script would need a connection to an X-windows server in order to run.
  • The first two rc statements set the font family and size for the labels on the axes to match the font used in the article's text (matplotlib uses a 12pt sans-serif font by default).
  • The rc("figure.subplot", ...) settings and the figsize argument to the figure call set the image geometry. Since the text on journal pages typically is about 5in wide, a total figure width of 4.5in is often a good value. The margins set by the rc statements have to be given as fractions of the image width/height and must be big enough to contain the axis labels. A bit of experimentation helps to find good values; here we use a 14pt margin at the bottom (10pt font height plus some space), 22pt left margin (depends on the number of digits in the y-axis labels), 7pt for the top margin (space for half of the top-most y-axis label, and then some more) and 10pt to the right (enough space for half the right-most x-axis label).
  • The color code "k" in the plot command sets the plot color to black. Matplotlib default is blue, but some journals charge you for use of colour in your articles (and also it is nice if you can print things on a b/w printer).
  • The savefig command creates an encapsulated PostScript file. This format is good for inclusion in TeX documents. The file can be converted to PDF using the epstopdf command as needed.


Today I found the British government's guidelines for computing the CO2 emission for different modes of transportation. I like the careful explanation of the authors' methodology.

The results are summarised in the table below. All data is for 2008, and I rounded and simplified the values to make them easier to digest. You can find the true results in the report. The values for cars (marked with an asterisk) are given in g/km, you need to divide them by the number of passengers in the car.

mode of transportation   CO2 emission [g / passenger km]   reference
national rail             60                               table 21 (page 25)
london underground        65                               table 21 (page 25)
trams etc.                78                               table 21 (page 25)
Eurostar                  18                               table 21 (page 25)
bus                      107                               table 17 (page 21)
coach                     29                               table 17 (page 21)
petrol car               207*                              table 11 (page 18)
diesel car               198*                              table 11 (page 18)
motor cycle              106*                              table 18 (page 22)
domestic flights         175                               table 3 (page 9)
short-haul flights        98                               table 3 (page 9)
long-haul flights        111                               table 3 (page 9)

Update (2010-04-13). DEFRA reorganised their web page, so the above link does no longer work. Their reports can now be found on the Greenhouse gas (GHG) conversion factor methodology papers page. I believe that the 2008 report there is identical to what I used for the table.


I released version 0.8 of my implementation of the voting algorithm used by the Debian project. The new version should be functionally equivalent to version 0.7, but the source code is greatly cleaned up. Many thanks to Neil Schemenauer for providing patches! You can download the program from my page about the Debian voting system.


It is Christmas very soon, now. Why don't you get me a Christmas present from my Amazon wishlist?


Some photos from the North Carolina state fair are now on my photo album:

[steak on a stick] [pig racing audience] [pig racing] [baby pig] [people] [more people] [Julien and Jochen] [giant horse] [goat] [goats] [North Carolina's finest] [Ana, Megan and Jochen] [a potter's wheel] [roller coaster] [demolition derby] [roller coaster in the dark]


Last Saturday I finally managed to visit Durham. Some photos are on my photo album:

[Duke University campus] [Duke University campus] [bus stop] [Duke University campus] [Duke Gardens] [Banana tree] [red palm tree] [man eating plant] [Duke botanical gardens] [Christmas decoration plant] [Durham downtown] [Durham downtown] [Durham downtown] [Durham Centerfest 2008] [old motherboard] [mechanical knight]

My favourite is the last one: it shows a mechanical knight guarding a railway crossing.


The common log format of the Apache web server contains information about page views on a web page. Today I figured out how this format can be parsed in a Python program.

Typical log entries look like this (I added line breaks for better readability):

66.249.71.138 - - [20/Sep/2008:10:09:09 +0100]
  "GET / HTTP/1.1"
  200 4235 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
67.15.143.7 - - [20/Sep/2008:13:35:06 +0100]
  "GET //errors.php?error=http://www.rooksgambit.org/thebigsystem//sistem.txt? HTTP/1.1"
  404 2272 "-" "libwww-perl/5.79"

The first step of parsing these lines is easy; the regular expression library makes it possible to break a line into the individual fields:

import re

parts = [
    r'(?P<host>\S+)',                   # host %h
    r'\S+',                             # ident %l (unused)
    r'(?P<user>\S+)',                   # user %u
    r'\[(?P<time>.+)\]',                # time %t
    r'"(?P<request>.+)"',               # request "%r"
    r'(?P<status>[0-9]+)',              # status %>s
    r'(?P<size>\S+)',                   # size %b (careful, can be '-')
    r'"(?P<referer>.*)"',               # referer "%{Referer}i"
    r'"(?P<agent>.*)"',                 # user agent "%{User-agent}i"
]
pattern = re.compile(r'\s+'.join(parts)+r'\s*\Z')

line = ... a line from the log file ...
m = pattern.match(line)
res = m.groupdict()

This results in a Python dictionary, containing the individual fields as strings.
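For example, applying the pattern to the first sample line from above (written here on a single line, as it appears in the actual log file) gives:

line = ('66.249.71.138 - - [20/Sep/2008:10:09:09 +0100] '
        '"GET / HTTP/1.1" 200 4235 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')
m = pattern.match(line)
res = m.groupdict()
print(res["host"])     # '66.249.71.138'
print(res["request"])  # 'GET / HTTP/1.1'
print(res["status"])   # '200' (still a string at this point)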

The second step is to convert the field values from strings to Python data types and to remove the oddities of the log file format. Most fields are easy:

if res["user"] == "-":
    res["user"] = None

res["status"] = int(res["status"])

if res["size"] == "-":
    res["size"] = 0
else:
    res["size"] = int(res["size"])

if res["referer"] == "-":
    res["referer"] = None

Converting the time value into a Python datetime object is surprisingly messy, mainly because of the timezone information. The best way I have found so far is the following:

import time, datetime

class Timezone(datetime.tzinfo):

    def __init__(self, name="+0000"):
        self.name = name
        # the offset string looks like "+0100" or "-0530"; apply the sign
        # to both the hours and the minutes part
        sign = -1 if name.startswith("-") else 1
        seconds = sign * (int(name[1:3])*3600 + int(name[3:5])*60)
        self.offset = datetime.timedelta(seconds=seconds)

    def utcoffset(self, dt):
        return self.offset

    def dst(self, dt):
        return datetime.timedelta(0)

    def tzname(self, dt):
        return self.name

tt = time.strptime(res["time"][:-6], "%d/%b/%Y:%H:%M:%S")
tt = list(tt[:6]) + [ 0, Timezone(res["time"][-5:]) ]
res["time"] = datetime.datetime(*tt)

If you know of any better way of doing this, please let me know. Hints are very welcome!


After many delays, the paper containing the main result of my PhD thesis finally was published today:

  • : Large Deviations for One Dimensional Diffusions with a Strong Drift. Electronic Journal of Probability, vol. 13, no. 53, pp. 1479–1526, 2008.
    link, preprint:pdf, more…

This is more theoretical than the things I currently do, but I believe that it is a good paper.

Abstract. We derive a large deviation principle which describes the behaviour of a diffusion process with additive noise under the influence of a strong drift. Our main result is a large deviation theorem for the distribution of the end-point of a one-dimensional diffusion with drift θb where b is a drift function and θ a real number, when θ converges to ∞. It transpires that the problem is governed by a rate function which consists of two parts: one contribution comes from the Freidlin-Wentzell theorem whereas a second term reflects the cost for a Brownian motion to stay near an equilibrium point of the drift over long periods of time.


Mark Reid wrote a very nice review article about the current (very strong) evidence for the existence of a supermassive black hole at the centre of the Milky Way. Definitely worth a read! (Found via the KFC blog.)


Today I discovered the very exciting wordle tool at wordle.net. It can convert texts into word clouds like the following:

[wordle created word field]


A nice Mac OS X feature:

voss@soup [~] osascript -e 'tell app "ARDAgent" to do shell script "whoami"'
root


Today I discovered, by chance, the web page A Review of the Universe - Structures, Evolutions, Observations, and Theories. I have not yet read it all but the parts I looked at seem quite nice.


Stephen Dolan provides a very nice page which can find the shortest link between any two Wikipedia articles. The results are sometimes surprising: for example, can you guess how to get from Pleistocene to Microprocessor in three clicks?


This weekend we had an exciting trip to Blenheim Palace. [Blenheim Palace]


In a recent questionnaire, our IT services asked the following interesting question: Please rank the following in your priority order (1 = Highest, 2 = Medium, 3 = Lowest)

  • Guaranteed access to the network during normal working hours
  • Access to the network at anytime (24 x 7)
  • The maximum bandwidth possible

A university network that is accessible only during working hours (presumably theirs, not mine) is an interesting thought. And I wonder whether guaranteed access differs from plain access, or whether the first option is just a subset of the second.


I finally got around to putting some photos from my trip to Aachen onto my web page. A few weeks late, but here they are:

[sparse tree] [dead horse] [some road] [Cathedral I] [Cathedral II] [The Throne of Charlemagne] [Pulpit]


Our article

is now in print.


Today I completed version 0.4 of the Wisent parser generator for Python. The program is now efficient enough to deal with grammars of the size of the C grammar.

I spent some effort on generating useful error messages for grammar conflicts. For example, the usual "if-else conflict" in the C grammar is reported as follows:

cg.wi:164:61: shift-reduce conflict: the input
cg.wi:164:61:     IDENTIFIER { if ( IDENTIFIER ) ;.else ...
cg.wi:164:61:   can be shifted using the production rule
cg.wi:164:61:     selection_statement: if ( expression ) statement.else statement;
cg.wi:164:25:   or can be reduced to
cg.wi:164:25:     IDENTIFIER { selection_statement.else ...
cg.wi:164:25:   using the production rule
cg.wi:164:25:     selection_statement: if ( expression ) statement;
cg.wi: 1 conflict, aborting ...

Now you can even tell Wisent to ignore this conflict by placing an ! before the offending else.


While trying to add the capability to create LALR(1) parsers to Wisent, I came across the following paper:

  • David Pager, A practical general method for constructing LR(k) parsers.
    Acta Informatica, volume 7 (1977), number 3, pages 249–268.

This article is nicely written and I hope that using his algorithm I can teach Wisent to create efficient LR(1) parsers, thus removing the need for LALR(1) support.


Today I published the first version of a parser generator for Python. Currently it only generates LR(1) parsers, i.e. it is not very useful for big grammars. I hope to add support for LALR(1) parsers in the future.

One problem is the choice of a project name: I came up with Wisent since this fits together well with Bison. But unfortunately I am not the first one to think so, there are at least two other parser generators which also use the name Wisent, and most likely I will rename my program. Suggestions for a better name, preferably with Monty Python connotations, are very welcome.


I put my go ranking list online. Currently it covers some games played in the common room of the Warwick mathematics department. You can find the list on the secret go portal.


There is a lamp powered by gravity. The energy comes from a weight slowly sliding down a rail, and you recharge the lamp by moving the weight back up. The motion is converted into electricity, which powers the LEDs that produce the light. I want one of these lamps!!!

Update. A quick calculation shows that this lamp cannot possibly work: the best possible efficiency for generating light is approximately 683 lumens per watt (real efficiencies are much lower), so to get the advertised 600-800 lumens … over a period of 4 hours one would need more than 12650 joules. Since lifting a weight of 1 kg by 1 meter only provides 9.81 joules, the lamp, even assuming 100% efficiency, will never be able to generate enough electricity to power the LEDs.
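
Spelling out the arithmetic, using the lower figure of 600 lumens and the theoretical maximum efficiency:

$$\frac{600\,\mathrm{lm}}{683\,\mathrm{lm/W}} \times 4 \times 3600\,\mathrm{s} \approx 12650\,\mathrm{J}, \qquad mgh = 1\,\mathrm{kg} \times 9.81\,\mathrm{m/s^2} \times 1\,\mathrm{m} \approx 9.81\,\mathrm{J}.$$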


Understanding Art for Geeks is a very nice picture collection on Flickr.


During the Christmas holidays I carved an ancient Egyptian hippo from soapstone. It turned out to be quite a happy animal. The photo shows the almost finished statuette; only some polishing is left to do.
[Hippo]

My photo page has a picture of a very early stage.


We prepared another fast-dm release over the Christmas holidays: release 29 fixes a bug where, for fixed z, large values of sz were not correctly recognised.


The previous fast-dm version (release 27) introduced an unfortunate bug. We released a fixed version today. As a nice side effect, the program is now a bit faster for some parameter combinations (big st0 and high precision). If you are using an older version of fast-dm, you should upgrade to the new release 28.


According to an article in the Times, it seems that the United States government believes it is legal to kidnap anybody it suspects of a crime. I find this scary!


Andreas and I finished release 27 of our program fast-dm for parameter estimation in Ratcliff's diffusion model (in Psychology). You can download the new release from my fast-dm web page.


I had missed this until now: in Leeds a man triggered a terrorist alert by falling into a coma on a bus. Most exciting!


If M's a complete metric space
(and non-empty), it's always the case:
    If f's a contraction
    Then, under its action,
Exactly one point stays in place.


Today I completed another article, which will eventually appear in the conference proceedings for Heinrich v. Weizsäcker's birthday conference:

  • , and : Sampling Conditioned Diffusions. Pages 159–186 in Trends in Stochastic Analysis, Cambridge University Press, vol. 353 of London Mathematical Society Lecture Note Series, 2009.
    link, preprint:pdf, more…

This is a (hopefully) very nice and readable summary of our project to use SPDEs to construct MCMC methods in infinite dimensional spaces.


Yesterday I learned of the existence of flash cookies (aka local shared objects) which somehow had evaded my attention until now. Flash cookies seem to be very similar to the usual HTTP cookies, except that they can store more information (up to 100kb by default, instead of 4kb for HTTP cookies).

Flash cookies are independent of the browser's normal cookie settings and are not blocked by tools like CookieSafe. It seems that the only way to view flash cookies is by visiting Adobe's Settings Manager web page.


Today I spent some time looking through my spam folder. Since the beginning of this year I have received 47587 spam messages but only 38281 ham (i.e. good) messages. This means that 55% of my mail is now spam.

Not surprisingly, the spam messages came from a much wider set of hosts than the ham messages: 43673 hosts sent only spam, 3034 hosts sent only ham, and 145 hosts sent both types of messages. This effect can also be seen in my old map of the internet.

The worst offender is the host with the IP address 80.253.80.28. It sent 49 spam emails to me this year, the first one on 2007-02-23 (Bitte Geld abholen !!, i.e. "please collect money!!"), the most recent one on 2007-11-02 (Bitte zurück Rufen, i.e. "please call back"). It is sad that this spammer manages to use the same address for 10 months in a row for an, in my eyes, criminal activity without being stopped.


I started reading Ulrich Drepper's excellent series What every Programmer should know about memory. It is published by Linux Weekly News in weekly installments. So far, the following parts have appeared:

The author shows, for example, how the naive matrix-matrix multiplication

for (i = 0; i < N; ++i)
  for (j = 0; j < N; ++j)
    for (k = 0; k < N; ++k)
      res[i][j] += A[i][k] * B[k][j];

can be improved to run 10 times faster for big matrices!


Recently I learned how to compute the distribution of the number N of visits to zero of a one-dimensional, symmetric, simple random walk (X_n), conditioned on X_{2n} = 0. There are two ways to compute this.

Plan A. Using the nice equality

$$P(X_1 \neq 0, \dots, X_{2n} \neq 0) = P(X_{2n} = 0)$$

which I learned long ago as a student, we can see that there is a bijection between all bridge paths and all paths which don't hit zero. (The equality is proved by constructing a bijection between the complements of these sets, using the reflection principle.) We can apply this to bridge paths with at least k zeros, mapping the path segment starting at the kth zero to a path which does not hit zero. This results in a bijection between all bridge paths having at least k zeros and all paths having exactly k zeros. If you happen to know this number, you are done. Otherwise you can use

Plan B. Consider the following map: for each bridge path with at least k zeros, flip the last k excursions so that they are all positive and then remove the last step of each of these excursions. This procedure defines a map from paths of length 2n with at least k zeros which end at zero to paths of length 2n-k (we removed k steps) which end at level k (all removed steps were downward steps). It transpires that this map is surjective and every image has exactly 2^k pre-images. Thus the number of bridge paths with at least k zeros is

$$2^k \binom{2n-k}{n-k}$$

and dividing by the total number of bridge paths gives the required probability:

$$P(N \geq k \mid X_{2n} = 0) = \frac{2^k \binom{2n-k}{n-k}}{\binom{2n}{n}}.$$
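
As a quick sanity check (a brute-force sketch, not part of the argument above; math.comb needs Python 3.8 or later), one can enumerate all bridge paths for small n and compare the counts with the formula:

from itertools import product
from math import comb

def bridge_counts(n):
    # for each k, count the bridge paths of length 2n with at least k zeros
    counts = [0] * (n + 1)
    for steps in product((+1, -1), repeat=2*n):
        pos, zeros = 0, 0
        for s in steps:
            pos += s
            if pos == 0:
                zeros += 1
        if pos == 0:                      # keep only paths with X_{2n} = 0
            for k in range(1, zeros + 1):
                counts[k] += 1
    return counts

n = 4
counts = bridge_counts(n)
for k in range(1, n + 1):
    print(k, counts[k], 2**k * comb(2*n - k, n - k))   # the two numbers agree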


[kitchen shelf during construction]

For my birthday I built myself a new kitchen shelf, starting with timber from Homebase. At the back it has specially crafted gaps so that the wires and tubes from my heating can pass through. Say hello to my new shelf!


Another paper I wrote together with my brother Andreas has finally been accepted:

This paper describes the mathematical foundations which underlie our fast-dm program.

Copyright © 2007–2024 Jochen Voss. All content on this website (including text, pictures, and any other original works), unless otherwise noted, is licensed under the CC BY-SA 4.0 license.