Fast-dm version 30.2 released
I am happy to announce the release of fast-dm version 30.2. This release fixes some memory leaks found by Valgrind. You can download the program from the Fast-dm homepage.
A new paper, written together with my PhD student Mark Webster and my colleague Stuart Barber, has been published today!
This paper took a long time to be published, but I think it is worth the wait.
I am very happy to announce that a very old (nearly forgotten?) article has finally appeared. We chose the journal because we had heard the rumour that they have a fast turnaround, but this plan didn't work out: the article was submitted on 23 September 2011 and accepted on 21 February 2012. But then came a two-year wait, and the article has appeared only now:
Do you want to do a PhD on Bayesian Uncertainty Quantification with me? Are you good with maths and computers? As part of the new, NERC-funded doctoral training centre I am advertising a PhD position. Details can be found on the DTP web page; the deadline for applications is 24 January 2014, and the PhD starts in October 2014.
I am very happy to report that after a long struggle we have finally finished (and submitted) a new paper today:
S. Barber, J. Voss and M. Webster:
The Rate of Convergence for Approximate Bayesian Computation.
Electronic Journal of Statistics, vol. 9, pp. 80–105, 2015.
DOI:10.1214/15-EJS988, arXiv:1311.2038, more…
The paper studies the convergence of ABC estimates to the correct value.
Hopefully, these results prove useful for others, too.
I just tried to find out what the default random number generator of the Go programming language is. The source code for this RNG can be found in the file rng.go; the only reference in the source code is the cryptic comment "algorithm by DP Mitchell and JA Reeds". Does anybody have a reference for this, or can at least identify DP Mitchell or JA Reeds? Some things I tried:
Searching for "Mitchell" in the context of random number generation, I found references to the following article: D. Mitchell, "Nonlinear Key Generators", Cryptologia, vol. 14, pp. 350–354, 1990. I hoped that this article might allow me to find an affiliation for D. Mitchell, but unfortunately it seems to have been removed(!) from the journal. The table of contents shows a gap between pages 349 and 355.
A search for "mitchell reeds random number generator" shows no obvious good candidates.
Any hints about where the algorithm comes from would be most welcome.
Hooray!!!
I am very happy to report that my book has finally appeared. While the book has officially been out for a few days already, it was only very recently that it appeared on Amazon. You can get your copy by clicking on the thingy below (if it shows; otherwise by clicking here).
The book is written as a text-book for postgraduate students and covers basic topics around the area of Monte Carlo methods. Some more information can be found on the publisher's web page.
This is quite exciting for me. It's a book!!! I have a paper copy: you can touch it and all, it's really real!
Today I got an email from a Dr. R. K. Dixit (HON), bringing good news:

It seems that I could be an "expert in Frontier Research"!!! The email refers to the following paper:
While this is a nice paper (it got much better during the refereeing process; had I known this in advance, I would have submitted it to a better-known journal), it is still a relief to hear that the paper has "very good impact". Google currently doesn't know of any citations of my paper.
As it happens, the publisher of GJSFR, Global Journals Inc. (US), is also listed on Beall's list of potential, possible, or probable predatory scholarly open-access publishers.
A while ago, the computer game Infinity Blade 2 was given away for free as part of some advertisement campaign and I got myself a copy at the time.
One of the features in the game is a "gem forge", where one can put in three gems and, after a few hours, get a different gem in return. Gems have both financial value (gems can be sold) and practical value (gems can be used to enhance fighting skills and equipment). I searched for advice about how best to use the gem forge, but found only vague rumours and lists that were incomprehensible to me.
To understand the gem forge better, I started taking notes about which output I got for which combination of inputs. The procedure is a bit slow, so I have only 16 data points so far, but I got a few results:
I haven't looked at other properties (shape, colour, function, etc.) of the gems yet.
If anybody knows more about the gem forge, or if there is a good reference available on the internet, I'd be happy to learn about this.
Over the last few days I finished an implementation of the Fortuna random number generator in the Go programming language. The result can be found on the new Fortuna home page and on github.com.
This is very different from the random number generators I usually deal with: Normally I am interested in random number generators for use in Monte Carlo simulations; this is, for example, the topic of my upcoming book (click the image below for more information). In contrast, the Fortuna random number generator is made for use in cryptographic algorithms; for example it could be used to generate session tokens for a web app.
I have just published a small package for the excellent Go programming language: The new package, trace, implements a simple tracing framework. Code using this framework can emit diagnostic messages using the trace.T() function. In normal operation, calls to trace.T() have no effect and are very fast. When tracing is enabled (in order to track down a problem or to explore the inner workings of a program), listeners can record/display the trace messages.
Details can be found on the trace package home page. Comments are very welcome!
I am very happy to report that I have finished writing my first-ever
book! The title is An Introduction to Statistical Computing
and the
book will be published by
Wiley.
This book is a textbook, mostly for postgraduate students, and it covers
random number generation, Monte Carlo methods, and a few more advanced
topics.
Today I submitted a new paper. Preparing this took forever, but it's finally out:
The Police and Crime Commissioner elections will take place on Thursday. The question is whom to vote for.
Some thoughts:
There is no "none of the above" option at the end of the list of candidates.
The candidates are listed as "The Labour Party Candidate", "The Conservative Party Candidate", "Independent" and "Liberal Democrats". I feel a bit uneasy about how this seems to go against the principle of separation of powers.
The candidates for Leeds are the following (summarised from the choose my PCC web page):
One candidate wants to "stand against the £100m Government cuts of around 2000 police officers and staff in West Yorkshire", but I would be surprised if he succeeded in this.
A second candidate has served on West Yorkshire Police Authority, whatever this means. She writes that "there are communities that the police feel unable to tackle crime in because they fear the consequences", and I wish she would tell us which communities she means by this (is this about gang crime? Muslims? Corrupt politicians?). Also, she writes that "victims should be able to decide the community punishment of offenders", possibly indicating that she favours revenge over rehabilitation? She "will recruit 850 new Community Special Constables", but doesn't say how this will be funded. She uses all the conservative buzz-words, from "zero-tolerance" to "unnecessary red tape".
A third candidate promises "I will chase the corrupt and incompetent out of office" and "I know the difference between right and wrong. I never allowed negative traits to influence my behaviour whilst serving you." He seems VERY focussed on bullying.
The fourth candidate wants to "reduce red tape" and favours "payback sentencing", but also wants to "reduce reoffending". He seems to have a stronger focus on domestic violence than the other candidates do.
For (my own) future reference, here are example commands to create a 3D (surface) plot in R:
library("rgl") open3d(windowRect=c(50,50,800,800)) x <- seq(-10, 10, length=20) y <- seq(-10, 10, length=20) z <- outer(x,y, function(x,y) dnorm(x, 2, 3)*dnorm(y, 3, 7)) palette <- colorRampPalette(c("blue", "green", "yellow", "red")) col.table <- palette(256) col.ind <- cut(z, 256) persp3d(x, y, z, col=col.table[col.ind])
These commands open the plot in a new window. The windowRect parameter in the call to open3d determines the initial position and size of the window (many thanks to Jem Corcoran for pointing this out). The output can be rotated and scaled using the mouse.
Following up on my recent blog entry about generating publication quality figures with R, here are some more hints, this time about setting the figure margins.
An R figure consists of several regions. From the outside in, these are the device region (whose total size is set by the width and height arguments of the pdf command), the figure region and the plot region.
figure 1. The different regions of a figure generated in R
These different regions are illustrated in figure 1, above. The figure was created using the following R code:
par(oma=c(2, 2, 2, 2), mar=c(4, 4, 2, 2))
set.seed(1)
plot(rnorm(10), xlab="x label", ylab="y label")
box("outer", col="red", lwd=2)
mtext("device region", side=3, line=1, col="red", outer=TRUE)
box("inner", col="green")
mtext("figure region", side=3, line=1, col="green")
box("plot", col="blue")
text(5, 0, "plot region", col="blue")
A plot for use in a publication will normally need to make efficient use of the available space. For this purpose, the default margins used by R are too generous. This is illustrated in figure 2.
figure 2. A simple R plot, using the default margin settings. The red box was added to show the total area covered by the figure. There are unnecessarily large margins on the top and the right of the figure.
To maximise the area available for displaying information, unnecessary white space around the plot should be avoided. To achieve this, the following suggestions may be useful:
par(oma=c(0, 0, 0, 0))
par(mai=c(0.85, 0.85, 0.1, 0.1))
Here we use mai (margin in inches) instead of mar (margin in lines) to have better control over the size of the margins. The four numbers give the widths of the margins at the bottom, left, top and right, in this order. Since the plot has no labels on the top and on the right, the last two numbers can be chosen close to 0.
A version of the above plot, with the margins adjusted as shown above, is displayed in figure 3:
figure 3. The plot from figure 2, but with the margin sizes adjusted to maximise the area available for displaying information. This figure will take up the same amount of space as figure 2 in the enclosing document, but provides much more space for displaying the plot data.
I am happy to notice that one of my papers has been published, after a bit of a wait:
If you are interested in the numerical solution of stochastic partial differential equations, this may be of interest. Comments about the article are very welcome.
For one of my programming experiments, I want to compute timestamps as the number of seconds since the beginning of 1970. The resulting value will be used for communication between a web browser and my web server. My aim is to find a method which gives very similar values for the timestamp in JavaScript code running in the web browser and in a Python program running on my web server.
To solve this problem, I need a method which does not depend on the time zone the web user is in, and which does not break when daylight saving time (DST) starts or ends. One solution is to perform the computation of the timestamp in Coordinated Universal Time (UTC, the successor of Greenwich Mean Time). For reference, this post describes what I found.
In the web browser, I am using the following JavaScript code:
now = new Date()
timestamp = now.getTime() / 1000
Here, the getTime method does all the work; since JavaScript timestamps use milliseconds since the beginning of 1970, we have to divide by 1000 to convert to seconds. According to the ECMAScript specification (also governing JavaScript), the resulting value should be independent of time zone and DST. For testing, I changed the clock of my computer to a different time zone; everything still worked: after restarting the browser, now reflects the new time zone, but timestamp is only changed by the few seconds I needed to restart the browser.
The code on the web server is written in Python. After a lot of experimenting, I've found two working combinations. The more straightforward way of computing a timestamp is as follows.
import datetime
import time

now = datetime.datetime.now()
timestamp = time.mktime(now.timetuple()) + now.microsecond * 1e-6
Despite the fact that there is no explicit conversion to UTC in this code, the computed timestamp gives the number of seconds since the start of 1970 in UTC. This depends on time.mktime's ability to magically know the current time zone of the computer this code is running on. The same computation can also be done using times in UTC directly.
import datetime
import calendar

now = datetime.datetime.utcnow()
timestamp = calendar.timegm(now.utctimetuple()) + now.microsecond * 1e-6
This time, the code depends on the utcnow function knowing the current timezone. An ugly point of the second solution is that it requires the slightly obscure calendar module from the Python standard library. Again, I tested this code by changing the time zone of the computer's clock, and all worked as expected.
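For reference, here is a small script (my addition, not part of the original post) which runs both Python variants side by side; on a correctly configured system the two values should agree to within a fraction of a second, independent of the local time zone.

```python
import calendar
import datetime
import time

# variant 1: local time plus time.mktime
now = datetime.datetime.now()
t1 = time.mktime(now.timetuple()) + now.microsecond * 1e-6

# variant 2: UTC time plus calendar.timegm
utc_now = datetime.datetime.utcnow()
t2 = calendar.timegm(utc_now.utctimetuple()) + utc_now.microsecond * 1e-6

print(abs(t2 - t1))  # should be a small fraction of a second
```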
I just learned a new trick: the "texdoc" program automatically finds and displays information for LaTeX packages. On my shiny Mac, using the MacTeX distribution, I can type commands like the following:
texdoc geometry
texdoc tikz
texdoc jvlisting
This shows the documentation for the corresponding package as a PDF file.
I am currently in the process of writing a book, and I plan to produce several figures for this project using the statistical computing package R. While R is generally quite good at creating scientific graphics, some adjustments help to create high-quality images for inclusion in another document. This blog entry gives some simple hints about the required steps.
\documentclass{article}
\usepackage{graphicx}

\begin{document}

Some useful information is shown in figure~\ref{figure1}.

\begin{figure}
\begin{center}
\includegraphics{figure1}
\end{center}
\caption{\label{figure1}This is a figure.  It shows useful information.}
\end{figure}

\end{document}
The PDF file for the figure is generated in R, using the pdf() command. The required calls are as follows:
pdf("figure1.pdf", width=4.6, height=3) ...commands to generate the plot invisible(dev.off())
The options width and height give the outer dimensions of the plot (including labels and margins) in inches. The dev.off() command at the end is required to make sure all output is written to the PDF file. I enclose dev.off inside invisible() in order to suppress an annoying message about the "null device" being printed to the screen.
For example, a file figure1.R might contain the R code required to generate figure1.pdf. On Linux and MacOS we can use a shebang-line to turn the R script into a stand-alone program. This requires two steps: First, we have to add a line like the following at the start of the R script:
#! /usr/bin/env Rscript

pdf("figure1.pdf", width=4.6, height=3)
# ... commands to generate the plot ...
invisible(dev.off())
And secondly, we need to make the script executable by using (in a terminal window) a command like chmod +x figure1.R. After these steps, figure1.R can simply be executed to regenerate figure1.pdf. This is particularly useful in Makefiles, to automatically regenerate the figure every time the R script is changed.
I am happy to announce release 0.6.2 of Wisent, the Python parser generator. This release fixes a few minor issues which have been found in the previous release of Wisent. The source code of Wisent 0.6.2 can be downloaded from the Wisent homepage or from github.com.
I am happy to announce release 0.2 of the jvqplot data plotting program. One of the main features of jvqplot is that the plot is automatically updated whenever the data in the input file changes.
Changes in this release:
The new jvqplot release can be downloaded from the jvqplot homepage.
This weekend, I finished a new program: JvJsDoc, a program to extract documentation from JavaScript source code and to present the collected information in a set of HTML pages. It is meant to be used together with the Google Closure Library and the Google Closure Compiler. As an example: JvJsDoc can convert the JavaScript source file goog/disposable/disposable.js into the HTML documentation in goog/Disposable.html.
More information can be found on the JvJsDoc homepage, and you can download the program here:
First public release.
After taking much longer than I intended, today I finally submitted a new paper (about a year later than planned). You can have a look at the preprint here:
jvlisting is a LaTeX package which provides an alternative to LaTeX's built-in verbatim environment. The new release, version 0.5, fixes some bugs and cleans up the TeX source code a bit. jvlisting can be downloaded from my webserver.
I hope you find this package useful. Bug reports and comments are very welcome!
Last week, Kanti Mardia and I submitted a new paper:
I am particularly curious about how the referees will react to my nice figures on pages 6 and 7.
I have typed up my lecture notes for the statistical computing module I am teaching this term. The notes are available online:
Comments are very welcome!
I just started playing with Google+, not sure yet whether this will be useful or interesting to me.
Just for reference, here is a recipe about how to watch your DVDs on an Apple iPad. I have read that some of the required steps may be illegal to perform when you are in the USA (or an American citizen?); don't follow these instructions if you are not allowed to!
The instructions provided here use MPlayer/MEncoder on Ubuntu Linux to convert the DVD content into a file of the required format, but many other options are available.
First, use a tool such as lsdvd to find out which track of the DVD contains the actual movie. Normally this will be the longest track, and lsdvd helpfully lists the index of this track at the bottom of its output.
Then use a video player (e.g. mplayer) to verify that this is indeed the track you want:
TRACKNO=1
mplayer dvd://$TRACKNO
You have to replace the value of TRACKNO with the track number found in the previous step. Next, use mplayer to determine any cropping required:
mplayer dvd://$TRACKNO -vf cropdetect
This should print an option of the form -vf crop=720:430:0:48 to the console.
The call to mencoder requires about a million arguments. The following is inspired by a post at Mike McCandless' blog.
CROP=720:430:0:48
X264OPTS=crf=28:vbv_maxrate=1500:nocabac:global_header\
:frameref=3:threads=auto:bframes=0:subq=6:mixed-refs=0\
:weightb=0:8x8dct=1:me=umh:partitions=all:qp_step=4\
:qcomp=0.7:trellis=1:direct_pred=auto
FAACOPTS=br=160:mpeg=4:object=2:raw
mencoder dvd://$TRACKNO \
    -of lavf -lavfopts format=mp4 \
    -vf crop=$CROP,pp=lb \
    -ovc x264 -x264encopts $X264OPTS \
    -oac faac -faacopts $FAACOPTS -channels 2 -srate 48000 \
    -o out.mp4
You should replace the value of CROP with the value found in the previous step. This command can take a considerable amount of time to complete (comparable to the playing time of the DVD track), and should produce a big file out.mp4 (about 225MB per hour of playing time).
A while ago I finished my first home-made LaTeX package. The new package is called "jvlisting" and provides a replacement for LaTeX's verbatim environment.
This package provides the LaTeX environment listing, an alternative to the built-in verbatim environment. The listing environment is specially tailored for including listings of computer program source code in documents. The main advantages over the original verbatim environment are that the listing environment automatically fixes leading whitespace, so that the environment and program listing can be indented along with the rest of the document source, and that listing environments may easily be customised and extended.
You can download the package and its documentation from my webserver or from the jvlisting page at the Comprehensive TeX Archive Network (CTAN). I'd be happy if you could give my package a try. Comments about jvlisting are very welcome!
Recently I spent some time understanding how one can execute a macro repeatedly, once for every line of text in a LaTeX environment. Since the solution is a bit tricky and I found it difficult to find answers on the web, here is a summary of what I learned.
Step 1. In order to prevent input lines from being concatenated by TeX before we get access to them, we can use the \obeylines macro. This allows us to define a macro which matches everything until the end of line:
\def\doline#1{line found: `#1'\par}

{\obeylines
\gdef\getline#1
{\doline{#1}}}

{\obeylines
\getline This is the \textbf{first} line.
This is the second line.}
Note that \obeylines must be in effect both while we define the \getline macro and while scanning the text. The outer pairs of enclosing brackets are there to contain the effect of \obeylines. To make \getline visible outside these brackets we use \gdef to define the macro in the global namespace. The output of the above TeX code is:
line found: ‘This is the first line.’
This is the second line.
Step 2. We would like to have the \getline macro called for every line instead of just for the first one. This can be achieved by putting a \getline (without arguments) at the end of the \getline replacement text. The only complication is that we somehow need to stop this procedure once the end of the region of interest has been reached:
\def\marker{END}

{\obeylines
\gdef\getlines#1
{\def\text{#1}%
\ifx\text\marker \let\next\empty \else \doline{#1}\let\next\getlines \fi
\next}}

{\obeylines
\getlines This is the \textbf{first} line.
This is the second line.
END
}
The \ifx command is used to test whether the matched line is the same as \marker. Until we find the marker, we put a new call to \getlines at the end of the \getlines replacement text, thereby looping over all lines until the marker is found. For this to work, the END needs to occur on a line on its own; therefore we have to move the closing brackets down one line. The TeX code above leads to the following output.
line found: ‘This is the first line.’
line found: ‘This is the second line.’
Step 3. We are now ready to pack the commands from above into the definition of a LaTeX environment:
\def\marker{\end{dolines}}

{\obeylines
\gdef\getlines#1
{\def\text{#1}%
\ifx\text\marker \let\next\text \else \doline{#1}\let\next\getlines \fi
\next}}

\newenvironment{dolines}{\begingroup\obeylines\getlines}%
{\endgroup}

\begin{dolines}
This is the \textbf{first} line.
This is the second line.
\end{dolines}
Here we had to change the definition of \getlines in order to include the \end{dolines} in the replacement text when the \ifx is true; otherwise it would have been swallowed by \dolines. The resulting output is, a bit surprisingly, as follows:
line found: ‘’
line found: ‘This is the first line.’
line found: ‘This is the second line.’
The extra empty line at the beginning is caused by the \getlines macro matching the (empty) text between \begin{dolines} and the end of line. The \end{dolines} must be on a line on its own for this environment to work.
Step 4. If we want to prevent processing of the TeX commands inside the dolines environment, we can do so by changing the category codes of special characters like \ to 12 ("other"). A complication with this plan is that \ifx also compares category codes. Therefore, when we try to detect the end of our environment while special characters are switched off, we need to define \marker as the string \end{dolines}, but with the category codes of all special characters (the backslash and the curly brackets) set to 12. This can be done by using the following, contorted sequence of commands:
\begingroup
\catcode`|=0 \catcode`[=1 \catcode`]=2
\catcode`\{=12 \catcode`\}=12 \catcode`\\=12
|gdef|marker[\end{dolines}]
|endgroup
Two more changes are required to make this work: first, we need to disable all special characters at the beginning of the environment:
\let\do=\@makeother\dospecials
If you want to use this command outside a style file, you will need to turn @ into a letter by bracketing your definitions with \makeatletter ... \makeatother.
Secondly, inside \getlines, the expansion of \text still has the category codes as found in the input. Therefore, if the input was read with special characters switched off, we cannot write \let\next\text to get the closing \end{dolines} (since the category codes of the backslash and the curly brackets would be wrong). Instead, we need to use a definition like \def\next{\end{dolines}}.
Example. Using the techniques explained above, we can define a replacement for the LaTeX comment environment (contained in the verbatim package) as follows:
\documentclass{article}

\makeatletter
\begingroup
\catcode`|=0 \catcode`[=1 \catcode`]=2
\catcode`\{=12 \catcode`\}=12 \catcode`\\=12
|gdef|marker[\end{comment}]
|endgroup

{\obeylines
\gdef\getlines#1
{\def\text{#1}%
\ifx\text\marker \def\next{\end{comment}} \else \let\next\getlines \fi
\next}}

\newenvironment{comment}{\begingroup
\let\do=\@makeother\dospecials
\obeylines\getlines}%
{\endgroup}
\makeatother

\begin{document}

\begin{comment}
Everything here will be ignored.  \Invalid LaTeX code and
incomplete constructs like
\begin{itemize}
without the closing end are no problem.
\end{comment}

\end{document}
Funny coincidence: Yesterday I learned that the following paper which we first submitted in 2010 has been accepted:
And today I received an email notice that a paper we submitted in 2009(!) is finally going into print now:
We are currently advertising a new lectureship in the statistics department of the University of Leeds. Thus, if you are a PostDoc and want to become a colleague of mine, have a look at the job advert. The closing date for applications is 31st March 2011.
I am happy to announce the release of Jvqplot version 0.1.
Jvqplot is a data plotting program, resembling a simplified version of gnuplot. It is very simple to use, but it can only plot simple data files. The format and scaling of the plot is automatically chosen and cannot be changed. One of the main features of jvqplot is that the plot is automatically updated whenever the data in the input file changes. More information can be found on the Jvqplot homepage.
empty lines in data files separate data sets, clean up the program sources
first public release
This is the first public release of jvqplot. Any feedback about the program would be most welcome.
Since I don't watch TV programmes and also don't own a TV, I do not need a TV license. The TV licensing people seem to find this as difficult to believe as their German counterparts do, so they keep hassling me about this.
Their latest letter seems to allow me to declare online that I don't need a license. Unfortunately, the web form quickly forces me to choose one of the following possibilities:
The last option seems like a possible choice for me, except for the fact that my computer probably qualifies as "equipment". So which option am I meant to choose????
I just discovered the (1991) article What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg. Very useful reading!
I am happy to announce the release of Wisent version 0.6.1. Wisent is a parser generator for Python. The new release fixes a problem (introduced in version 0.6) where comments and multi-line strings in grammar files were broken. You can download the program from the wisent homepage.
I am slightly embarrassed by the fact that I've been caught out by the birthday paradox yesterday. The encounter went as follows:
While testing a new random number generator by analysing a list of 1000 generated standard normal distributed random numbers, I discovered that the list contained one of the numbers twice!!! This is suspicious, because this event has probability 0 in theory. After an (unsuccessful) hunt for bugs in my program, I finally found the following explanation.
The program prints the numbers using a C command like
printf("%f\n", normal(0, 1));
By default, the %f format string outputs numbers with six digits after the decimal point:

-0.641062
1.116142
1.417036
0.337435
-0.310383
...
Most of the numbers will lie between -2 and 2, i.e. they are concentrated in a set of about 4 million possible values. A quick check reveals that 1000 independent uniform draws from a set of this size contain a repeated number with a probability of more than 10%, and for the normal distribution the probability will be even higher because the numbers are more concentrated around 0. Thus, seeing a number twice is something which will actually happen from time to time and is no indication that the program is malfunctioning!
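For reference, the "quick check" can be reproduced with the standard birthday-problem approximation; the following few lines of Python are my addition, not part of the original post.

```python
import math

n = 1000      # number of generated values
N = 4000000   # roughly the number of distinct six-decimal values in [-2, 2]

# probability that at least two of the n values coincide
p = 1 - math.exp(-n * (n - 1) / (2 * N))
print(p)      # about 0.12, i.e. more than 10%
```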
Just now, my (not very fancy) phone showed the following corrupted message to me:
I'm a bit at a loss about what is going on here. If you have any idea, I would be happy to receive hints via email. Do I need to worry?
The program DSSP is used to determine the secondary structure of a protein, taking the three-dimensional coordinates of its atoms as the input. Bio3d is a library for the statistical software package R which makes it easier to analyse protein structure; as part of this, bio3d contains an interface to DSSP (in the function dssp). Since I had a bit of trouble using this interface, here are some hints.
Install the DSSP executable in some directory and call the dssp function with this directory as the exepath argument. The value of the exepath argument must end in / (on Linux/Unix/MacOS) or \ (on Microsoft Windows). Also note that every \ in an R string constant needs to be doubled, e.g. on Microsoft Windows you will probably need to write something like exepath="C:\\path\\to\\file\\".
Example. In R one can now use the following commands.
library("bio3d") pdb <- read.pdb("12as") x <- dssp(pdb, exepath="/usr/local/bin/")
Then the secondary structure information of the protein 12AS can be accessed as follows:
> x$helix
$start
  1   2   3   4   5   6   7   8   9   1   2
  5  76 130 170 182 258 277 297 320 271 310

$end
  1   2   3   4   5   6   7   8   9   1   2
 27  83 155 176 193 268 283 305 325 275 312

$length
  1   2   3   4   5   6   7   8   9   1   2
 23   8  26   7  12  11   7   9   6   5   3

$chain
 [1] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
This tells us that the first helix reaches from residue 5 to residue 27 (both inclusive).
I learned about the different kinds of grains long ago, when I was in primary school. Since then I have somehow managed to confuse myself about which grain is which. Here's to unconfusion! (All information and pictures are from Wikipedia; follow the links for details.)
name | comments |
---|---|
wheat / Weizen (Triticum aestivum, Triticum durum, etc.) | Used for bread, flour, couscous, beer, biofuel, … Needs good soil and climate, several species of wheat are used, main source of vegetable protein in human food. |
barley / Gerste (Hordeum vulgare) | Used for beer, whisky, animal feeding, … One of the first cultivated crops (together with einkorn and emmer). |
oat / Hafer (Avena sativa) | Used for animal feeding (e.g. horses), oat flakes, porridge, … Can be grown in regions with relatively cold, wet summers. |
rye / Roggen (Secale cereale) | Used for bread, animal feeding, … Can grow on relatively poor soil and in colder climates. |
In the last few days I have finished my first ever browser-based game: a puzzle, based on the classical fifteen puzzle. You can play the game here and download the source code from the program's homepage.
Today, Martin and I finally finished our paper "Approximations to the Stochastic Burgers Equation". A preprint can be downloaded below, comments are very welcome.
It is of course well known that water fluoridation is a Communist conspiracy to sap and impurify all of our precious bodily fluids. Still, I am surprised to learn that one of the political parties running in the UK's upcoming general elections uses their opposition to water fluoridation as a campaign topic.
Can you guess which party this is? (for the answer click here)
Only one day after we submitted the revised version of our paper Sampling Conditioned Hypoelliptic Diffusions (see my previous blog post), the article was accepted! It will appear in the Annals of Applied Probability.
In order to address the referee's comments, we have prepared an updated version of the following article:
I gave a talk at the Maths2010 conference in Edinburgh. Since I had only 15 minutes, the talk is nearly free of content, but in compensation there are many nice pictures. I'm particularly fond of the torus on page 8 (which was surprisingly difficult to generate).
This is a (some weeks old) photo of all twelve of me on Ilkley Moor.
Yesterday I learned how to speed up the vector operation y := y + αx by using SSE2 instructions.
The function I was trying to speed up was as follows:
static void
daxpy0(int n, double a, const double *x, double *y)
{
    int i;
    for (i=0; i<n; ++i) {
        y[i] += a * x[i];
    }
}
Using SSE2 instructions, operating on two double precision numbers at a time, this can be written as follows:
#include <emmintrin.h>

static void
daxpy1(int n, double a, const double *x, double *y)
{
    __m128d aa, xx, yy;
    int i;

    aa = _mm_set1_pd(a);          /* store a in both halves of the register */
    for (i=0; i+1<n; i+=2) {
        xx = _mm_load_pd(x+i);    /* load two elements from x ... */
        xx = _mm_mul_pd(aa, xx);  /* ... and multiply both by a */
        yy = _mm_load_pd(y+i);    /* load two elements from y ... */
        yy = _mm_add_pd(yy, xx);  /* ... and add */
        _mm_store_pd(y+i, yy);    /* write back both elements of y */
    }
    if (i < n) {
        y[i] += a * x[i];         /* handle last element if n is odd */
    }
}
This code requires that the vectors x and y are 16-byte aligned (otherwise a segmentation fault will occur). This assumption holds, for example, for memory blocks allocated by malloc on 64bit Linux and MacOS X systems. Also, obviously, this only works on CPUs which support the SSE2 instruction set. A description of the SSE2 instructions used here can be found in the Intrinsics Reference of the Intel C++ Compiler Documentation (this also seems to apply to the GNU C compiler; direct link here).
For comparison, I also tried to use the daxpy function provided by BLAS:
static void
daxpy2(int n, double a, const double *x, double *y)
{
    extern void daxpy_(int *Np, double *DAp, const double *X, int *INCXp,
                       double *Y, int *INCYp);
    int inc = 1;
    daxpy_(&n, &a, x, &inc, y, &inc);
}
Details about this approach are described on my page about linear algebra packages on Linux.
Results. I tried the three functions using two vectors of length 2000. The following table gives the execution time for a million calls to daxpy (on a newish quad-core Linux machine):

method | function | time [s] |
---|---|---|
direct | daxpy0 | 1.52 |
SSE2 | daxpy1 | 0.91 |
BLAS | daxpy2 | 2.47 |
As can be seen, the SSE2 code in daxpy1 is fastest; compared to the naive implementation daxpy0 it takes 40% off the execution time! For some reason, the BLAS version seems to be very slow, and the results on my MacOS machine are similar. Currently I have no explanation for this effect.
Recently I learned how to tunnel http traffic (e.g. web surfing) over an ssh connection. The effect of this is that you can browse the web on one computer A, say, but for the web servers you are visiting it will look like your requests originate from a different host B. You need to be able to log into host B via ssh for this to work.
There are several situations where such tunneling is useful:
Setting up a tunnel is done in two steps:
ssh -D 8080 -f -q -N login@host
You will need to replace login and host with your login details. The machine you type this command on is machine A in the description above, and host is machine B. This command will start an ssh process which will run in the background and will act as a tunnel to forward the web traffic.
Configure your web browser to use a SOCKS proxy with host "localhost" and port 8080. Both SOCKS4 and SOCKS5 should work. This will tell the web browser to connect to the local end of the ssh tunnel; a way to test the tunnel outside the browser is sketched below.
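As an illustration (my addition, not part of the original recipe), the tunnel can also be tested from Python, assuming the requests package is installed with SOCKS support (requests[socks]); httpbin.org is used here only as an example echo service.

```python
import requests

# route the request through the local end of the ssh tunnel
proxies = {
    "http": "socks5h://localhost:8080",
    "https": "socks5h://localhost:8080",
}

# httpbin.org echoes back the IP address it sees; with the tunnel active
# this should be the address of host B, not of the local machine
print(requests.get("https://httpbin.org/ip", proxies=proxies).text)
```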
Over the weekend I finished a helper which makes it easy to distribute a number of programs over the available CPUs on a multi-core system. You can find the program on the Parallel homepage.
I just found somebody's blog post stating 8 Reasons Normal People Should Juggle and I agree very much with his reasons.
Yesterday, I completed version 0.6 of my Python parser generator Wisent. The new version allows the use of UTF-8 encoded grammar files (originally, only ASCII characters could be used in grammar files), adds a CSS parser as an additional example, and fixes some minor bugs. You can download the program from the Wisent homepage.
Yesterday I finished a revised version of my article about exponential Tauberian theorems. The main change is that, as it transpired, my proof could be trivially modified to give a more general statement. Many thanks to the (anonymous) referee for pointing this out.
Today, one of our papers was accepted:
Just in case this is useful for anybody: here is an implementation of the HSV to RGB colorspace conversion in Python:
from __future__ import division

def hsv2rgb(h, s, v):
    hi = int(h//60 % 6)
    f = h/60 - h//60
    p = v * (1-s)
    q = v * (1-f*s)
    t = v * (1-(1-f)*s)
    return [ (v,t,p), (q,v,p), (p,v,t),
             (p,q,v), (t,p,v), (v,p,q) ][hi]
Update. In the meantime I learned that the Python standard library has a built-in version of this function in the colorsys module.
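As a quick check (my addition, not part of the original post), the function above can be compared against the standard library version; note that hsv2rgb() takes the hue in degrees while colorsys expects it scaled to [0, 1].

```python
import colorsys

# using the hsv2rgb() function defined above
print(hsv2rgb(200, 0.5, 0.8))                    # (0.4, 0.666..., 0.8)
print(colorsys.hsv_to_rgb(200.0/360, 0.5, 0.8))  # same values
```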
Today, I finally submitted the SPDE-paper I wrote with Martin Hairer and Andrew Stuart. This paper took a long time to complete (and it changed from an applied maths paper into a pure maths paper in the process). You can download a pre-print here; comments are very welcome:
A while ago, Andreas and I submitted another paper. Here is a pre-print:
A. Voss, J. Voss and K.C. Klauer:
Separating Response-Execution Bias from Decision Bias: Arguments for an Additional Parameter in Ratcliff's Diffusion Model.
British Journal of Mathematical and Statistical Psychology, vol. 63,
no. 3, pp. 539–555, 2010.
DOI:10.1348/000711009X477581, preprint:pdf, more…
I just found out that Amazon sells Uranium ore. Not sure whether I should get some …
Over the week-end I completed the work on release 0.5 of my Python parser generator Wisent. You can now download the program from the Wisent homepage. The main news is the addition of a users' manual.
Today I discovered by chance the quirksmode.org page. Somebody there has spent lots and lots of time to work out the differences between different web-browsers, in particular in CSS and JavaScript support. Very useful!!!
Some days ago I found this strange story about two men trying to smuggle bonds worth 134 billion US dollars across the Italian-Swiss border. Or maybe it was fake bonds? Everything about this case seems quite mysterious to me.
Media coverage is strangely spotty, but summaries are published by the online editions of the Times and the German newspaper Die Welt (which I don't normally read). Of course there are also lots of conspiracy theories.
Update: It transpires that the bonds were fakes.
A sentence from the BBC web page: "Although passengers survived a landing on the Hudson River in New York in January — it is rarely successful, especially in the middle of an ocean the size of the Atlantic."
Recently I submitted a new paper:
J. Voss:
Upper and Lower Bounds in Exponential Tauberian Theorems.
Tbilisi Mathematical Journal, vol. 2, pp. 41–50, 2009.
link, arXiv:0908.0642, more…
This is another leftover bit from my PhD thesis. Comments are most welcome!
I added a few photos from New York to my Photo album:
Since 1st April 2009 I'm a lecturer at the statistics department of the University of Leeds.
Recently we submitted a book chapter for the upcoming Oxford Handbook of Nonlinear Filtering:
M. Hairer, A.M. Stuart and J. Voss:
Signal Processing Problems on Function Space: Bayesian Formulation, Stochastic PDEs and Effective MCMC Methods.
Pages 833–873 in The Oxford Handbook of Nonlinear Filtering,
Dan Crisan and Boris Rozovsky (editors), Oxford University Press, 2011.
link, preprint:pdf, more…
I like to use the matplotlib Python library to create figures for use in my articles. Unfortunately, the matplotlib default settings seem to be optimised for screen-viewing, so creating pictures for inclusion in a scientific paper needs some fiddling. Here is an example script which sets up things for such a figure.
#! /usr/bin/env python

import matplotlib
matplotlib.use("PS")
from pylab import *

rc("font", family="serif")
rc("font", size=10)

width = 4.5
height = 1.4
rc("figure.subplot", left=(22/72.27)/width)
rc("figure.subplot", right=(width-10/72.27)/width)
rc("figure.subplot", bottom=(14/72.27)/height)
rc("figure.subplot", top=(height-7/72.27)/height)
figure(figsize=(width, height))

x = loadtxt("x.dat")
plot(x[:,0], x[:,1], "k-")

savefig("figure.eps")
The script performs the following steps:
The matplotlib.use statement tells matplotlib to use one of the non-interactive backends. Without this statement, the script would need a connection to an X-windows server to run.
The first two rc statements set the font family and size for the labels on the axes to match the font used in the article's text (matplotlib uses a 12pt sans-serif font by default).
rc("figure.subplot", ...)
settings and the
figsize
argument to the figure
call set the
image geometry. Since the text on journal pages typically is about 5in
wide, a total figure width of 4.5in is often a good value. The margins set
by the rc
statements have to be given as fractions of the
image width/height and must be big enough to contain the axis labels. A
bit of experimentation helps to find good values; here we use a 14pt
margin at the bottom (10pt font height plus some space), 22pt left margin
(depends on the number of digits in the y-axis labels), 7pt for the
top margin (space for half of the top-most y-axis label, and then
some more) and 10pt to the right (enough space for half the right-most
x-axis label).
"k"
in the plot command sets the
plot color to black. Matplotlib default is blue, but some journals charge
you for use of colour in your articles (and also it is nice if you can
print things on a b/w printer).
The savefig command creates an encapsulated PostScript file. This format is good for inclusion in TeX documents. The file can be converted to PDF using the epstopdf command as needed.
Today I found the British government's guidelines for computing the CO2 emission for different modes of transportation. I like the careful explanation of the authors' methodology.
The results are summarised in the table below. All data is for 2008, and I rounded and simplified the values to make them easier to digest. You can find the true results in the report. The values for cars (marked with an asterisk) are given in g/km, you need to divide them by the number of passengers in the car.
mode of transportation | CO2 emission [g / passenger km] | reference |
---|---|---|
national rail | 60 | table 21 (page 25) |
london underground | 65 | table 21 (page 25) |
trams etc. | 78 | table 21 (page 25) |
Eurostar | 18 | table 21 (page 25) |
bus | 107 | table 17 (page 21) |
coach | 29 | table 17 (page 21) |
petrol car | 207* | table 11 (page 18) |
diesel car | 198* | table 11 (page 18) |
motor cycle | 106* | table 18 (page 22) |
domestic flights | 175 | table 3 (page 9) |
short-haul flights | 98 | table 3 (page 9) |
long-haul flights | 111 | table 3 (page 9) |
Update (2010-04-13). DEFRA reorganised their web page, so the above link does no longer work. Their reports can now be found on the Greenhouse gas (GHG) conversion factor methodology papers page. I believe that the 2008 report there is identical to what I used for the table.
I released version 0.8 of my implementation of the voting algorithm used by the Debian project. The new version should be functionally equivalent to version 0.7, but the source code is greatly cleaned up. Many thanks to Neil Schemenauer for providing patches! You can download the program from my page about the Debian voting system.
It is Christmas very soon, now. Why don't you get me a Christmas present from my Amazon wishlist?
Some photos from the North Carolina state fair are now on my photo album:
Last Saturday I finally managed to visit Durham. Some photos are on my photo album:
My favourite is the last one: it shows a mechanical knight guarding a railway crossing.
The common log format of the Apache web server contains information about page views on a web page. Today I figured out how this format can be parsed in a Python program.
Typical log entries look like this (I added line breaks for better readability):
66.249.71.138 - - [20/Sep/2008:10:09:09 +0100] "GET / HTTP/1.1" 200 4235 "-"
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

67.15.143.7 - - [20/Sep/2008:13:35:06 +0100]
    "GET //errors.php?error=http://www.rooksgambit.org/thebigsystem//sistem.txt? HTTP/1.1"
    404 2272 "-" "libwww-perl/5.79"
The first step of parsing these lines is easy: the regular expression library allows us to break a line into the individual fields:
import re
parts = [
r'(?P<host>\S+)', # host %h
r'\S+', # indent %l (unused)
r'(?P<user>\S+)', # user %u
r'\[(?P<time>.+)\]', # time %t
r'"(?P<request>.+)"', # request "%r"
r'(?P<status>[0-9]+)', # status %>s
r'(?P<size>\S+)', # size %b (careful, can be '-')
r'"(?P<referer>.*)"', # referer "%{Referer}i"
r'"(?P<agent>.*)"', # user agent "%{User-agent}i"
]
pattern = re.compile(r'\s+'.join(parts)+r'\s*\Z')
line = ... a line from the log file ...
m = pattern.match(line)
res = m.groupdict()
This results in a Python dictionary, containing the individual fields as strings.
The second step is to convert the field values from strings to Python data types and to remove the oddities of the log file format. Most fields are easy:
if res["user"] == "-": res["user"] = None res["status"] = int(res["status"]) if res["size"] == "-": res["size"] = 0 else: res["size"] = int(res["size"]) if res["referer"] == "-": res["referer"] = None
Converting the time value into a Python datetime object is surprisingly messy, mainly because of the timezone information. The best way I found until now is the following:
import time, datetime

class Timezone(datetime.tzinfo):
    def __init__(self, name="+0000"):
        self.name = name
        seconds = int(name[:-2])*3600 + int(name[-2:])*60
        self.offset = datetime.timedelta(seconds=seconds)
    def utcoffset(self, dt):
        return self.offset
    def dst(self, dt):
        return datetime.timedelta(0)
    def tzname(self, dt):
        return self.name

tt = time.strptime(res["time"][:-6], "%d/%b/%Y:%H:%M:%S")
tt = list(tt[:6]) + [ 0, Timezone(res["time"][-5:]) ]
res["time"] = datetime.datetime(*tt)
If you know of any better way of doing this, please let me know. Hints are very welcome!
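One possible answer, added here for reference (not part of the original post): in Python 3.2 and later, strptime() understands the %z directive, so the whole conversion can be done in one step.

```python
import datetime

t = datetime.datetime.strptime("20/Sep/2008:10:09:09 +0100",
                               "%d/%b/%Y:%H:%M:%S %z")
print(t.isoformat())  # 2008-09-20T10:09:09+01:00
```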
After many delays, the paper containing the main result of my PhD thesis finally was published today:
This is more theoretical than the things I currently do, but I believe that it is a good paper.
Abstract. We derive a large deviation principle which describes the behaviour of a diffusion process with additive noise under the influence of a strong drift. Our main result is a large deviation theorem for the distribution of the end-point of a one-dimensional diffusion with drift θb where b is a drift function and θ a real number, when θ converges to ∞. It transpires that the problem is governed by a rate function which consists of two parts: one contribution comes from the Freidlin-Wentzell theorem whereas a second term reflects the cost for a Brownian motion to stay near an equilibrium point of the drift over long periods of time.
Mark Reid wrote a very nice review article about the current (very strong) evidence for the existence of a supermassive black hole at the centre of the Milky Way. Definitely worth a read! (Found via the KFC blog.)
Today I discovered the very exciting wordle tool at wordle.net. It can convert texts into word clouds like the following:
A nice MacOS X feature:
voss@soup [~] osascript -e 'tell app "ARDAgent" to do shell script "whoami"'
root
Today I discovered, by chance, the web page A Review of the Universe - Structures, Evolutions, Observations, and Theories. I have not yet read it all but the parts I looked at seem quite nice.
Stephen Dolan provides a very nice page which can find the shortest link between any two Wikipedia articles. The results are sometimes surprising: for example, can you guess how to get from Pleistocene to Microprocessor in three clicks?
This week-end we had an exciting trip to Blenheim Palace.
In a recent questionnaire, our IT services asked the following, interesting question: Please rank the following in your priority order (1 = Highest, 2 = Medium, 3 = Lowest)

A university with "working hours only" (presumably theirs, not mine) network access is an interesting thought. And I wonder whether "guaranteed access" differs from "access", or whether the first option is just a subset of the second option.
Our article is now in print.
Today I completed version 0.4 of the Wisent parser generator for Python. The program is now efficient enough to deal with grammars of the size of the C grammar.
I spent some effort on generating useful error messages for grammar conflicts. For example the usual "if-else conflict" in the C grammar is reported as follows:
cg.wi:164:61: shift-reduce conflict: the input
cg.wi:164:61:     IDENTIFIER { if ( IDENTIFIER ) ;.else ...
cg.wi:164:61: can be shifted using the production rule
cg.wi:164:61:     selection_statement: if ( expression ) statement.else statement;
cg.wi:164:25: or can be reduced to
cg.wi:164:25:     IDENTIFIER { selection_statement.else ...
cg.wi:164:25: using the production rule
cg.wi:164:25:     selection_statement: if ( expression ) statement;
cg.wi: 1 conflict, aborting ...
Now you can even tell Wisent to ignore this conflict by placing an ! before the offending else.
While trying to add the capability to create LALR(1) parsers to Wisent, I came across the following paper:
This article is nicely written and I hope that using his algorithm I can teach Wisent to create efficient LR(1) parsers, thus removing the need for LALR(1) support.
Today I published the first version of a parser generator for Python. Currently it only generates LR(1) parsers, i.e. it is not very useful for big grammars. I hope to add support for LALR(1) parsers in the future.
One problem is the choice of a project name: I came up with "Wisent" since this fits together well with Bison. But unfortunately I am not the first one to think so; there are at least two other parser generators which also use the name Wisent, and most likely I will rename my program. Suggestions for a better name, preferably with Monty Python connotations, are very welcome.
I put my go ranking list online. Currently it covers some games played in the common room of the Warwick mathematics department. You can find the list on the secret go portal.
There is a lamp powered by gravity. The energy is created by a weight slowly sliding down a rail, and you "recharge" the lamp by moving the weight up again. The motion is then converted to electricity and used to power the LEDs which create the light. I want one of these lamps!!!
Update. A quick calculation shows that this lamp cannot possibly work: the best possible efficiency for generating light is approximately 683 lumen per watt (real efficiencies are much lower), so to get the advertised "600-800 lumens … over a period of 4 hours" one would need more than 12650 joule. Since lifting a weight of 1 kg by 1 metre only provides 9.81 joule, the lamp, even assuming 100% efficiency, will never be able to generate enough electricity to power the LEDs.
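The arithmetic behind these numbers can be checked with a few lines of Python (my addition, not part of the original post):

```python
lumens = 600             # lower end of the advertised output
hours = 4
max_efficacy = 683.0     # lumen per watt, theoretical maximum

energy_needed = lumens / max_efficacy * hours * 3600   # in joule
energy_per_lift = 1 * 9.81 * 1                         # 1 kg lifted by 1 m

print(round(energy_needed))   # about 12650
print(energy_per_lift)        # 9.81
```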
Understanding Art for Geeks is a very nice picture collection on Flickr.
During the Christmas holidays I carved an ancient Egyptian hippo from soapstone. It turned out to be quite a happy animal. The photo shows the almost finished statuette; only some polishing is left to do. My photo page has a picture of a very early stage.
We prepared another fast-dm release over the Christmas holidays: release 29 fixes a bug where, for fixed z, large values of sz were not correctly recognised.
The previous fast-dm version (release 27) introduced an unfortunate bug. We released a fixed version today. As a nice side effect the program is now a bit faster for some parameter constellations (big st0 and high precision). If you are using an older version of fast-dm, you should upgrade to the new release 28.
According to an article in the Times it seems that the United States' government believes it to be legal to kidnap anybody who is suspected of a crime by them. I find this scary!
Andreas and I finished release 27 of our program fast-dm for parameter estimation in Ratcliff's diffusion model (in Psychology). You can download the new release from my fast-dm web page.
I had missed this until now: In Leeds a man created a terrorist threat by falling into coma on a bus. Most exciting!
If M's a complete metric space
(and non-empty), it's always the case:
If f's a contraction
Then, under its action,
Exactly one point stays in place.
Today I completed another article, which will eventually appear in the conference proceedings for Heinrich v. Weizsäcker's birthday conference:
This is a (hopefully) very nice and readable summary of our project to use SPDEs to construct MCMC methods in infinite dimensional spaces.
Yesterday I learned of the existence of flash cookies (aka local shared objects) which somehow had evaded my attention until now. Flash cookies seem to be very similar to the usual HTTP cookies, except that they can store more information (up to 100kb by default, instead of 4kb for HTTP cookies).
Flash cookies are independent of the browser's normal cookie settings and are not blocked by tools like CookieSafe. It seems that the only way to view flash cookies is by visiting Adobe's Settings Manager web page.
Today I spent some time looking through my spam folder. Since the beginning of this year I have received 47587 spam messages but only 38281 "ham" (i.e. good) messages. This means that 55% of my mails are now spam.
Not surprisingly, the spam messages came from a much wider set of hosts than the ham messages: 43673 hosts sent only spam, 3034 hosts sent only ham, and 145 hosts sent both types of messages. This effect can also be seen in my old map of the internet.
The worst offender is the host with the IP address 80.253.80.28. It sent 49 spam emails to me this year, the first one on 2007-02-23 ("Bitte Geld abholen !!", i.e. "please collect the money!!"), the most recent one on 2007-11-02 ("Bitte zurück Rufen", i.e. "please call back"). It is sad that this spammer manages to use the same address for 10 months in a row for an, in my eyes, criminal activity without being stopped.
I started reading Ulrich Drepper's excellent series "What every Programmer should know about memory". It is published by Linux Weekly News in weekly installments. Until now, the following parts have appeared:
The author shows, for example, how the naive matrix-matrix multiplication
for (i = 0; i < N; ++i)
    for (j = 0; j < N; ++j)
        for (k = 0; k < N; ++k)
            res[i][j] += A[i][k] * B[k][j];
can be improved to run 10 times faster for big matrices!
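The series itself works in C, but the underlying cache effect is easy to reproduce elsewhere. The following small Python/NumPy sketch (my addition, not from the article) shows the cost of traversing a matrix against its memory layout; the exact timings depend on the machine.

```python
import time
import numpy as np

A = np.random.rand(5000, 5000)   # a C-ordered (row-major) matrix

start = time.time()
row_total = sum(A[i, :].sum() for i in range(A.shape[0]))   # along memory layout
t_rows = time.time() - start

start = time.time()
col_total = sum(A[:, j].sum() for j in range(A.shape[1]))   # against memory layout
t_cols = time.time() - start

print(t_rows, t_cols)   # column-wise traversal is typically several times slower
```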
Recently I learned how to compute the distribution of the number N of visits in zero of a one-dimensional, symmetric, simple random walk (X_n), conditioned on X_{2n} = 0. There are two ways to compute this.
Plan A. Using the nice equality P(X_1 ≠ 0, X_2 ≠ 0, …, X_{2n} ≠ 0) = P(X_{2n} = 0), which I learned long ago as a student, we can see that there is a bijection between all bridge paths and all paths which don't hit zero. (The equality is proved by constructing a bijection between the complements of these sets, using the reflection principle.) We can apply this to bridge paths with at least k zeros, mapping the path segment starting at the kth zero to a path which does not hit zero. This results in a bijection between all bridge paths having at least k zeros and all paths having exactly k zeros. If you happen to know this number, you are done. Otherwise you can use
Plan B. Consider the following map: for each bridge path with at least k zeros, flip the last k excursions so that they are all positive and then remove the last step of each of these excursions. This procedure defines a map from paths of length 2n with at least k zeros which end at zero to paths of length 2n−k (we removed k steps) which end at level k (all removed steps were downwards). It transpires that this map is surjective and every image has exactly 2^k pre-images. Thus the number of bridge paths with at least k zeros is 2^k · C(2n−k, n), and dividing by the total number of bridge paths, C(2n, n), gives the required probability: P(N ≥ k | X_{2n} = 0) = 2^k · C(2n−k, n) / C(2n, n).
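A small brute-force check in Python (my addition, not part of the original post) confirms the counting formula for small n:

```python
from itertools import product
from math import comb

def bridges_with_at_least_k_zeros(n, k):
    """Count walks of length 2n which end at 0 and visit 0 at least k times."""
    count = 0
    for steps in product((1, -1), repeat=2 * n):
        positions = []
        pos = 0
        for s in steps:
            pos += s
            positions.append(pos)
        if positions[-1] == 0 and positions.count(0) >= k:
            count += 1
    return count

for n in range(1, 6):
    for k in range(1, n + 1):
        assert bridges_with_at_least_k_zeros(n, k) == 2**k * comb(2*n - k, n)
print("formula confirmed for n <= 5")
```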
Another paper I wrote together with my brother Andreas finally has been accepted:
This paper describes the mathematical foundations which underly our fast-dm program.
Copyright © 2007–2024 Jochen Voss. All content on this website (including text, pictures, and any other original works), unless otherwise noted, is licensed under the CC BY-SA 4.0 license.