Playing with MultiGraph

I’ve been playing around with a cool JavaScript library called MultiGraph which lets you interact with graphical data embedded in a blog post. The data format is a simple little XML file called a “MUGL”. Here’s a sample that took all of about 10 minutes to create:

Note that you can pan and zoom in on the data.   For those readers who are interested, this data is the Oxygen-Oxygen pair distribution function, \(g_{OO}(r)\), for liquid water that was inferred from X-ray scattering data from  G. Hura, J. M. Sorenson,  R. M. Glaeser, and  T. Head-Gordon, J. Chem. Phys. 113(20), pp. 9140-9148 (2000).

Inserting this into the blog post involved uploading two files: the JavaScript library itself and the MUGL file. After those were in place, only two lines needed to be added to the blog post:

<script type="text/javascript" src="http://www.openscience.org/blog/wp-content/uploads/2013/05/multigraph-min.js"></script>


<div class="multigraph" data-height="300" data-src="http://www.openscience.org/blog/wp-content/uploads/2013/05/gofrmugl.xml" data-width="500"></div>

One thing that would be nice would be a way to automate the process of going from an xmgrace file directly to the MUGL format.

SimThyr – simulation software for pituitary thyroid feedback

This is a bit outside our normal area of expertise, but it looks interesting.

Thyroid hormones play an important role in metabolism, growth and differentiation. Therefore, exact regulation of thyroid hormone levels is vital for most organisms. The mechanism for the feedback control is known, but the dynamics are still a bit of a mystery. There’s an interesting page on the different models for thyrotropic feedback control at the Medizinische Kybernetik (Medical Cybernetics) site. SimThyr is an open source Pascal-based simulation program for the pituitary thyroid feedback control mechanism that explores these models and makes predictions for dynamics based on parameters of the feedback mechanism.

Not a kickstarter for science, a prize clearinghouse

Yesterday’s post on the reversible random number generators received some interesting reactions from my colleagues. They were uniformly impressed with the solution to what everyone thought was a hard problem, but surprisingly, most of the scientists I talked to were most excited about the fact that dangling a $500 reward for solving a hard problem generated nearly instantaneous results. Typical comments:

I wonder if I similarly spent my startup how much science I could get done…

Also, it is amazing what $500 buys these days!

Think how many problems we could solve if we dangled a few prizes for other knotty problems.

So what made this work?

  • The problem itself was well-framed and finite: “We need a time-reversible random number generator.” It was something that a lot of people in the field could agree was interesting when framed to them properly.
  • The group offering the prize was widely-respected for previous work on related problems.
  • The prize and the solution were both posted on a highly visible physics site (arXiv).
  • The reward was about fame and recognition by the community more than it was about money.

I’m now wondering if all of  the attempts to get a kickstarter or crowdsourced funding model for science (e.g. sciflies, petridish, scifundchallenge, fundageek) are just a bit misguided.  Science is darned expensive, and for better or worse, we’re going to be wedded to federal and foundation funding for science for a long time.  All funding models have an aspect of salesmanship to them – a scientist must convince the funder that the problem itself is interesting enough to need solving, and that their lab is the one to solve it.   In the NSF-style funding model, scientific communities do have significant input into what the “good problems” are, but the necessary delays in funding and the scarcity of funds means that we’re not very agile.

Perhaps we need a clearinghouse where scientific communities can agree on a tough challenge, pool some minimal award money (like $500 or $1000) and let their young colleagues have a go at winning fame by solving them.

Reversible Random Number Generators

This news comes by way of John Parkhill, my new colleague here at Notre Dame.

William G. Hoover (of the Nosé-Hoover Thermostat) and Carol G. Hoover issued a $500 challenge on arXiv to build a time-reversible random number generator. The challenge itself would be quite remarkable news. What’s even better is that the challenge (including the source code for an implementation) was solved in 6 days by Federico Ricci-Tersenghi.

Why is this a big deal? Most of the equations in physics that govern time evolution of particles obey time-reversal symmetry; the same differential equations that govern molecular or planetary motion will take you back to your starting point if you suddenly reverse the time variable. This is usually a fantastic way to check whether you are doing the physics correctly in your simulations, and it also means that collections of starting points that are related to each other behave in certain predictable ways when they evolve.
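Here is a minimal sketch of that reversibility check, assuming a single particle on a harmonic spring and the velocity Verlet integrator. The spring constant, time step, and function names are made up for illustration; they aren't taken from any particular simulation code.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance position x and velocity v for n_steps of size dt."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half kick
        x = x + dt * v_half                # drift
        f = force(x)
        v = v_half + 0.5 * dt * f / mass   # second half kick
    return x, v


def harmonic_force(x, k=1.0):
    """Restoring force of an ideal spring (illustrative choice of system)."""
    return -k * x


x0, v0 = 1.0, 0.0

# Run forward in time.
x1, v1 = velocity_verlet(x0, v0, harmonic_force, mass=1.0, dt=0.01, n_steps=1000)

# Flip the velocity (equivalent to reversing time) and run the same number of steps.
x2, v2 = velocity_verlet(x1, -v1, harmonic_force, mass=1.0, dt=0.01, n_steps=1000)

# If the physics is coded correctly we should be back at the start,
# up to floating-point round-off, with the velocity reversed.
position_error = abs(x2 - x0)
velocity_error = abs(-v2 - v0)
print(position_error, velocity_error)
```

If either error comes out much larger than round-off, something in the force or the integrator is probably breaking the symmetry.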

Stochastic approaches to physical motion introduce an aspect of randomness to mimic the behavior of complex phenomena like the motion of solvent surrounding the molecule we’re interested in, or to mimic the transitions between different electronic states of a molecule.   The introduction of random numbers has meant we had to give up time-reversibility, and we’ve been willing to live with that for a long time because we can study more complicated phenomena.

If we have access to a time-reversible pseudo-random number generator, however, we get that very powerful tool back in our toolbox.
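To make the idea concrete, here is a minimal sketch of a pseudo-random generator that can be stepped backward as well as forward. This is not the solution from the arXiv paper. It is just a linear congruential generator, which can be inverted whenever the multiplier has a modular inverse. The constants are the ones Knuth chose for MMIX, the function names are my own, and a generator this simple is for illustration rather than production use.

```python
M = 2 ** 64
A = 6364136223846793005      # odd, so it has an inverse modulo 2**64
C = 1442695040888963407
A_INV = pow(A, -1, M)        # modular inverse (needs Python 3.8 or newer)


def step_forward(state):
    """One step of the generator: x -> (A*x + C) mod M."""
    return (A * state + C) % M


def step_backward(state):
    """Undo one step: x -> A_INV * (x - C) mod M."""
    return (A_INV * (state - C)) % M


def to_unit_interval(state):
    """Map the integer state to a float in [0, 1)."""
    return state / M


seed = 20130501
state = seed
forward_values = []
for _ in range(1000):
    state = step_forward(state)
    forward_values.append(to_unit_interval(state))

# Now walk the same sequence in reverse, collecting the values we pass.
backward_values = []
for _ in range(1000):
    backward_values.append(to_unit_interval(state))
    state = step_backward(state)

print(state == seed)
print(backward_values == forward_values[::-1])
```

The point for a stochastic simulation is that the "random" kicks applied on the way forward can be regenerated, in the opposite order, on the way back.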

Now, the Langevin equation,

\(m \frac{d^2 x}{dt^2} = F - \gamma(t) \frac{dx}{dt} + R(t)\)

 

has two things that prevent it from being time-reversible. Besides the stochastic or random force, \(R(t)\), there’s also a drag or friction force, \(-\gamma(t) \frac{dx}{dt}\), that depends on the velocities of the particles. There’s no solution yet to time reversibility for this piece (and I have my doubts that there ever will be a way to reverse this). I suppose if we offer up another $500 prize for time-reversible drag, we’d gain some traction on this problem…
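A quick numerical sketch shows why the drag term is the stubborn one. It keeps only the deterministic part of the equation (no random force), uses a spring for \(F\), and holds \(\gamma\) constant. All of the parameters and the simple semi-implicit Euler integrator are assumptions chosen for illustration.

```python
def integrate(x, v, k, gamma, mass, dt, n_steps):
    """Semi-implicit Euler for m x'' = -k x - gamma x'."""
    for _ in range(n_steps):
        a = (-k * x - gamma * v) / mass
        v = v + dt * a
        x = x + dt * v
    return x, v


def there_and_back(gamma, x0=1.0, v0=0.0, k=1.0, mass=1.0, dt=0.001, n_steps=5000):
    """Run forward, flip the velocity, run again; return the final position."""
    x1, v1 = integrate(x0, v0, k, gamma, mass, dt, n_steps)
    x2, _ = integrate(x1, -v1, k, gamma, mass, dt, n_steps)
    return x2


x_start = 1.0
x_frictionless = there_and_back(gamma=0.0)
x_with_drag = there_and_back(gamma=0.5)

# Without drag we land close to the starting point (this integrator is only
# approximately reversible, so expect a small error of order dt).
print(abs(x_frictionless - x_start))

# With drag, flipping the velocity does not undo the energy loss:
# the particle keeps losing energy and never climbs back to x_start.
print(abs(x_with_drag - x_start))
```

Reversing the velocity also reverses the sign of the drag force, so the friction keeps draining energy on the "return trip" instead of giving it back.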

(The comic above courtesy of xkcd).

Relax – Molecular dynamics by NMR data analysis

Edward d’Auvergne pointed out the relax program, which looks like a useful way to connect experimental NMR spectra with molecular dynamics simulations.

relax is designed for the study of molecular dynamics of organic molecules, proteins, RNA, DNA, sugars, and other biomolecules through the analysis of experimental NMR data. It supports exponential curve fitting for the calculation of the R1 and R2 relaxation rates, calculation of the NOE, reduced spectral density mapping, the Lipari and Szabo model-free analysis, study of domain motions via the N-state model or ensemble analysis and frame order dynamics theories using anisotropic NMR parameters such as RDCs and PCSs, and the investigation of stereochemistry.

The Tyranny of Pi day

March 14th is \(\pi\)-day in the US (and perhaps \(4.\overline{6}\) day in Europe). The idea of a day devoted to celebrating an important irrational number is wonderful. I’d love to see schools celebrate e-day as well, but February 71st isn’t on the calendar. Unfortunately, March 14th has also become the day on which 4th and 5th graders around the US practice for one of the most pointless exercises imaginable: a competition to recite the largest number of digits of \(\pi\).

Memorization of long digit strings is not an exercise that teaches a love of mathematics (or anything else useful about the natural world). This is solely an exercise in recall, which is perhaps valuable for remembering phone numbers, but not for understanding transcendental constants. For all practical purposes, only the first few digits of \(\pi\) are really necessary – the first 40 digits of \(\pi\) are enough to compute the circumference of the Milky Way galaxy with an error less than the size of an atomic nucleus.
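The arithmetic behind that claim is short enough to sketch. The numbers below are round order-of-magnitude assumptions on my part (a galactic diameter of about 100,000 light years and a nucleus about a femtometre across), not precise measurements.

```python
# Assumed round numbers, good to an order of magnitude only.
MILKY_WAY_DIAMETER_M = 1.0e21   # roughly 100,000 light years in metres
NUCLEUS_SIZE_M = 1.0e-15        # roughly one femtometre

digits = 40

# Keeping 40 significant digits of pi (3.14159...) leaves an error
# smaller than one unit in the 39th decimal place.
pi_error = 10.0 ** -(digits - 1)

# Circumference = pi * diameter, so the error scales with the diameter.
circumference_error_m = MILKY_WAY_DIAMETER_M * pi_error

print(circumference_error_m)                    # about 1e-18 m
print(circumference_error_m < NUCLEUS_SIZE_M)
```

Under these assumptions the error is about a thousand times smaller than a nucleus, so the 41st digit and beyond are purely for show.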

So, because \(\pi\) is such an accessible entry to mathematics and science, I thought I’d come up with a list of other cool \(\pi\) things that could replace these pointless memory contests:

  • The earliest written approximations of \(\pi\) are found in Egypt and Babylon, and both are within 1 percent of the true value. In Babylon, a clay tablet dated 1900–1600 BC has a geometrical statement that, by implication, treats \(\pi\) as 25/8 = 3.1250. In Egypt, the Rhind Papyrus, dated around 1650 BC, but copied from a document dated to 1850 BC has a formula for the area of a circle that treats \(\pi = \left(\frac{16}{9}\right)^2 \approx 3.1605\).
  • In 220 BC, Archimedes proved that \( \frac{223}{71} < \pi < \frac{22}{7}\). The mid-point of these fractions is about 3.1419.
  • Around 500 AD, the Chinese mathematician Zu Chongzhi  was using a rational approximation for \(\pi \approx 355/113 = 3.14159292\), which is astonishingly accurate.  For most day-to-day uses of \(\pi\) this particular approximation is still sufficient.
  • By 800 AD, the great Persian mathematician, Al-Khwarizmi, was estimating \(\pi \approx 3.1416\).
  • A good mnemonic for the decimal expansion of \(\pi\) is given by the letter count in the words of the sentences: “How I want a drink, alcoholic of course, after the heavy lectures involving quantum mechanics. All of thy geometry, Herr Planck, is fairly hard…”
  • Georges-Louis Leclerc, Comte de Buffon, came up with one of the first “Monte Carlo” methods for computing the value of \(\pi\) in 1777. This method involves dropping a short needle of length \(\ell\) onto lined paper where the lines are spaced a distance \(d\) apart. The probability that the needle crosses one of the lines is given by: \(P = \frac{2 \ell}{\pi d}\).
  • In 1901, the Italian mathematician Mario Lazzarini attempted to compute \(\pi\) using Buffon’s Needle.  Lazzarini spun around and dropped a 2.5 cm needle 3,408 times on a grid of lines spaced 3 cm apart. He got 1,808 crossings and estimated \(\pi = 3.14159292\). This is a remarkably accurate result!   There is now a fair bit of skepticism about Lazzarini’s result, because his estimate reduces to Zu Chongzhi’s rational approximation.  This controversy is covered in great detail in Mathematics Magazine 67, 83 (1994).
  • Another way to estimate \(\pi\) would be to use continued fractions. Although \(\pi\) has a simple continued fraction, its terms show no obvious pattern. There’s a beautiful (but non-simple) continued fraction for \(\frac{4}{\pi}\):
    \(\frac{4}{\pi} = 1 + \frac{1^2}{2 + \frac{3^2}{2 + \frac{5^2}{2 + \frac{7^2}{2 + \cdots}}}}\)

    Can you spot the pattern?

  • Vi Hart, the wonderful mathemusician, has a persuasive argument that we should instead be celebrating \(\tau\) day on June 28th. Actually, all of her videos are wonderful. If my kids spent all day doing nothing but playing with snakes it would be better than memorizing digits of \(\pi\).
  • Another wonderful way to compute \(\pi\) is to use nested round and square baking dishes (with the round dish just fitting inside the square one) and drop marbles into them randomly from a distance. Simply count up the number of marbles that land in the circular dish and keep track of the total number of marbles that landed in either the circle or the square. Since the circle covers \(\pi/4\) of the square’s area, the value of \(\pi \approx 4 \frac{N_{circle}}{N_{total}}\).
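Several of the items in this list are easy to try on a computer, so here is a rough sketch of three of them: Lazzarini’s arithmetic, the continued fraction for \(\frac{4}{\pi}\), and the baking-dish experiment with simulated marbles. The function names, the truncation depth, the number of marbles, and the random seed are all my own choices, and the "dishes" are assumed to be a unit circle sitting exactly inside a 2 × 2 square.

```python
from fractions import Fraction
import math
import random


def lazzarini_pi(needle_cm, spacing_cm, drops, crossings):
    """Invert P = 2*l / (pi*d), with P estimated as crossings / drops."""
    return 2 * Fraction(needle_cm) * drops / (Fraction(spacing_cm) * crossings)


def brouncker_pi(depth):
    """Estimate pi from the continued fraction for 4/pi, cut off after
    `depth` levels (the cut-off simply leaves a bare 2 at the bottom)."""
    tail = 2.0
    for j in range(depth, 0, -1):
        odd = 2 * j + 1                  # 3, 5, 7, ... working from the inside out
        tail = 2.0 + odd * odd / tail
    four_over_pi = 1.0 + 1.0 / tail      # the outermost numerator is 1**2
    return 4.0 / four_over_pi


def marble_pi(n_marbles, seed=0):
    """Drop marbles uniformly on a 2 x 2 square and count those in the unit circle."""
    rng = random.Random(seed)
    in_circle = 0
    for _ in range(n_marbles):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            in_circle += 1
    return 4.0 * in_circle / n_marbles


# Lazzarini's reported numbers: 2.5 cm needle, 3 cm spacing, 3408 drops, 1808 crossings.
lazzarini = lazzarini_pi("2.5", 3, 3408, 1808)
print(lazzarini)                          # 355/113

# The continued fraction converges, but slowly.
print(brouncker_pi(1000) - math.pi)

# The marble estimate is noisy; more marbles shrink the scatter.
print(marble_pi(100_000) - math.pi)
```

The exact-fraction result makes the source of the skepticism plain: Lazzarini’s counts reproduce Zu Chongzhi’s 355/113 exactly, which is a lot to ask of a random experiment.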

There are probably 7000 better things to do with \(\pi\) day than digit memory contests. There are lots of creative teachers out there — how are all of you going to celebrate \(\pi\)-day?

Do.abl.es

Do you want to know how you can measure DNA contour lengths using ImageJ? Perhaps you want to stain a C. elegans embryo for imaging? Or possibly, you might want to test whether or not you have gotten an immune response using ELISA?

Martin Fitzpatrick sends word of a cool collection of open access scientific protocols called Do.abl.es.  For the uninitiated, protocols are the recipes that scientists use to carry out experiments in a reproducible way.  The list of protocols posted to Do.abl.es to date has a number of interesting and important biochemistry and biology experiments.

There’s also a neat companion site called Install.abl.es which concentrates on many of the same things we do – the use of open source software in the sciences.

The Up-Goer Five Research Challenge

I thought this was silly at first, but after struggling to do it for my own research, I now think it can be a profound exercise that scientists should attempt before writing their NSF broader impact statements. Here’s the challenge: Explain your research using only the 1000 most common English words. Here’s a tool to keep you honest: http://splasho.nfshost.com/upgoer5/  The idea was inspired by Randall Munroe’s wonderful Up Goer Five explanation of the Saturn V moon rocket.

And here’s my attempt:

The things we use every day are made of very tiny bits. When we put lots of those bits together we get matter. Matter changes how it acts when it gets hot or cold, or when you press on it. We want to know what happens when you get some of the matter hot. Do the bits of hot matter move to where the cold matter is? Does the hot matter touch the cold matter and make the cold matter hot? We use a computer to make pretend bits of matter. We use the computer to study how the hot matter makes cold matter hot.

The task is much harder than you think.   Here’s a collection curated by Patrick Donohue (a PhD candidate in lunar petrology right here at Notre Dame):  Common words, uncommon jobs

Overture – A C++ toolkit for Solving PDEs in Complex Geometries


This looks useful! The partial differential equations (PDEs) we solve in my lab are the equations of motion for atoms in molecular dynamics. These are relatively easy to integrate numerically. Lots of labs work with harder PDE problems (like the response of metallic nanostructures to electromagnetic fields) that have difficult boundary conditions in complex geometries.

Overture is an object-oriented code framework for solving partial differential equations (PDEs). It provides a portable, flexible software development environment for applications that involve the simulation of physical processes in complex moving geometry. It is implemented as a collection of C++ libraries that enable the use of finite difference and finite volume methods at a level that hides the details of the associated data structures. Overture is designed for solving problems on a structured grid or a collection of structured grids. In particular, it can use curvilinear grids, adaptive mesh refinement, and the composite overlapping grid method to represent problems involving complex domains with moving components. There are also utilities for building grids on CAD geometries and for building hybrid grids that can be used with applications that use unstructured grids.

SASSIE – Create atomistic models from Small Angle scattering data

Here’s a neat bit of “bridge” or “glue” software for today – SASSIE is a Python-based suite for creating atomistic models of molecular systems in order to compare those models directly to data from small-angle X-ray scattering (SAXS) and small-angle neutron scattering (SANS) experiments. SASSIE is the work of Joseph Curtis and Susan Krueger from the NIST Center for Neutron Research. You can use SASSIE to generate and manipulate large numbers of structures and to calculate the SANS, SAXS, and neutron reflectivity profiles from atomistic structures that result from molecular dynamics (MD) or Monte Carlo (MC) simulations.

“Bridge” or “glue” software increases the functionality of other software by making the data formats from one package usable as input to another.  In SASSIE’s case, the molecular dynamics package being used is NAMD (from the Theoretical and Computational Biophysics Group at UIUC), along with scattering calculators Cryson and Crysol.  The Hydropro package is used for calculating hydrodynamic properties.  (Note that Cryson, Crysol, and Hydropro are not open source programs.  Boo.)

SASSIE isn’t quite a full-bore open source project yet.  I was able to download the source here (159 MB download), but there’s a registration barrier in the way on the SASSIE trac site.  I’m not sure why the small-angle scattering community locks up their code like this.  It discourages re-use, and doesn’t provide any extra benefit to the authors of the code or the home institution.  Likewise, Cryson and Crysol both appear to have an academic and research license (again, not open source).

SASSIE looks interesting.  I’m not in the small-angle scattering community, but I really like the bridge between atomistic simulation and experiment, and the code looks like it has some very useful pieces that could be reused in interesting ways.

Jmol goes JavaScript

About 10 years ago, I turned the Jmol project over to a series of fantastic lead developers (Jmol programmers regenerate in different bodies just like Doctor Who does). Since then, the aspect of the new work on Jmol that has most delighted me is the Jmol applet, which allows the program to be embedded in web pages. The Jmol applet transformed Jmol from a niche application into a way of broadly disseminating and interacting with chemical structures. It has become the de facto standard for showing protein structures at the RCSB Protein Data Bank, as well as in a number of chemistry journals.

The problem with Java applets is that many new tablet and phone web browsers don’t support them (and it looks like Java applets are being slowly disabled on Mac browsers as well).  Java has always been an elegant but relatively heavy-weight solution to dynamic content, but I think its time as a widely-used language for web content is coming to a close.

So, how do you interact with chemical structures without Java or Flash?

The Jmol team has been hard at work converting the Jmol source so that the same source code that produces the Jmol applet can also be run through Java2Script to create the entire Jmol applet in JavaScript. The new beast is called JSmol and has almost all of the functionality of Jmol itself (file IO, scripting, etc.). Bob Hanson’s demo pages give a taste of this work.

This is amazing work.  The same Java code is compiled to create the Jmol applet, a WebGL version of JSmol, and the HTML5 version of JSmol that can run on my phone. Almost all of Jmol is there – the ability to display orbitals, crystals, and van der Waals surfaces.  Some menu interaction is still missing, but if the goal is to display and interact with a chemically meaningful structure on a web page,  JSmol looks like a great solution.

The credit for this work largely goes to a number of people:  The GLmol interface was written by Takanori Nakane.  Java2Script was written by Zhou Renjian.  The Jmol code conversion to JavaScript was done by the current Doctor Who,  Bob Hanson.

Octopus – A cool open source TDDFT code

I just found out about Octopus, a quantum mechanics package that does time-dependent density functional theory (TDDFT) calculations using pseudopotential approximations.

It works in parallel using MPI and OpenMP and scales to tens of thousands of processors. It also has support for graphics processing units (GPUs) through OpenCL.

The Octopus code can be browsed freely, and it has been released under the GPL.

Particularly cool is the ability to use the time dependent electron localization function (TDELF) to look at orbitals dynamically during a chemical reaction.

Computational Chemistry Highlights

Computational Chemistry Highlights (CCH) is an interesting new overlay journal that identifies important contributions to the field of computational and theoretical chemistry published within the last 1-2 years.  I’m involved in this particular overlay journal – I’ll be concentrating on recent developments and papers in molecular dynamics and statistical mechanics.  The journal will eventually get to an editorial board of around 50, so it will help us all keep up on advances in the field that are outside our specific areas of expertise.

Overlay journals are a great concept – CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists.  In addition to highlighting recently-published papers, I’m pretty sure it will also include highlights of non-journal resources like code, publicly available datasets, and papers available on preprint servers (e.g. arXiv, Nature Precedings).  It also allows non-anonymous comments on papers and will let authors respond to those comments.

Overlay journals are an interesting experiment. We’ll have to see how important they become, but I’m pretty happy to be included early on this one.

Advice to junior faculty who want to get promoted doing Open Science

I recently sent some advice to a colleague who is coming up for tenure at another university.  He’s quite well known in the Open Science community and is trying to figure out how best to make the case to his tenure committee that the open science contributions he has made in addition to his traditional journal publications are important.  We’re talking some major contributions here — lab protocols on OpenWetWare, open lecture materials on slideshare, data files released with CC0, videos of lab protocols on Benchfly, and he’s a regular contributor to science discussions on FriendFeed.

The advice I gave him was basically to make the committee’s job of measuring these contributions easier.  Here’s the advice (in a slightly edited form):

The audience for most tenure documents (and particularly the external letters) is a committee of non-specialists that advises the provost or other high-ranking administrator.  These committees are often somewhat skeptical of departments and candidates and are looking for external validation of what they are reading in the tenure dossier and the packet prepared by the departments.  They are swayed by real experts in the field (named chairs at other institutions, national academy members, people at top 10 institutions) and by things they can measure (publications, h-indices, grant money, citation counts). If you want to add a non-traditional contribution to a tenure dossier, you should also include a way of measuring the importance of that contribution.

First, if the rules of your institution allow it, make sure there is a strong defense of open ways of doing science in your dossier (1-2 paragraphs or so).  Make the case that it is important to consider non-standard contributions even though previous tenure committees did not.

Use as many metrics to back up your contributions as you can. Make a case that each of your software releases counts as much as a full publication, and use download statistics as if they were directly comparable to academic citations. List external users of your software as if they were research collaborators, because they are! If you can collect them, include download statistics on open contributions to sites like OpenWetWare and Wikipedia.

If your institution’s rules allow it, make sections directly under your publications for “Published Datasets”, “Contributed Software”, “Published Protocols & Notebooks”, “Scientific Videos”. In each section, list authors, a title, description, and URL of the resource you have contributed along with a count of downloads or views, and a list of other groups using your data. Make this look as much like your publication section as possible, as you can then make the argument that these things should be treated with a similar weight to traditional academic publication. Provide the metrics in the document so that your committees aren’t guessing about how important something is. I can’t emphasize this enough – citation counts are easy for a committee to dig up – download stats are harder. Do the measurement work for your committee and they’ll make the assumption that your metrics are important.

So that’s the advice.  I’ve been involved in a few internal tenure discussions, and the metrics are always important.  If there isn’t an easy analogy to something in my own experience, I look to the candidate’s documents and the external letters to tell me why something matters.