# DAKOTA

Dakota is a freely available software framework for large-scale engineering optimization and uncertainty analysis. The Dakota toolkit provides a flexible, extensible interface between analysis codes and iterative systems analysis methods. Dakota contains algorithms for:

• uncertainty quantification with sampling, reliability, stochastic expansion, and epistemic methods;
• parameter estimation with nonlinear least squares methods; and
• sensitivity/variance analysis with design of experiments and parameter study methods.
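To make the first item a little more concrete, here is a minimal sketch of sampling-based uncertainty quantification in plain Python. It is not Dakota's interface or input format; the `model` function, the input distributions, and the sample count are all hypothetical stand-ins chosen for illustration.

```python
import random
import statistics


def model(x, y):
    # Hypothetical stand-in for an expensive analysis code.
    return x + 2.0 * y


# Assumed input uncertainties: x ~ N(1.0, 0.1), y ~ N(2.0, 0.2).
rng = random.Random(0)  # fixed seed so the run is repeatable
outputs = [
    model(rng.gauss(1.0, 0.1), rng.gauss(2.0, 0.2))
    for _ in range(20000)
]

# Summarize how the input uncertainty shows up in the output.
mean = statistics.fmean(outputs)
spread = statistics.stdev(outputs)
print(f"mean = {mean:.3f}, standard deviation = {spread:.3f}")
```

For this linear model the exact answers are a mean of 5 and a standard deviation of about 0.41, so the sampled estimates are easy to sanity-check. A framework like Dakota earns its keep when each model evaluation is a long-running simulation rather than a one-line function.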

# Itzï

Itzï is a hydrologic and hydraulic model that simulates 2D surface flows on a regular grid using simplified shallow water equations. It uses GRASS GIS as a back-end for reading entry data and writing results. It simulates surface flows from direct rainfall or user-given point inflows, and uses raster time-series as entry data, allowing the use of radar rainfall or varying friction coefficients.

Itzï is developed by Laurent Courty at the engineering institute of the National Autonomous University of Mexico.

# Open Science Codefest

The National Center for Ecological Analysis and Synthesis (NCEAS) at UCSB is co-sponsoring the Open Science Codefest 2014, which aims to bring together researchers from ecology, biodiversity science, and other earth and environmental sciences with computer scientists, software engineers, and developers to collaborate on coding projects of mutual interest.

Do you have a coding project that could benefit from collaboration, or software skills you’d like to share? The codefest will be held from September 2-4 in Santa Barbara, CA.

Inspired by hack-a-thons and organized in the participant-driven, unconference style, the Open Science Codefest is for anyone with an interesting problem, solution, or idea that intersects environmental science and computer programming. This is the conference where you will actually get stuff done – whether that’s coding up a new R module, developing an ontology, working on a data repository, creating data visualizations, dreaming up an interactive eco-game, discussing an idea, or any other concrete collaborative goal that interests a group of people.

Looks like a great program!

# Stripe’s Open Source Retreat

The Open Source Retreat that Stripe is sponsoring looks quite intriguing. Stripe relies on a lot of open source software, and they’ve announced a program that gives grants to a small number of developers to come to San Francisco and work full-time on an open source project for 3 months. The awardees will have space in Stripe’s SF office and will be asked to give a couple of internal tech talks over the course of the program, but otherwise it’ll be no-strings-attached.

This is a clever model for supporting open source development, and I hope this idea catches on with other companies that benefit from open source. I can think of a number of academic developers who would love the idea of a sabbatical to work on an open source code project, to meet new people who might use their code, and to get a fresh perspective in new surroundings – an open source sabbatical.  This could be a great way for companies that benefit from open source scientific software to help encourage and influence the development of the tools they use.

The deadline for applying to the Stripe program is May 31st, and the program will run from September 1st through December 1st.

# New PLOS Open data policy

PLOS has announced some changes to their publishing policies, and these changes are great news. The new PLOS policies will go a significant way towards encouraging open data and open source. Although the announcement itself is somewhat vague on the subject of source code, the actual PLOS ONE Sharing Policy is excellent:

…if new software or a new algorithm is central to a PLOS paper, the authors must confirm that the software conforms to the Open Source Definition, have deposited the following three items in an open software archive, and included in the submission as Supporting Information:

• The associated source code of the software described by the paper. This should, as far as possible, follow accepted community standards and be licensed under a suitable license such as BSD, LGPL, or MIT (see http://www.opensource.org/licenses/alphabetical for a full list). Dependency on commercial software such as Mathematica and MATLAB does not preclude a paper from consideration, although complete open source solutions are preferred.
• Documentation for running and installing the software. For end-user applications, instructions for installing and using the software are prerequisite; for software libraries, instructions for using the application program interface are prerequisite.
• A test dataset with associated control parameter settings. Where feasible, results from standard test sets should be included. Where possible, test data should not have any dependencies — for example, a database dump.

However, the one loophole is that they allow for code that runs on closed source platforms in “common use by the readership”  (e.g. MATLAB), although it must run without dependencies on proprietary or otherwise unobtainable ancillary software.  That “common use” loophole could potentially be a mile wide in some fields.  Is Gaussian a common use platform in computational chemistry and therefore exempt from this new policy?   If so, the policy is a bit toothless.  I’d like to see the limits and bounds of the “common use” loophole more clearly stated.

The announcement makes PLOS ONE a much more attractive place to send our next paper.

# The RosettaCon 2012 Collection: Rosetta Developers Meet the Challenges in Macromodeling Head On

Reproducibility continues to be one of the major challenges facing computational biologists today. Complicated experiments, massive data sets, scantily described protocols, and constantly evolving code can make experimental documentation and replication very difficult.  In addition, the need for specialized knowledge and access to large computational resources can create barriers when trying to design and model macromolecules.

Every year, the Rosetta developer community meets to discuss these challenges and advancements via Rosetta, a software suite that models and helps design macromolecules. In 2010, PLOS announced the RosettaCon2010 Collection, which made the latest research on protocols used to create macromolecular models available to all. Now, the PLOS ONE RosettaCon 2012 Collection continues to tackle issues related to use, reproducibility and documentation by highlighting new scientific developments within the Rosetta community.

The RosettaCon 2012 Collection comprises 14 articles detailing the scientific advancements made by developers that use Rosetta. In order to address reproducibility and documentation challenges, each article within this Collection includes an archive containing links to the exact version of the code used in the paper, all input data, links to external tools and example scripts.

This year’s Collection marks the tenth anniversary of RosettaCon and focuses on three long-term goals of the community: increase the usability of Rosetta, improve its current methods, and introduce completely new protocols.

Increasing the usability of Rosetta – Rosetta still requires specialized knowledge and large computational resources, but this collection features two articles describing advancements that make it easier for non-experts to use its applications. These articles introduce the Rosetta Online Server that Includes Everyone (ROSIE) workflow, which allows for rapid conversion of Rosetta applications into public web servers, and PyRosetta, a new graphical user interface (GUI) which allows users to run standard Rosetta design tasks.

Improving current prediction methods – Several articles describe improvements to Rosetta’s structure prediction capabilities and design methodologies. Examples include improvements to loop conformational sampling and a recently developed ray-casting (DARC) method for small molecule docking that now enables virtual screening of large compound libraries.

Introducing new protocols – A number of articles featuring new procedures and applications that debuted at the conference are introduced in the Collection. Highlights include new methods for dealing with ligand docking, advancements to pre-refine scaffold proteins prior to computational design of functional sites, and new protocols to drive Rosetta de novo modeling.

The RosettaCon 2012 Collection continues to help serve the Rosetta community in an effort to ensure that newly developed protocols are as usable as more established workflows, are transparent, and are accurately documented even in an active development environment.

This post has been adapted from “The RosettaCon 2012 Special Collection: Code Writ on Water, Documentation Writ in Stone” which serves as a more in-depth overview of the new collection. To read all that this Collection has to offer, click here.

# OpenAPIs for scientific instrumentation?

An interesting question from Dale Smith:  Are there OpenAPIs for remote sensing and monitoring of scientific instruments?  Dale pointed us at this very cool RSOE EDIS alert map as an example of what could be possible with distributed consumer-grade sensors that had OpenAPIs.   I can imagine a number of very cool things that could be done with distributed weather or earth motion sensors.  Are there software tools out there that make querying these sensors easy?

(One suggestion,  however, would be for the RSOE EDIS to look for a slightly less ominous-sounding motto).

# Playing with MultiGraph

I’ve been playing around with a cool JavaScript library called MultiGraph which lets you interact with graphical data embedded in a blog post. The data format is a simple little XML file called a “MUGL”. Here’s a sample that took all of about 10 minutes to create:

Note that you can pan and zoom in on the data.   For those readers who are interested, this data is the Oxygen-Oxygen pair distribution function, $$g_{OO}(r)$$, for liquid water that was inferred from X-ray scattering data from  G. Hura, J. M. Sorenson,  R. M. Glaeser, and  T. Head-Gordon, J. Chem. Phys. 113(20), pp. 9140-9148 (2000).

Inserting this into the blog post involved uploading two files, the javascript library itself and the MUGL file. After those were in place, there were only two lines that needed to be added to the blog post:
 <script type="text/javascript" src="http://www.openscience.org/blog/wp-content/uploads/2013/05/multigraph-min.js"></script> 
 <div class="multigraph" data-height="300" data-src="http://www.openscience.org/blog/wp-content/uploads/2013/05/gofrmugl.xml" data-width="500"></div> 

One thing that would be nice would be a way to automate the process of going from an xmgrace file directly to the MUGL format.
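A minimal sketch of the first half of that conversion might look like the following. It assumes the xmgrace file is a plain-text project where lines starting with `@` are settings, lines starting with `#` are comments, and `&` ends a data set. The function names and the sample numbers are made up for illustration, and wrapping the resulting rows in the actual MUGL markup is deliberately left out, since the MUGL schema isn't described here.

```python
def read_xmgrace_xy(text):
    """Return the (x, y) pairs of the first data set in xmgrace-style text."""
    points = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, "@" settings, and "#" comments.
        if not line or line[0] in "@#":
            continue
        # "&" terminates a data set; stop once we have collected one.
        if line.startswith("&"):
            if points:
                break
            continue
        fields = line.split()
        points.append((float(fields[0]), float(fields[1])))
    return points


def to_csv_rows(points):
    """Format the pairs as comma-separated rows, one pair per line."""
    return "\n".join(f"{x:g},{y:g}" for x, y in points)


# Made-up sample input, not the real g_OO(r) data from the post.
sample = """# Grace project file
@    title "g_OO(r)"
@target G0.S0
@type xy
2.5 0.8
2.8 2.3
3.3 0.9
&
"""

rows = to_csv_rows(read_xmgrace_xy(sample))
print(rows)
```

Only the first data set is read; a project with several sets would need a loop that keeps going past each `&`.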

# SimThyr – simulation software for pituitary thyroid feedback

This is a bit outside our normal area of expertise, but it looks interesting.

Thyroid hormones play an important role in metabolism, growth and differentiation. Therefore, exact regulation of thyroid hormone levels is vital for most organisms. The mechanism for the feedback control is known, but the dynamics are still a bit of a mystery. There’s an interesting page on the different models for thyrotropic feedback control at the Medizinische Kybernetik (Medical Cybernetics) site. SimThyr is an open source Pascal-based simulation program for the pituitary thyroid feedback control mechanism that explores these models and makes predictions for dynamics based on parameters of the feedback mechanism.

# Reversible Random Number Generators

This news comes by way of John Parkhill, my new colleague here at Notre Dame.

William G. Hoover (of the Nosé-Hoover thermostat) and Carol G. Hoover issued a $500 challenge on arXiv to generate a time-reversible random number generator. The challenge itself would be quite remarkable news. What’s even better is that the challenge was solved (including the source code for an implementation) in 6 days by Federico Ricci-Tersenghi.

Why is this a big deal? Most of the equations in physics that govern the time evolution of particles obey time-reversal symmetry; the same differential equations that govern molecular or planetary motion will take you back to your starting point if you suddenly reverse the time variable. This is usually a fantastic way to check whether you are doing the physics correctly in your simulations, and it also means that collections of starting points that are related to each other behave in certain predictable ways when they evolve.

Stochastic approaches to physical motion introduce an aspect of randomness to mimic the behavior of complex phenomena like the motion of solvent surrounding the molecule we’re interested in, or to mimic the transitions between different electronic states of a molecule. The introduction of random numbers has meant we had to give up time-reversibility, and we’ve been willing to live with that for a long time because we can study more complicated phenomena. If we have access to a time-reversible pseudo-random number generator, however, we get that very powerful tool back in our toolbox.

Now, the Langevin equation, $$m \frac{d^2 x}{dt^2} = F - \gamma(t) \frac{dx}{dt} + R(t)$$ has two things that prevent it from being time-reversible. Besides the stochastic or random force, $$R(t)$$, there’s also a drag or friction force, $$-\gamma(t) \frac{dx}{dt}$$, that depends on the velocities of the particles. There’s no solution yet to time reversibility for this piece (and I have my doubts that there ever will be a way to reverse this).

I suppose if we offer up another $500 prize for time-reversible drag, we’d make some traction on this problem…

(The comic above courtesy of xkcd).
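To give a feel for what “time-reversible” means for a pseudo-random number generator, here is a minimal sketch using a linear congruential generator (LCG), which can be stepped backwards exactly with a modular inverse. This is only an illustration of the idea. It is not claimed to be the generator from the Hoovers' challenge or the winning solution, and the constants are simply a commonly used textbook choice.

```python
# Illustrative reversible generator: x_next = (A * x + C) mod M.
# The constants are a common textbook choice, assumed here for the example.
M = 2**32
A = 1664525
C = 1013904223
A_INV = pow(A, -1, M)  # modular inverse exists because A is odd (Python 3.8+)


def step_forward(state):
    """Advance the generator by one step."""
    return (A * state + C) % M


def step_backward(state):
    """Undo one forward step exactly."""
    return (A_INV * (state - C)) % M


def run(state, n, step):
    """Apply `step` n times, recording every state along the way."""
    history = [state]
    for _ in range(n):
        state = step(state)
        history.append(state)
    return history


forward = run(12345, 1000, step_forward)
backward = run(forward[-1], 1000, step_backward)

# Running backwards from the final state retraces the forward sequence.
reversible = backward == forward[::-1]
print(reversible)
```

Because every state maps to exactly one predecessor, a simulation that draws its random kicks from a generator like this could, in principle, replay them in reverse order when the time variable is flipped. The friction term in the Langevin equation is a separate obstacle that this does nothing to address.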

# Relax – Molecular dynamics by NMR data analysis

Edward d’Auvergne pointed out the relax program, which looks like a useful way to connect experimental NMR spectra with molecular dynamics simulations.

relax is designed for the study of molecular dynamics of organic molecules, proteins, RNA, DNA, sugars, and other biomolecules through the analysis of experimental NMR data. It supports exponential curve fitting for the calculation of the R1 and R2 relaxation rates, calculation of the NOE, reduced spectral density mapping, the Lipari and Szabo model-free analysis, study of domain motions via the N-state model or ensemble analysis and frame order dynamics theories using anisotropic NMR parameters such as RDCs and PCSs, and the investigation of stereochemistry.
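As a rough illustration of the exponential curve fitting step, the sketch below fits a single-exponential decay, $$I(t) = I_0 e^{-R t}$$, to peak intensities measured at several relaxation delays. It is not relax's own code or interface: the function name, the synthetic data, and the choice of a simple log-linear least squares fit (rather than a full nonlinear fit with error analysis) are all assumptions made to keep the example short.

```python
import math


def fit_relaxation_rate(times, intensities):
    """Fit I(t) = I0 * exp(-R * t) by linear least squares on ln I(t).

    Returns the pair (R, I0).
    """
    logs = [math.log(value) for value in intensities]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    # ln I = ln I0 - R * t, so the rate is minus the slope.
    return -slope, math.exp(intercept)


# Synthetic, noise-free data made up for this example.
true_rate = 1.5   # 1/s
true_i0 = 100.0   # arbitrary intensity units
delays = [0.01, 0.05, 0.1, 0.2, 0.4, 0.8]  # seconds
peaks = [true_i0 * math.exp(-true_rate * t) for t in delays]

rate, i0 = fit_relaxation_rate(delays, peaks)
print(f"R = {rate:.3f} 1/s, I0 = {i0:.1f}")
```

With real spectra the intensities carry noise, and taking logarithms distorts that noise at long delays, which is one reason a dedicated analysis package is preferable to a quick fit like this one.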

# Do.abl.es

Do you want to know how you can measure DNA contour lengths using ImageJ? Perhaps you want to stain a C. elegans embryo for imaging? Or possibly, you might want to test whether or not you have gotten an immune response using ELISA?

Martin Fitzpatrick sends word of a cool collection of open access scientific protocols called Do.abl.es.  For the uninitiated, protocols are the recipes that scientists use to carry out experiments in a reproducible way.  The list of protocols posted to Do.abl.es to date has a number of interesting and important biochemistry and biology experiments.

There’s also a neat companion site called Install.abl.es which concentrates on many of the same things we do – the use of open source software in the sciences.

# Overture – A C++ toolkit for Solving PDEs in Complex Geometries

This looks useful! The partial differential equations (PDEs) we solve in my lab are the equations of motion for atoms in molecular dynamics. These are relatively easy to integrate numerically. Lots of labs work with harder PDE problems (like the response of metallic nanostructures to electromagnetic fields) that have difficult boundary conditions in complex geometries.

Overture is an object-oriented code framework for solving partial differential equations (PDEs). It provides a portable, flexible software development environment for applications that involve the simulation of physical processes in complex moving geometry. It is implemented as a collection of C++ libraries that enable the use of finite difference and finite volume methods at a level that hides the details of the associated data structures. Overture is designed for solving problems on a structured grid or a collection of structured grids. In particular, it can use curvilinear grids, adaptive mesh refinement, and the composite overlapping grid method to represent problems involving complex domains with moving components. There are also utilities for building grids on CAD geometries and for building hybrid grids that can be used with applications that use unstructured grids.

# SASSIE – Create atomistic models from Small Angle scattering data

Here’s a neat bit of “bridge” or “glue” software for today – SASSIE is a python-based suite for creating atomistic models of molecular systems in order to compare those models directly to data from small-angle X-ray scattering (SAXS) and small-angle neutron scattering (SANS) experiments.  SASSIE is the work of Joseph Curtis and Susan Krueger from the NIST Center for Neutron Research.  You can use SASSIE to generate and manipulate large numbers of structures and to calculate the SANS, SAXS, and neutron reflectivity profiles from atomistic structures that result from molecular dynamics (MD) or Monte Carlo (MC) simulations.

“Bridge” or “glue” software increases the functionality of other software by making the data formats from one package usable as input to another.  In SASSIE’s case, the molecular dynamics package being used is NAMD (from the Theoretical and Computational Biophysics Group at UIUC), along with scattering calculators Cryson and Crysol.  The Hydropro package is used for calculating hydrodynamic properties.  (Note that Cryson, Crysol, and Hydropro are not open source programs.  Boo.)

SASSIE isn’t quite a full-bore open source project yet.  I was able to download the source here (159 MB download), but there’s a registration barrier in the way on the SASSIE trac site.  I’m not sure why the small-angle scattering community locks up their code like this.  It discourages re-use, and doesn’t provide any extra benefit to the authors of the code or the home institution.  Likewise, Cryson and Crysol both appear to have an academic and research license (again, not open source).

SASSIE looks interesting.  I’m not in the small-angle scattering community, but I really like the bridge between atomistic simulation and experiment, and the code looks like it has some very useful pieces that could be reused in interesting ways.