Computational speed is essential for scientists wanting to model or verify experimental systems in biology, climate and chemistry. The newest supercomputer at the Environmental Molecular Sciences Laboratory packs 20 times the processing power of its predecessor. But speed alone isn’t enough to tackle complex computational problems. Scientists are developing methods for fine-tuning computing power to better parse chemical and biological reactions at the atomic level. Researchers can then apply that information, with more accuracy and predictability, to bigger scientific challenges.
“This is a critical issue,” said Tim Scheibe, lead scientist for multi-scale modeling and high performance computing at EMSL. “While fundamental molecular science is our focus, we need to increase the impact of that science by making it relevant to large-scale questions about things like climate change, energy production and storage, or contaminated soil remediation.”
To increase the impact, supercomputers can help scientists narrow the gap between theoretical predictions and experimental results. If computations with EMSL’s new machine, called Cascade, can provide more accurate details about chemical and biological reactions – calculating precisely when, where and how electrons, atoms and molecules move around – then the information can improve experimental design and efficiency, pushing the pace of the scientific process.
“One of the unique things about EMSL is we don’t just give you one instrument to work with, we give you the ability to integrate physical experiments with computational modeling to enhance research results,” Scheibe said.
The Need for Speed
Cascade is rated among the top 20 fastest computers in the world, with a 3.4 petaFLOPS rating. Each “FLOPS” (floating-point operation per second) is one math operation per second, and “peta” means a quadrillion – a million billion – Scheibe explained. So, the computer can perform more than three quadrillion math operations every single second.
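The prefix arithmetic is easy to check directly. A minimal sketch, using only the 3.4 petaFLOPS figure from the article and standard SI conversion factors (the one-operation-per-second comparison at the end is purely illustrative):

```python
# Sanity-check the scale of Cascade's 3.4 petaFLOPS rating.
PETA = 10**15                # a quadrillion, i.e. a million billion

cascade_flops = 3.4 * PETA   # floating-point operations per second

# Expressed as "million billion" operations per second:
million_billion = cascade_flops / (10**6 * 10**9)
print(million_billion)       # 3.4

# For scale: how long would one second of Cascade's work take
# at one operation per second?
seconds_per_year = 60 * 60 * 24 * 365
years = cascade_flops / seconds_per_year
print(f"about {years:.0f} years")  # roughly 108 million years
```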
“Speed is an enabling thing,” Scheibe said. For example, to optimize an organism for biofuel production, researchers could change one gene at a time and run 1,000 experiments, or run 1,000 simulations on the computer to identify the 10 to 15 best candidates for the actual experiment.
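The screening workflow Scheibe describes can be sketched in a few lines. This is a toy illustration only: the `simulate` scoring function, the variant names, and the random scores are invented stand-ins for what would, in practice, be expensive physics-based simulations of each gene variant.

```python
# Toy sketch of in-silico screening before wet-lab experiments.
# The scoring function below is a hypothetical placeholder, not a
# real model of biofuel yield.
import random

random.seed(42)

def simulate(gene_variant):
    # Stand-in for an expensive simulation that scores a variant's
    # predicted biofuel yield (higher is better).
    return random.random()

# Score 1,000 candidate gene variants computationally...
variants = [f"variant_{i}" for i in range(1000)]
scores = {v: simulate(v) for v in variants}

# ...and keep only the best 15 for actual experiments.
top_candidates = sorted(scores, key=scores.get, reverse=True)[:15]
print(len(top_candidates))  # 15
```

The point is the funnel: a thousand cheap in-silico trials narrow the field so that only a handful of expensive physical experiments need to be run.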
Computers with that kind of speed need a radically different design.
“Engineering has hit a wall,” said Karol Kowalski, computational chemistry expert at EMSL. “They can’t make faster chips, so they add more. But instead of stacking more chips on each other, they use nodes with accelerator cards attached.”
This new architecture can’t support the same old software. So, EMSL’s scientists developed new implementations in NWChem, a computational chemistry software code that gives researchers a suite of tools to help simulate molecular processes at the atomic level. To keep up with Cascade, the NWChem team, in collaboration with Intel, just issued NWChem 6.5. This new release enables users to take advantage of accelerators available on the new machine.
Harnessing the speed of computers to refine experimental results has been a career-long approach for David Dixon, the Robert Ramsay Chair of chemistry at The University of Alabama.
Decades ago, when mainframe computers took up entire buildings, he convinced DuPont that computer modeling could save the company millions of dollars by predicting the properties of chemical compounds to minimize the costs of experiments. Now the former associate director of theory, modeling and simulation at EMSL leads a Department of Energy-funded effort on biofuels and industrial chemical intermediates. This research uses one of the largest time allocations on Cascade and has already produced nearly 20 publications.
“Our team is trying to come up with new materials to catalyze chemical reactions with more control and less byproduct formation. This is a broad, long-term project to understand how biomass can most efficiently be converted to biofuel and valuable chemical intermediates,” Dixon said.
One quest for catalysis researchers is to find cheap, abundant elements to use – manganese or nickel – instead of expensive materials like platinum. To effectively use these “first row” transition metals, so called for their location in the periodic table, Dixon wants to understand precisely how atoms are bonding in specific reactions, and how much energy is needed to do so. With this knowledge, he can modify catalytic processes in the computer to model faster reaction times and avoid unwanted byproducts.
With the computational resources at EMSL, Dixon and his team have worked out one part of the puzzle: how to convert alcohols into more useful products. By understanding alcohol conversions, eventually they can more efficiently convert sugars. Sugars are the building blocks of cellulose, a key biomass feedstock for biofuels.
“The goal is to understand the mechanism and find the governing chemical descriptors,” said Dixon. “We can’t get it just from experiments. We need to combine the experimental with computation to really get at the reaction mechanism at the molecular level. The experiments are critical as they benchmark our calculations and provide key constraints on the mechanisms.”
The calculations are so large it takes the speed of Cascade, and NWChem software, to get the high level of accuracy needed to make reliable predictions of those mechanisms.
While some accuracy is derived from speed, some of it also comes from improvements in the way information is processed and calculations are performed. Even supercomputers are only as good as the slowest computational step.
“Sometimes people have the impression only the computers are getting faster. But algorithms and methods are getting better too,” said Don Truhlar, a theoretical and computational chemist at the University of Minnesota.
One of the complex problems Truhlar’s group tackled is the computational treatment of charge transfer and charge transport in photoactivated systems. In the same way Dixon tracks molecular bonding, Truhlar wants to account for electron movement in light-activated systems; information fundamental to understanding, and then manipulating, solar energy and other photoactivated catalytic processes.
In order to treat charge transport and charge balance, Truhlar needs to know where the charge is to begin with, and then how it moves. Although conventional simulation methods can predict charge movement in systems with only a few electrons, these methods break down in more complex systems where electrons and charged functional groups can exist in multiple excited states.
Truhlar and his group are creating methods to account for more complex electronic behaviors – such as multiple possible orbits and “tunneling” through boundaries – that exceed the accuracy of existing practical computational methods.
By “bootstrapping,” Truhlar benchmarks the accuracy of new treatments against smaller systems, and then applies them to bigger systems. The NWChem software is an important part of this research because it has modules to do the required accurate, high-level benchmarking calculations on Cascade. As new methods and algorithms are validated, Truhlar and his group are working with EMSL Scientist Niri Govind to integrate them into NWChem software.
While Truhlar is focused on developing tools for chemical computation, Bill Cannon, a senior scientist at Pacific Northwest National Laboratory, is modifying NWChem codes for applications in biology.
With a background in statistical thermodynamics and enzymology, Cannon recognized that the mathematics he used to simulate protein metabolism could also be applied to modeling metabolic pathways. With more accurate thermodynamic models of these pathways, he ultimately wants to simulate the whole system of metabolic reactions – the mass action kinetics – in a single organism, and then scale it up.
“Only one group has ever been able to do that, and even then it was with tremendous effort and approximations,” Cannon said. Right now, most models are based on conservation of mass; balancing amounts of “ingredients” against the end products in a particular reaction. But without accounting for energy and kinetics, the models are missing key statistical descriptions that affect whether reactions are even possible.
The conventional approach to simulating dynamics and predicting metabolite levels relies on having reaction rate constants, a difficult parameter to determine that varies tremendously with each enzyme-catalyzed reaction. But Cannon is “turning the usual approach to kinetics on its head.” He sets metabolite levels and then lets the supercomputer find the rates, which makes the entire process faster. His group has already successfully simulated small pathways, such as the tricarboxylic acid cycle, a common metabolic route many organisms use to generate energy.
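Cannon's inversion can be illustrated with a deliberately simple example. This is not his actual model: the two-step pathway, the concentration values, and the flux below are invented for demonstration. In a linear pathway at steady state, every step carries the same flux, so under simple mass-action kinetics (rate = constant × substrate level) fixing the metabolite levels lets the rate constants be read off directly instead of measured enzyme by enzyme:

```python
# Toy illustration of "setting metabolite levels and solving for
# rates" in a hypothetical linear pathway A -> B -> C.
# All numbers are invented for demonstration.

# Steady-state metabolite concentrations (arbitrary units), treated
# as known inputs rather than outputs of the simulation.
levels = {"A": 2.0, "B": 0.5}

# At steady state, a linear pathway carries one flux J through
# every step (also assumed known here).
J = 4.0

# With mass-action kinetics, step rate = k * [substrate], so each
# rate constant follows directly from J and the fixed level:
rate_constants = {
    "A->B": J / levels["A"],   # 4.0 / 2.0 = 2.0
    "B->C": J / levels["B"],   # 4.0 / 0.5 = 8.0
}
print(rate_constants)
```

The conventional approach would require measuring each rate constant experimentally; here they fall out of the assumed levels and flux, which is the sense in which fixing concentrations makes the computation faster.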
With more accurate modeling, researchers could simulate scenarios such as what happens when a vitamin is added to the growth medium for bacteria, or whether microbes still grow after one gene gets knocked out.
“There’s a lot of unintended consequences from engineering microbes. We need something that will better predict what happens before an actual experiment,” Cannon said. He estimates they’re a year away from getting a large-scale model of an organism, and maybe five years from mapping out a complex microbial community.
“Everybody asks how this will impact applications such as engineering for biofuels,” Cannon said. “But the biggest breakthrough will be the capability to run dynamic simulations without using rate constants. That’s much more fundamental.”
Although Cascade has been operational for users less than a year, EMSL is already planning a more integrated approach to bringing the next supercomputer to campus. In the past, when older computers were dismantled to make way for the new, scientists inevitably wanted to keep parts of the more familiar instrument. To smooth out transitions between machines, when Cascade came in, EMSL planned for its operation to overlap with a new system every two years.
“This new approach will also give us a chance to try different things. It’s not always about speed,” Scheibe said. “Sometimes there’s greater efficiency in building more memory or disk space. Sometimes different computer architectures suit one kind of science more than another.”
Those kinds of differences were discussed at a workshop Scheibe recently co-organized on connecting computational models across scales and science disciplines. In the future, supercomputers could be linked together, perhaps connecting a machine good for fast mathematical computations (Cascade) to an instrument better suited to sorting large data sets. Running these machines simultaneously could get answers faster and avoid time-consuming data transfer.
As scientific imaging becomes increasingly sophisticated, this parallel data processing will become more important. For instance, when EMSL’s new dynamic transmission electron microscope comes online, it will generate terabytes of data every second. If information can be processed in real time, then a storage crisis could be avoided.
With each generation of supercomputers, EMSL users get another opportunity to rev up molecular research with innovations to help answer larger scientific questions.
Elizabeth Devitt is a science journalist and freelance writer.