Call me if you find space whales

CosmoFlow lets astronomers harness the power of deep learning to learn the mysteries of deep space

Jane Elizabeth
© Shutterstock / MarcelClemens

Space is big and getting bigger every second. So, astronomers need a tool that scales well. Introducing CosmoFlow, a TensorFlow-based tool designed to help find dark matter and predict cosmological parameters on supercomputers.

Space is big. According to the expanding universe theory, it’s growing larger and larger every second. So, how do astronomers track all this space? That’s where CosmoFlow comes in.

CosmoFlow is a TensorFlow-based deep learning tool designed to help scientists find and predict cosmological phenomena. Like dark matter, or new stars, or cool space stuff.

This research has been part of a three-way collaboration between Cray, Intel, and the National Energy Research Scientific Computing Center at Lawrence Berkeley National Laboratory (NERSC for short). The supercomputers are necessary, if only because of the massive breadth and scale of the data on offer. I mean, it’s only the entire universe.

At the core of the CosmoFlow model are the cosmological parameters used to describe the global dynamics of the universe. Previous work by Carnegie Mellon showed that it was even possible to estimate cosmological parameters directly from the distribution of matter using 3D convolutional neural networks.


CosmoFlow takes that one step further and uses large datasets of these cosmological parameters. In particular, three parameters require an enormous amount of data and compute power to estimate: the density parameter describing the proportion of matter in the universe, the amplitude of matter density fluctuations on scales of 8 h⁻¹ Mpc, and the power law index of the density perturbation spectrum after inflation.

Where else can you get that kind of computing power outside of Cray and Intel?

The experiment was a success: CosmoFlow is the first large-scale science application of the TensorFlow framework at supercomputer scale with fully synchronous training. More importantly, it was able to perform extremely efficiently, with fully synchronous data-parallel training on 8192 nodes of Cori with 77% parallel efficiency, achieving 3.5 Pflop/s sustained performance.
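For readers wondering what "fully synchronous data-parallel training" means in practice, here is a toy sketch in plain Python. It is not taken from the CosmoFlow code: the one-parameter model, the data, the learning rate, and the helper names (`gradient`, `allreduce_mean`, `synchronous_step`) are all made up for illustration. The idea it shows is that every worker computes a gradient on its own shard of the data, the gradients are averaged across all workers, and everyone applies the same update before moving on.

```python
# Toy illustration of fully synchronous data-parallel training.
# Everything here (model, data, learning rate, helper names) is hypothetical
# and is not taken from the CosmoFlow code base.

def gradient(w, shard):
    """Gradient of the mean squared error for the model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    """Stand-in for an allreduce: every worker ends up with the same average."""
    return sum(values) / len(values)

def synchronous_step(w, shards, lr):
    """One step: local gradients, one global average, one identical update."""
    # On a real machine each worker computes its gradient in parallel.
    local_grads = [gradient(w, shard) for shard in shards]
    return w - lr * allreduce_mean(local_grads)

# Four hypothetical workers, each holding an equal-sized shard of (x, y)
# pairs generated from y = 3x.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = synchronous_step(w, shards, lr=0.01)
print(round(w, 3))
```

With equal-sized shards, the averaged gradient matches the gradient over the whole dataset, so the workers behave like one very large machine. The catch is that every step waits for the slowest worker, which is why holding on to high parallel efficiency across thousands of nodes is the hard part.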

As the Cray team points out, this is something that absolutely couldn’t be done without a supercomputer. The full benefits of deep learning come into view when you have access to thousands of nodes. A small, single node system would have needed more than 60 days to run a model. The Cray-Intel-NERSC collaboration took roughly 9 minutes in total… with 8 minutes of training time. Not bad.
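To put those numbers in perspective, here is some back-of-the-envelope arithmetic using only the figures quoted above. It assumes the usual definition of parallel efficiency (speedup divided by node count) and treats "more than 60 days" as a lower bound, so read the results as rough estimates rather than anything the team reported.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in the article.
NODES = 8192
PARALLEL_EFFICIENCY = 0.77
SUSTAINED_FLOPS = 3.5e15             # 3.5 Pflop/s
SINGLE_NODE_MINUTES = 60 * 24 * 60   # "more than 60 days", used as a lower bound
TRAINING_MINUTES = 8

# Assumed definition: efficiency = speedup / nodes, so speedup = efficiency * nodes.
effective_speedup = PARALLEL_EFFICIENCY * NODES

# Average sustained throughput per node.
per_node_flops = SUSTAINED_FLOPS / NODES

# Ratio of the single-node estimate to the reported training time.
wall_clock_ratio = SINGLE_NODE_MINUTES / TRAINING_MINUTES

print(f"effective speedup: about {effective_speedup:.0f}x")
print(f"per-node throughput: about {per_node_flops / 1e9:.0f} Gflop/s")
print(f"wall-clock ratio: at least {wall_clock_ratio:.0f}x")
```

The wall-clock ratio and the effective speedup don't have to match. The 60-day estimate and the efficiency figure may not have been measured against the same single-node baseline, so they aren't directly comparable.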

Future work on this subject will likely focus on opening up new avenues for exploration of extended cosmological problems. Until then, we’re excited to see how deep learning is applied to the deepest mystery of all: space.


Getting CosmoFlow

Want to search for dark matter on your own private time? While CosmoFlow might not have that many applications for lay developers, it’s sure to be a hit with astronomers and academics alike with that sweet, sweet funding for a Cray supercomputer or two.

CosmoFlow is available on GitHub. However, fair warning, step 1 of the how-to guide involves being at the National Energy Research Scientific Computing Center in Berkeley, California. So, uh, good luck with that.

Jane Elizabeth
Jane Elizabeth is an assistant editor for
