The need to analyze massive scientific data sets, together with the availability of distributed compute resources offering ever more CPU cores, has driven the development of a variety of languages and systems for parallel, distributed data analysis.
In this talk we argue that contemporary languages for distributed computing must both integrate existing tools and libraries and express complex workflows in a functional programming model.
We demonstrate the usefulness of these features using the example of Cuneiform, a minimal functional language for large-scale scientific data analysis that runs on the Erlang VM. We also discuss applications in bioinformatics and machine learning.
Jörgen Brandt has been a PhD student at Humboldt-Universität in Berlin since 2013. His research interests include next-generation sequencing, scientific workflows, and functional programming languages. He graduated in Computer Science with a specialization in intelligent systems from Technische Universität Berlin in 2011, and in Information Technology and Networked Systems from Hochschule für Technik und Wirtschaft in 2008.