Challenges in Creating a Big Data Environment for Supercomputers (Dr. James Maltby, Cray, Inc.)

<div>Modern supercomputers like Shaheen II at KAUST are specially designed and architected for large-scale parallel simulation workloads such as computational fluid dynamics or weather forecasting. However, many Big Data Analytics algorithms and workloads can run very effectively on supercomputers once the underlying differences in computer architecture and programming models are taken into account. In this talk we will discuss the fundamental architectural differences between supercomputers and the ordinary servers and clusters on which many analytics codes are developed. We will then discuss how these differences apply to specific applications, including a large-scale semantic graph (RDF) database and more general graph algorithms. We will look at parallel analytics packages such as Spark and R, and at large-scale Deep Learning training with TensorFlow. Finally, we will cover the performance that can be obtained through this approach.</div>

Speakers

James Maltby

Cray, Inc.