Curry On
Amsterdam!
July 16-17th, 2018


Streaming Analytics: How to Get Fast Predictions From Real-Time Data with Flink, Kafka, and Cassandra
Bas Geerdink
ING

@bgeerdink

Abstract

Streaming Analytics (or Fast Data) is becoming an increasingly popular subject in enterprise organizations. Customers want real-time experiences, such as notifications and advice based on their own online behavior and on other users' actions.

In this talk, Bas will present a streaming analytics engine that is powered by Apache Flink and written in Scala. Kafka is used as the message bus and Cassandra for state management. The machine learning models are built with KNIME and Spark, exported to PMML format, and evaluated using the Openscoring.io library. Bas will explain the architecture of the framework, demonstrate how to set up and integrate Flink jobs, and show code examples of typical streaming concepts such as event time, windows, watermarks, and exactly-once processing.
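
To give a flavor of the event time, window, and watermark concepts mentioned above, here is a minimal Scala sketch. It is not taken from the talk and deliberately does not use the Flink API: the `Event` class, the `EventTimeSketch` object, the 10-second tumbling window, and the 2-second out-of-orderness bound are all hypothetical choices made for illustration, and the closing rule is a simplification of what a real stream processor does.

```scala
// Hypothetical event type: a key, an event-time timestamp (ms), and a value.
final case class Event(key: String, timestampMs: Long, value: Double)

object EventTimeSketch {
  // Assumed settings, chosen only for this illustration.
  val windowSizeMs: Long = 10000L        // tumbling window length
  val maxOutOfOrdernessMs: Long = 2000L  // how far events may arrive out of order

  // Start of the tumbling window that contains a (non-negative) timestamp.
  def windowStart(timestampMs: Long): Long =
    timestampMs - (timestampMs % windowSizeMs)

  // Processes events in arrival order and returns (key, windowStart, sum)
  // for every window closed by the advancing watermark.
  def run(events: Seq[Event]): Vector[(String, Long, Double)] = {
    var open = Map.empty[(String, Long), Double]
    var maxSeenMs = Long.MinValue
    var watermark = Long.MinValue
    val out = Vector.newBuilder[(String, Long, Double)]

    for (e <- events) {
      val start = windowStart(e.timestampMs)
      // An event is late if the watermark has already passed its window end;
      // this sketch simply drops late events.
      val isLate = start + windowSizeMs <= watermark
      if (!isLate) {
        val id = (e.key, start)
        open = open.updated(id, open.getOrElse(id, 0.0) + e.value)
      }

      // The watermark trails the highest event time seen so far.
      maxSeenMs = math.max(maxSeenMs, e.timestampMs)
      watermark = maxSeenMs - maxOutOfOrdernessMs

      // Emit and discard every window whose end is at or before the watermark.
      val (closed, stillOpen) =
        open.partition { case ((_, s), _) => s + windowSizeMs <= watermark }
      closed.toSeq.sortBy(_._1._2).foreach { case ((k, s), sum) =>
        out += ((k, s, sum))
      }
      open = stillOpen
    }
    out.result()
  }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Event("a", 1000L, 1.0),
      Event("a", 4000L, 2.0),
      Event("a", 12500L, 5.0), // moves the watermark past the first window
      Event("a", 3000L, 10.0)  // arrives too late and is dropped
    )
    run(events).foreach(println)
  }
}
```

The point of the sketch is that windows are assigned and closed based on the timestamps carried by the events, not on arrival time, and that the watermark determines how long the engine waits for stragglers before emitting a result.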

After this session, attendees will have a good overview of a typical streaming analytics solution and a better understanding of Apache Flink as a key data streaming technology. Moreover, concepts that might seem vague and complex will have been explained with code examples, lowering the threshold for creating fast data applications.

Bio

Bas is a programmer, scientist, and IT manager. At ING, he works as Technology Lead in the global innovation center. Bas has a background in software development, design, and architecture, with a broad technical view ranging from C++ to Prolog to Scala. He occasionally teaches programming courses and is a regular speaker at conferences and informal meetings, where he enthusiastically presents a mixture of market context, his own vision, business cases, architecture, and source code.