Last month, hundreds of members of the Apache Flink and stream processing communities gathered at Flink Forward San Francisco. The data Artisans team would like to again thank all attendees, speakers, the program committee, and sponsors for making the conference a success.
In case you missed this year’s Flink Forward SF, catch up on the session recordings and corresponding slides here.
This post highlights a handful of sessions on some of the trends we observed at the conference:
- Flink’s Impressive Scale
- New Use Cases
- Platformization and Operations
- Future of Stream Processing
- Flink Community Growth
Flink’s Impressive Scale
Several talks shared striking numbers on the scale of their applications. They show that Apache Flink can power some of the most challenging stream processing use cases we’ve learned about so far.
- Alibaba runs its Stream Compute Platform on Blink (which is based on Flink), with 1000s of jobs on 100,000s of cores and 10s of TBs of state inside the stream processor!
- Netflix processes 3 trillion (3,000,000,000,000) events per day from their firehose, and described their tuning efforts to run a job with 20 TB of state in the stream processor.
New Use Cases
Two of the most frequent new use cases that came up this year were machine learning pipelines and stream processing with SQL:
- Dave Torok and Sameer Wadkar from Comcast presented their pipeline for applying machine learning models, with a dynamic way of assembling the features that go into the models.
- Fabian Hueske and Timo Walther from data Artisans demoed the first version of the SQL client, showed some use cases Flink SQL was designed for, and explained why unified batch and stream processing is important and what it means to run SQL queries on streams of data.
- Aris Koliopoulos and Alex Garella from DriveTribe showed how far you can push the event sourcing paradigm in "Panta Rhei: designing distributed applications with streams" by sharing how to implement a social network in that way.
- Xingzhong Xu shared how the team at Uber uses a hybrid offline/online approach to machine learning in their Real-time Optimization infrastructure.
Platformization and Operations
- Robert Metzger and Patrick Lucas from data Artisans demonstrated dA Platform, a production-ready platform for stream processing with Apache Flink. They showcased some functions of the platform and discussed how it makes stream processing easier for enterprises.
- Shuyi Chen and Rong Rong presented the design and architecture of the Flink As a Service platform at Uber. They discussed how their team manages the deployment, makes the platform highly available to support critical real-time business, scales the platform to support the entire company, and runs the platform in production.
- Jinkui Shi and Radu Tudoran showed how the team at Huawei Cloud built Flink-based real-time analysis in its CloudStream Service.
Future of Stream Processing
- Anand Iyer from Google Cloud talked about opening up the stream processing ecosystem to further languages (Python, Go, etc.) and use cases (such as machine learning pipelines with TensorFlow). In collaboration with members of the Apache Flink community, the Apache Beam project is building a generic language portability layer and extending its Flink runner to support that framework, which gives users the ability to run the new tf.transform and tfx libraries for pre- and post-processing of data in TensorFlow pipelines.
- Gregory Fee from Lyft talked about the state of unified batch and stream processing, using state bootstrapping as an example, and about which steps are still missing to make this unification complete.
- Flavio Junqueira from Dell EMC and Till Rohrmann from data Artisans demonstrated the state of the art in Pravega (stream storage) and Apache Flink. We learned that elastic auto-scaling for streaming storage and stream processing is one step towards creating a truly self-managing stream processing infrastructure.
Flink Community Growth
As always, we were happy to see many Flink users, contributors, and committers come together to share their Flink stories. Flink Forward San Francisco 2018 gathered a large number of attendees to discuss their use cases and experiences of running Flink in production. At the end of the official program, four committers took the stage for an Ask-Me-Anything session and talked about the future and roadmap of Flink.
We’ll be announcing the Call for Presentations for Flink Forward Berlin 2018 soon. Follow @dataArtisans to be notified when you can submit a proposal.
Once again, a big thank you to our sponsors!