Spark Summit East Agenda (New York, March 18-19, 2015)
The plenary and track sessions are on day one. Databricks is offering three training courses on day two.
The track sessions are divided into developer, applications and data science tracks. To help you find your favorite speakers, I have collapsed that listing and sorted it by the first listed speaker’s last name. I certainly hope all of these presentations will be video recorded!
Take good notes and blog about your favorite sessions! Ping me with a pointer to your post. Thanks!
- Spark User Concurrency and Context/RDD Sharing at Production Scale by Farzad Aref (Zoomdata), Justin Langseth (Zoomdata)
- Beyond SQL: Spark SQL Abstractions For The Common Spark Job by Michael Armbrust (Databricks)
- Practical Machine Learning Pipelines with MLlib by Joseph Bradley (Databricks)
- When Spark meets Baidu by Hua Chai (Baidu, Inc)
- Accumulo and Spark: Geospatial processing with more distribution, less shuffle by Eugene Cheipesh (Azavea, Inc.), Rob Emanuele (Azavea, Inc.)
- Recommendations in a Flash: How Gilt Uses Spark to Improve Its Customer Experience by Zachary Cohn (Gilt Groupe)
- Experience and Lessons Learned for Large-Scale Graph Analysis using GraphX by Jason (Jinquan) Dai (Intel)
- Next-Generation Genomics Analysis Using Spark and ADAM by Timothy Danford (AMPLab, UC Berkeley)
- Spark Streaming – The State of the Union and the Road Beyond by Tathagata Das (Databricks)
- GraphX: Graph Analytics in Spark by Ankur Dave (UC Berkeley)
- Streaming Big Data Analytics with Team Apache: Spark & Spark Streaming, Kafka and Cassandra by Helena Edelson (DataStax)
- Visualizing big data in the browser using Spark by Hossein Falaki (Databricks)
- Streaming machine learning in Spark by Jeremy Freeman (HHMI Janelia Research Center)
- Spark Plugs Into Your Car by Arpan Ghosh (Automatic), Rob Ferguson (Automatic)
- Spark Application Carousel: Highlights of Several Applications Built with Spark by Vida Ha (Databricks)
- Plot all the data – Interactive visualization of massive datasets by Rob Harper (Oculus Info), Nathan Kronenfeld (Oculus Info)
- Towards Modularizing Spark Machine Learning Jobs by Lance Co Ting Keh (Box)
- Using Spark and Elasticsearch for real-time data analysis by Costin Leau (Elasticsearch)
- Spark’ing an Anti Money Laundering Revolution by Katie Levans (Tresata)
- HeteroSpark: A Heterogeneous CPU/GPU Spark Platform for Deep Learning Algorithms by Peilong Li (U of Massachusetts Lowell), Yan Luo (U of Massachusetts Lowell)
- Functionality and Performance Improvement of SparkR and Its Application by Hao Lin (Purdue University), Haichuan Wang (U of Illinois at Urbana-Champaign)
- Un-collaborative filtering: Giving the right recommendations when your users aren’t helping you by Leah McGuire (Salesforce)
- Interactive Scientific Image Analysis and Analytics using Spark by Kevin Mader (ETH Zurich / Paul Scherrer Institut)
- Distributed Graph-Based Entity Resolution Using Spark by Mahdi Namazifar (Cisco)
- Real-Time Recommendations using Spark by Jan Neumann (Comcast Labs), Sridhar Alla (Comcast Labs)
- Spark Infrastructure for Lumiata’s Probabilistic Graphical Model of Medical Science by Nick Peterson (Lumiata)
- Geospatial and Temporal Analysis and Visualization by Mansour Raad (Esri)
- Estimating Financial Risk with Spark by Sandy Ryza (Cloudera)
- SILK: A Spark Based Data Pipeline to Construct a Reliable and Accurate Food Dataset by Hesamoddin Salehian (Myfitnesspal)
- Finding Shoe Stores in more than 100k Merchants: Using Apache Spark to group all things! by Solmaz Shahalizadeh (Shopify)
- Graph-Based Genomic Integration using Spark by David Tester (Novartis Institutes for Biomedical Research)
- Multi-modal big data analysis within the Spark ecosystem by Jordi Torres (Barcelona Supercomputing Center)
- Power Hive with Spark by Xuefu Zhang (Cloudera), Marcelo Vanzin (Cloudera)
I first saw this in a tweet by Helena Edelson.