***MORNING TALKS & WORKSHOPS 10:30am to 12:30pm***
These talks will provide an overview of open source data science tools and free datasets that will be available to all participants on Windows Azure.
1. Big workloads on Windows Azure (Richard Conway, Andy Cross, @Elastacloud)
Set up a Windows Azure subscription, then use VM images and your free time with an Extra Large VM instance to do more work than your laptop can handle. We will also take a first look at HDInsight on Azure, the new Microsoft cloud-based Hadoop distribution.
2. How to deploy the IPython Notebook on Microsoft Azure, using Linux or Windows Virtual Machines (Wenming Ye, Microsoft)
In this tutorial we will demonstrate how you can create your own IPython notebook with scikit-learn on Windows Azure and start being productive immediately.
***AFTERNOON TALKS & WORKSHOPS 2pm to 6pm***
1. Data Science Programmability Made Scalable and Simple (Don Syme, @Microsoft Research)
“Productive Data Science is powered by productive Data Programmability.” In this talk, we’ll examine a key recent breakthrough in data programmability with both big data (many rows) and broad data (big schemas). The modern web and enterprise are massively information rich, but programming languages have traditionally been information sparse. The innovative F# language from Microsoft is leading the way in this area. We’ll look at how F# 3.0 Information Rich Programming allows strongly-tooled programming languages to scale to internet-scale information sources like Google Knowledge Graph, web data markets, databases, Big Data systems, services and enterprise data schemas.
This talk will be of interest to everyone who works with big and broad data schemas. Even if you don’t program, come along and see how your data scientists can be empowered to compose and deploy analytical components to critical production scenarios.
2. Machine Learning: From Kinect to Controlling HIV (Dr. Kenji Takeda, @Microsoft Research)
Machine learning is transforming the ability of computers to contribute to society, from the family and living room to global challenges such as climate change and healthcare. The diverse range of machine learning techniques and applications is having a profound effect on the way we use computing in daily life, with an enormous future potential as the volume of data produced by people, devices and services increases exponentially. In this talk we discuss how Microsoft Research is applying machine learning in such diverse areas as body tracking for Xbox Kinect, and using spam detection techniques in the quest for an HIV vaccine.
3. Apache Hive and Stinger (Chris Harris, EMEA Solution Engineer @Hortonworks)
Apache Hive is Hadoop’s SQL-like interface, used for reporting and analysis over huge volumes of data. Hive was released by Facebook in 2009 and is now used there to run more than 60,000 queries per day over more than 100 petabytes of data. Hundreds of companies use Hive in production for its reliable data processing and unmatched scale. Community activity in Hive is greater than ever before and 2013 is full of exciting new developments for Hive in both performance and analytics capabilities. Come to this session to:
* Learn how “Project Stinger” will achieve its goal of making Hive 100x faster than it has been in the past, enabling both more scalable analytics and human-time queries
* Learn about Hive’s new analytical capabilities, windowing functions and standard SQL datatypes
4. Two Timeless Techniques for Exploratory Data Analysis (Jonny Edwards, Data Scientist @ Thoughtful Technology)
Back in the old days, unsupervised learning was called “Exploratory Data Analysis” and consisted of a collection of techniques for visualising and summarising data. Two methods have stuck with me in my work, as they are simple to apply and produce intuitive results: K-means Clustering and Multi-Dimensional Scaling. Thankfully both R and Python's scikit-learn support these approaches, and in this talk I will run through their application and interpretation on a non-trivial dataset. In addition, I will compare the results to some of the newer methods that perform similar tasks, and show you that the old approaches still have a few advantages.
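For attendees who have not met K-means before, here is a minimal sketch of the idea in plain Python. It is not the speaker's material. The function name `kmeans`, the toy `points` and the choice to seed the centroids with the first k points are assumptions made for illustration. Library implementations such as scikit-learn's `KMeans` use smarter seeding and should be preferred for real work.

```python
def kmeans(points, k, iterations=20):
    """Illustrative K-means (Lloyd's algorithm) on a list of numeric tuples."""
    # Assumption for simplicity: seed the centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical toy data: two well-separated groups in two dimensions.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centroids = kmeans(points, 2)
print(labels)
print(centroids)
```

The two steps in the loop are the whole method, which is part of why it is simple to apply: assign every point to its nearest centre, then move each centre to the mean of its points, and repeat until nothing changes.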