
April 13 & 14, A fun data science weekend!


Assemble a team. Learn & Share. Make friends.

Data Science and Data Visualization Competitions

Workshops & talks on Big Data and Data Science

Prizes, free goodies, free data science books

Free food & drinks, loads of snacks…

Fast fibre optic broadband & 12 HS Wi-Fi routers

Chill out, sleeping, dining areas. Showers for M&W


Twitter hashtag #bdhldn



Day 1 - Saturday April 13

10:00am Doors Open & Welcome
10:30am Morning Talks Start
12:30pm Buffet brunch served
1:00pm Data Challenges Start
2:00-6:00pm Afternoon talks & workshops
7pm Dinner served
Data Hacking and Data Challenges continue through the night

Day 2 - Sunday April 14

8:00am Breakfast served
1:00pm Data Challenges End
2:00pm Winners Announced, Prizes announced by DSL & UKWAUG



***MORNING TALKS & WORKSHOPS 10:30am to 12:30pm ***

These talks will provide an overview of open source data science tools and free datasets that will be available to all participants on Windows Azure.

 1. Big workloads on Windows Azure (Richard Conway, Andy Cross, @Elastacloud)

Set up a Windows Azure subscription, then use VM images and your free time with an Extra Large VM instance to do more work than your laptop can handle. We will also take a first look at HDInsight on Azure, the new Microsoft cloud-based Hadoop distribution.

3. How to deploy the IPython Notebook on Microsoft Azure, using Linux or Windows Virtual Machines (Wenming Ye, Microsoft)

In this tutorial we will demonstrate how you can create your own IPython notebook with scikit-learn on Windows Azure and start being productive immediately.


1. Data Science Programmability Made Scalable and Simple (Don Syme, @Microsoft Research)

“Productive Data Science is powered by productive Data Programmability”. In this talk, we’ll examine a key recent breakthrough in data programmability with both big data (many rows) and broad data (big schemas). The modern web and enterprise are massively information rich, but programming languages have traditionally been information sparse. The innovative F# language from Microsoft is leading the way in this area. We’ll look at how F# 3.0 Information Rich Programming allows strongly-tooled programming languages to scale to internet-scale information sources like Google Knowledge Graph, web data markets, databases, Big Data systems, services and enterprise data schemas.

This talk will be of interest to everyone who works with big and broad data schemas. Even if you don’t program, come along and see how your data scientists can be empowered to compose and deploy analytical components to critical production scenarios.

2. Machine Learning: From Kinect to Controlling HIV (Dr. Kenji Takeda, @Microsoft Research)

Machine learning is transforming the ability of computers to contribute to society, from the family and living room to global challenges such as climate change and healthcare. The diverse range of machine learning techniques and applications is having a profound effect on the way we use computing in daily life, with an enormous future potential as the volume of data produced by people, devices and services increases exponentially. In this talk we discuss how Microsoft Research is applying machine learning in such diverse areas as body tracking for Xbox Kinect, and using spam detection techniques in the quest for an HIV vaccine.

3. Apache Hive and Stinger (Chris Harris, EMEA Solution Engineer @Hortonworks)

Apache Hive is Hadoop’s SQL-like interface, used for reporting and analysis over huge volumes of data. Hive was released by Facebook in 2009 and is now used there to run more than 60,000 queries per day over more than 100 petabytes of data. Hundreds of companies use Hive in production for its reliable data processing and unmatched scale. Community activity in Hive is greater than ever before and 2013 is full of exciting new developments for Hive in both performance and analytics capabilities. Come to this session to:

* Learn how “Project Stinger” will achieve its goal of making Hive 100x faster than it has been in the past, enabling both more scalable analytics and human-time queries
* Learn about Hive’s new analytical capabilities, windowing functions and standard SQL datatypes

4. Two Timeless Techniques for Exploratory Data Analysis (Jonny Edwards, Data Scientist @ Thoughtful Technology)

Back in the old days, unsupervised learning was called “Exploratory Data Analysis” and consisted of a collection of techniques for visualising and summarising data. Two methods have stuck with me in my work, as they are simple to apply and produce intuitive results: K-means Clustering and Multi-Dimensional Scaling. Thankfully both R and Python's scikit-learn support these approaches, and in this talk I will run through their application and interpretation on a non-trivial dataset. In addition, I will compare the results to some of the newer methods that perform similar tasks, and show you that the old approaches still have a few advantages.
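For readers who have not met K-means before, here is a minimal sketch of the idea in plain Python. It is not the speaker's code: the two synthetic blobs, the function names and the farthest-point seeding (a simple deterministic stand-in for the random or k-means++ seeding that libraries normally use) are all assumptions made for illustration. Multi-Dimensional Scaling is not shown.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm with farthest-point seeding (illustrative only)."""
    # Seeding: start from the first point, then repeatedly add the point
    # that is farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical data: two well-separated 2-D blobs of 20 points each.
rng = random.Random(42)
blob_a = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(20)]
blob_b = [[rng.gauss(10.0, 0.5), rng.gauss(10.0, 0.5)] for _ in range(20)]
points = blob_a + blob_b

centroids, labels = kmeans(points, k=2)
print(centroids)
```

On real data you would normally reach for the scikit-learn or R implementations mentioned in the abstract; the sketch only shows the assign-then-update loop that makes the method easy to interpret.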


Datasets for Data Science and Data Visualization Challenges provided by:



Datasets for Free Style Challenge: ten datasets from several sources

All the Datasets and many open source tools will be provided to all the participants in a free Windows Azure instance








Data Science Challenge: Predict human judgements about who is more influential on social media. Please see section: “Prizes, Data Science Challenge”

The dataset will be provided by PeerIndex as a standard pairwise preference learning task. Each datapoint describes two individuals. Pre-computed, standardised features based on Twitter activity (such as volume of interactions, number of followers, etc.) will be provided for each individual. The discrete label represents a human judgement about which one of the two individuals is more influential. The goal of the challenge is to train a machine learning model which, for a pair of individuals, predicts with high accuracy the human judgement on who is more influential. Labels for the dataset have been collected by PeerIndex using an application similar to the one described in this post. This challenge will be hosted and run as a Kaggle competition.
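To give teams a feel for what a pairwise preference task looks like, here is a minimal baseline sketch in plain Python: logistic regression trained on the difference between the two individuals' feature vectors. Everything concrete in it is an assumption for illustration, including the two-feature layout, the synthetic data and the hidden labelling rule; the real PeerIndex data format is defined on the competition page, not here.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_prob_a_wins(weights, feats_a, feats_b):
    """Probability that individual A is judged more influential than B."""
    score = sum(w * (a - b) for w, a, b in zip(weights, feats_a, feats_b))
    return sigmoid(score)


def train_pairwise(pairs, labels, epochs=50, learning_rate=0.1):
    """Stochastic gradient ascent on the logistic log-likelihood.

    pairs  : list of (feats_a, feats_b), each a list of floats
    labels : 1 if A was judged more influential, 0 if B was
    """
    weights = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for (feats_a, feats_b), label in zip(pairs, labels):
            error = label - predict_prob_a_wins(weights, feats_a, feats_b)
            for j, (a, b) in enumerate(zip(feats_a, feats_b)):
                weights[j] += learning_rate * error * (a - b)
    return weights


# Synthetic stand-in data: two standardised features per individual
# (think "followers" and "interactions"; both names are hypothetical).
rng = random.Random(0)
hidden_rule = [2.0, 1.0]  # made-up rule used only to generate labels
pairs, labels = [], []
for _ in range(200):
    feats_a = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    feats_b = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    score = sum(h * (a - b) for h, a, b in zip(hidden_rule, feats_a, feats_b))
    pairs.append((feats_a, feats_b))
    labels.append(1 if score > 0 else 0)

weights = train_pairwise(pairs, labels)
correct = sum(
    (predict_prob_a_wins(weights, a, b) > 0.5) == (label == 1)
    for (a, b), label in zip(pairs, labels)
)
accuracy = correct / len(pairs)
print(accuracy)
```

Working on feature differences with no bias term keeps the model symmetric: swapping A and B flips the prediction, which is a sensible property for this kind of task. The accuracy printed here is measured on the training pairs, so treat it as a sanity check rather than an estimate of leaderboard performance.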

Data Visualization Challenge: The challenge is to develop an interesting data visualization based on data provided by PeerIndex, with rich social influence data about the 140 most influential people in the UK. Please see section: “Prizes, Data Visualization Challenge”

PeerIndex will provide basic statistics about social interactions, activity, audience and authority, data about topics the individuals are influential in, and influence graph data representing the network of influence between them. Participants can decide to concentrate on any subset of the data provided to create an insightful and engaging visualisation. The best visualisation, as judged by a panel, will win and will be considered for publication by one of the largest British news publishers.

Free Style Data Challenge: You’ll be provided with several datasets loaded in a Windows Azure instance. You are free to do or develop anything you’d like with those datasets. Please see section: “Prizes, Free Style Data Challenge”

Challenge Links:

Data challenge rules and information:

Data visualisation dataset:

Amazon movie reviews dataset:

Beer reviews dataset:

Bitcoin dataset:

Friendster datasets:

Gowalla dataset:

Last FM dataset:

Malicious URLs dataset:

Social memes dataset:

Twitter census dataset:

Abu Dhabi Building Dataset:



Data Science Challenge - Prizes per team

First Prize £800
Second Prize £400
Third Prize £200

Free Style Data Challenge and Raffles: In addition to the cash prizes above, we will give away several product prizes, including: Nokia Win8 phones, Xbox 360, XLive subscriptions, and an AR Drone.

The product prizes will be awarded to individuals or teams who win the Free Style Data Challenge and/or the raffles and similar draws held throughout the hackathon.

Data Visualisation Challenge - Prizes per team

First Prize £250 and publication of dataviz in main UK online newspaper
Second Prize £150




Assemble a team with your colleagues

Compete. Learn. Share. Make New Friends. Have Fun!

Date: April 13 & 14

Location: Hub Westminster,
1st floor New Zealand House,
 80 Haymarket,

Venue Opens: The venue opens at 8:30am on Saturday April 13. PLEASE ARRIVE AT THE VENUE NO LATER THAN 10am.

Upon arrival at the venue you will have to check in at the reception desk.

The reception desk should have your name on the list of participants.

If you have not registered on Eventbrite you will not be on the list.

Please make sure you register on Eventbrite before you arrive at the venue.

Data Challenges start: Saturday April 13 at 1pm London time

Data Challenges end: Sunday April 14 at 1pm London time

About the Venue and Survival Logistics:

12,000 sq ft of open, airy space

High-speed 100Mbps Internet, 12 super HS Wi-Fi routers

Great furniture, ergonomic chairs, and comfy sofas.

Shower facilities for men and women

Chill out, resting and sleeping areas available

Please bring your own laptop and any tools you want.

You can stay overnight at the venue. Please bring your own sleeping mattress and sleeping bag. There will be a quiet chill out area.

You will be provided with free food (lunch/dinner/breakfast), unlimited drinks, coffee & snacks of all kinds.







Data Science London

We are a non-profit organization dedicated to the free, open, dissemination of data science.
Largest data science community in Europe.
1,667 data scientists & data geeks

  • Free dissemination of Data Science
  • Free debate, discussion, and forum of ideas
  • We like to meet new people, share ideas & PoVs
  • Building a community of Data Scientists
  • We are technology agnostic, independent
  • We promote Open Source and Open Data

Follow us @ds_ldn or visit our website

UK Windows Azure Users Group

  • Non-profit organisation. Est. Oct 2011
  • Funded by founders and sponsors
  • Consistently received strong support from Microsoft UK
  • Largest Azure UG membership in Europe
  • Monthly meetings in Manchester & London
  • Founded by Microsoft MVPs Richard Conway & Andy Cross
  • Popular monthly turnouts of up to 110
  • Largest Azure conference in Europe
  • Spring 2012 release w/ Scott Guthrie, 700 attendees

Follow us @ukwaug or visit us on the web at