TFL Bike Data in Python
Written by Jono Hawkins on 27 Jan 2021
This is a work in progress!
Back as an undergrad, we were set an assignment for our Introduction to Programming course to develop a UI in Python that displayed some open-access data in a ‘user-friendly’ way. I had recently come across the TfL Open Data API and used the Barclays Cycle Hire data to create an animation of where bikes were moving across London over a period of a few days.
The results were pretty trivial because the data only provided start and end points for each journey, so the map showed a series of lines that appeared and disappeared as a bicycle was “on the move”. At the time I thought it would be nice to extend this to show the actual route of the bicycle but never got round to it.
Recently, I’ve returned to the problem in a bid to re-familiarise myself with Python and matplotlib to prepare for the eventuality that I lose my MATLAB license and still need to knock up scripts for data analysis.
Source code for the project can be found under the TFLBikeViewer repository on my GitHub.
The data for this project is available as CSV files from the TfL Cycling Data repository. A range of data is available there, but I’m interested in the usage-stats datasets. These can be downloaded manually, or a listing (cycling-load.json) is provided if you want to automate multiple downloads. The contents of the CSV files are in the following format:
Rental Id,Duration,Bike Id,End Date,EndStation Id,EndStation Name,Start Date,StartStation Id,StartStation Name
94113398,960,6800,07/01/2020 14:07,541,"Green Park Station, Mayfair",07/01/2020 13:51,164,"Cleveland Gardens, Bayswater"
94117049,600,8691,07/01/2020 17:06,48,"Godliman Street, St. Paul's",07/01/2020 16:56,323,"Clifton Street, Shoreditch"
94110497,540,531,07/01/2020 11:01,654,"Ashmole Estate, Oval",07/01/2020 10:52,624,"Courland Grove, Wandsworth Road"
... etc ...
So no change there: data is still only provided for the start and end locations of each bike journey, along with timing information and some identifying data for each bike. Rather than use the station names provided, the BikePointStore class queries the TfL BikePoint API to find an unambiguous latitude and longitude for each bike rental point. It stores the position and name of each bike point in a Python dictionary, which can be queried using the station IDs provided in the CSV files (StartStation Id and EndStation Id).
It’s worth noting here that, for some reason, TfL prefixes these IDs with the string BikePoint_, so that needs to be stripped from the data returned by the BikePoint API.
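A minimal sketch of the ID clean-up and lookup table is shown below. It assumes the BikePoint endpoint returns a JSON list of place objects carrying id, commonName, lat and lon keys, and it strips everything up to the last underscore rather than a fixed prefix string. The function name and the (name, lat, lon) tuple layout are illustrative, not taken from BikePointStore.

```python
def build_bike_point_table(places):
    """Map numeric bike point IDs to (name, lat, lon) tuples.

    `places` is assumed to be the decoded JSON list from the BikePoint
    endpoint, each entry carrying 'id', 'commonName', 'lat' and 'lon'.
    """
    table = {}
    for place in places:
        # Strip the textual prefix, e.g. 'BikePoint_164' -> 164, so the key
        # matches the numeric station IDs used in the CSV files
        point_id = int(place["id"].rsplit("_", 1)[-1])
        table[point_id] = (place["commonName"], place["lat"], place["lon"])
    return table
```

Splitting on the last underscore keeps the lookup working even if the prefix text differs slightly from what is expected.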
Picking and Choosing
The JourneyStore class is used to store the parsed CSV files in memory. It can be instantiated as
journeyStoreObject = JourneyStore('/path/to/file.csv')
and then queried using its filter methods, such as filterOngoing, which return lists of Journey objects. These lists are filtered based on the start and end date of each journey; to create the moving-dot GIFs, I used the filterOngoing method. The boolean conditions for this filter aren’t perfect and need some improvement to work well over longer time windows.
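The filter’s exact conditions aren’t shown here, but one way to make an “ongoing” filter robust for long windows is the standard interval-overlap test. The sketch below uses a hypothetical Trip tuple in place of Journey and is not the project’s filterOngoing implementation.

```python
from datetime import datetime
from typing import List, NamedTuple


class Trip(NamedTuple):
    """Hypothetical stand-in for the project's Journey class."""
    start: datetime
    end: datetime


def filter_ongoing(trips: List[Trip], window_start: datetime,
                   window_end: datetime) -> List[Trip]:
    """Keep trips that are on the move at any point in the window."""
    # Two closed intervals overlap exactly when each one starts no later
    # than the other one ends; this also catches trips that begin before
    # and finish after a long window
    return [t for t in trips
            if t.start <= window_end and t.end >= window_start]
```

Because the test only compares endpoints, it behaves the same whether the window is a single instant or several days long.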
The Journey … API
The TfL Journey API is used to find the route between each pair of bike points. The required inputs to the API are the start and destination locations; to limit the results to cycling routes, a parameter is used to specify cycling as the mode of transport. There was a strange bug where some point-to-point journeys returned a NullReferenceException, but this seemed to be resolved by explicitly passing a value for the cyclePreference parameter.
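As a sketch of what such a request might look like, the function below builds a Journey Planner URL from two coordinate pairs. The JourneyResults/{from}/to/{to} path and the mode=cycle value reflect my understanding of the TfL API and should be checked against its documentation. The cyclePreference value used in the project isn’t recorded here, so it is left as an argument; AllTheWay in the usage note is only an assumed example.

```python
from urllib.parse import quote, urlencode

# Assumed endpoint for the TfL Journey Planner
BASE_URL = "https://api.tfl.gov.uk/Journey/JourneyResults"


def journey_url(start, end, cycle_preference):
    """Build a cycling journey request between two (lat, lon) pairs."""
    # Coordinates are passed as "lat,lon" path segments; keep the comma
    # unescaped so the path stays readable
    frm = quote(f"{start[0]},{start[1]}", safe=",")
    to = quote(f"{end[0]},{end[1]}", safe=",")
    # Restrict results to cycling and always send a cycle preference
    query = urlencode({"mode": "cycle", "cyclePreference": cycle_preference})
    return f"{BASE_URL}/{frm}/to/{to}?{query}"
```

For example, journey_url((51.5, -0.14), (51.51, -0.1), "AllTheWay") produces a URL that could then be fetched with urllib.request.urlopen and decoded with json.load.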
Requests are made from the RouteStore class. At present, the first (0th) journey returned is always the one saved, although this could be extended to pick a random journey when multiple options exist between the start and end points (e.g. with different travel times). The latitude and longitude of each step in each leg are then appended to a list, giving the waypoints for each journey.
# Import TFLBikeData library
import tflbikedata

# Load any available route data from the local file
routeStore = tflbikedata.RouteStore('/path/to/routestorefile.pkl')

# Display how many routes have been loaded from the file
print('Loaded', routeStore.numRoutes, 'routes')

# Get the route data as two equal-length arrays containing the latitude and
# longitude data
startIdx = 419  # example start ID
endIdx = 677  # example end ID
lat, lon = routeStore.getRoute(startIdx, endIdx)

# If the route from startIdx to endIdx hasn't been fetched before then save it
# locally to prevent future HTTP requests (uses pickle internally).
routeStore.save()
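The step-collection described above can be sketched as follows. It assumes the decoded Journey API response is laid out as journeys → legs → instruction → steps, with latitude and longitude keys on each step; that layout and the function name are assumptions, not the RouteStore code.

```python
def route_points(response):
    """Collect step coordinates from a decoded Journey API response.

    Assumes the layout journeys -> legs -> instruction -> steps, with each
    step carrying 'latitude' and 'longitude' keys.
    """
    # Always take the first (0th) journey, mirroring the current behaviour
    journey = response["journeys"][0]
    lats, lons = [], []
    for leg in journey["legs"]:
        for step in leg["instruction"]["steps"]:
            lats.append(step["latitude"])
            lons.append(step["longitude"])
    return lats, lons
```

Returning two parallel lists matches the (lat, lon) pair of arrays that getRoute hands back in the example above.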
Each request to the API took on the order of 1-2 seconds, which makes it infeasible to perform them on the fly. Instead, the fetched locations are stored in a local file, which speeds up the plotting process later on. At the moment this is fairly simple, relying on a nested Python dict keyed by bike point IDs, whose values are tuples holding the latitude and longitude data. This is saved to a local file using the pickle module.
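A minimal sketch of that cache is below, assuming a two-level layout where the outer key is the start bike point ID and the inner key is the end bike point ID. The function names are hypothetical and this is not the RouteStore implementation itself.

```python
import os
import pickle


def load_routes(path):
    """Load the nested route cache, or start empty if no file exists yet."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {}


def add_route(routes, start_id, end_id, lats, lons):
    # Outer key: start bike point ID; inner key: end bike point ID
    routes.setdefault(start_id, {})[end_id] = (lats, lons)


def get_route(routes, start_id, end_id):
    """Return (lats, lons) if the route is cached, otherwise None."""
    return routes.get(start_id, {}).get(end_id)


def save_routes(routes, path):
    with open(path, "wb") as f:
        pickle.dump(routes, f)
```

A lookup that returns None signals that the route still needs to be fetched from the API; note that a route from A to B is stored separately from B to A.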
Where am I?
In order to map where each bike is at a given time, the total distance of each Journey is computed when it is loaded from the JourneyStore. This is only a proxy measure, because it is computed directly from the differences in the longitude and latitude coordinates (in degrees) rather than as a true distance on the Earth’s surface; however, because the distance between consecutive stops is small, we can apply the small angle approximation and treat each segment as a straight line.
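The degree-based segment lengths can be sketched as below. With scale_longitude=False it is the plain Euclidean distance on raw coordinate differences described above. The optional cos(latitude) scaling is my own addition (an equirectangular approximation), not something the project does; it corrects for a degree of longitude covering only about 0.62 times the ground distance of a degree of latitude in London.

```python
import math


def segment_distances(lats, lons, scale_longitude=True):
    """Straight-line length of each segment between consecutive points."""
    dists = []
    for i in range(len(lats) - 1):
        dlat = lats[i + 1] - lats[i]
        dlon = lons[i + 1] - lons[i]
        if scale_longitude:
            # Optional refinement: shrink longitude differences by the
            # cosine of the segment's mean latitude so both axes are
            # measured in comparable units
            mean_lat = math.radians((lats[i] + lats[i + 1]) / 2)
            dlon *= math.cos(mean_lat)
        dists.append(math.hypot(dlat, dlon))
    return dists
```

Only the ratios between these lengths matter for locating a bike along its route, so the result can stay in degrees.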
Because no intermediate timing data is available, we have to assume that the cyclist travels at a constant speed for the whole duration of their journey. To find the location of the bike at a given datetime time, the time elapsed since leaving is computed and converted to a fractional duration by dividing it by the saved journey duration.
# journey is a Journey object; journey.startDate and time are datetimes
timeAfterLeaving = time - journey.startDate
# Convert to a fractional duration; journey.duration is assumed to be in
# seconds, as in the CSV Duration column
fractDuration = timeAfterLeaving.total_seconds() / journey.duration
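As a worked example using the first row of the CSV sample (Rental Id 94113398, starting at 07/01/2020 13:51 with a Duration of 960 seconds), a time of 13:59 is 480 seconds after departure, giving a fractional duration of 480 / 960 = 0.5. The helper below reproduces that calculation. Its name and the clamping to [0, 1] are illustrative choices, and the date format assumes the CSV dates are day-first.

```python
from datetime import datetime

# Assumed CSV date format, e.g. "07/01/2020 13:51" (day first)
CSV_DATE_FORMAT = "%d/%m/%Y %H:%M"


def fraction_elapsed(start_text, duration_s, when):
    """Fraction of a journey completed at `when`, clamped to [0, 1]."""
    start = datetime.strptime(start_text, CSV_DATE_FORMAT)
    elapsed = (when - start).total_seconds()
    # Clamping is an assumption: before departure or after arrival the bike
    # is treated as sitting at the start or end point respectively
    return min(max(elapsed / duration_s, 0.0), 1.0)
```

Calling fraction_elapsed("07/01/2020 13:51", 960, datetime(2020, 1, 7, 13, 59)) gives 0.5, matching the hand calculation.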
Then, iterating over the legs of the journey (the segments between consecutive lat-long pairs), we calculate the cumulative distance up to and including each leg as a fraction of the total distance of the trip. If this exceeds the fractional duration calculated earlier, then the bike must be somewhere in this leg (because the cumulative distance includes the current leg). In that case, we find the fractional distance along the current leg (subFractDistance) and use it to interpolate the lat-long position.
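# Iterate over individual legs
idx = 0
prevDistance = 0
while idx < len(self.distance):
    # Cumulative fraction of the total distance at the end of this leg
    cFractDistance = (prevDistance + self.distance[idx]) / self.totalDistance
    # Check whether we are in the current leg
    if cFractDistance >= fractDuration:
        if self.distance[idx] == 0:
            subFractDistance = 0
        else:
            subFractDistance = (fractDuration * self.totalDistance
                                - prevDistance) / self.distance[idx]
        # Linearly interpolate between the two ends of this leg
        lat = self.lat[idx] + (self.lat[idx+1] - self.lat[idx]) \
            * subFractDistance
        lon = self.lon[idx] + (self.lon[idx+1] - self.lon[idx]) \
            * subFractDistance
        return lat, lon
    # Not in this leg yet: move on to the next one
    prevDistance += self.distance[idx]
    idx += 1
# Past the end of the route (fractDuration > 1): return the final point
return self.lat[-1], self.lon[-1]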
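For comparison, the same linear interpolation can be written as a standalone function that finds the current leg with a binary search over cumulative distances instead of a while loop. This is an alternative sketch, not the project’s code; the function name and arguments are hypothetical, with dists holding one distance per leg as self.distance does.

```python
from bisect import bisect_left
from itertools import accumulate


def position_at(lats, lons, dists, fract):
    """Interpolate the (lat, lon) reached after `fract` of the route."""
    # cum[i] is the distance travelled on arrival at point i
    cum = [0.0] + list(accumulate(dists))
    total = cum[-1]
    if total == 0:
        return lats[0], lons[0]
    # Clamp so times outside the journey map to its start or end
    target = min(max(fract, 0.0), 1.0) * total
    # First point whose cumulative distance reaches the target
    i = bisect_left(cum, target)
    if i == 0:
        return lats[0], lons[0]
    # Here cum[i - 1] < target <= cum[i], so the leg has non-zero length
    t = (target - cum[i - 1]) / (cum[i] - cum[i - 1])
    return (lats[i - 1] + (lats[i] - lats[i - 1]) * t,
            lons[i - 1] + (lons[i] - lons[i - 1]) * t)
```

Zero-length legs are skipped automatically, because bisect_left never lands on a leg whose start and end cumulative distances are equal.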
Putting it Together
All of this comes together (with a bit of additional matplotlib and shapefile trickery found in BackgroundMapPlotter) in a single file that produces a time series of TfL cycle hire movements. Specifying the endDate and interval, as well as loading the correct data file, allows for the generation of the animated GIFs seen throughout this article.
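The frame timestamps implied by a start date, end date and interval can be generated as below; each timestamp would then be used to select the ongoing journeys and place a dot for each bike. The names are hypothetical, and rendering the frames themselves (for instance with matplotlib’s FuncAnimation and a GIF writer) is left out.

```python
from datetime import datetime, timedelta


def frame_times(start: datetime, end: datetime, interval: timedelta):
    """Yield the timestamp of each animation frame, inclusive of `end`."""
    t = start
    while t <= end:
        yield t
        t += interval
```

A shorter interval gives smoother motion at the cost of more frames, and so a larger GIF.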
There’s still quite a lot of work I’d like to do on analysing the bike data itself, and hopefully the RouteStore object can add a layer of depth to this analysis. For example, comparing journey durations with their (approximate, TfL-suggested) route lengths might indicate which bike points are hotspots for ‘sightseeing’ trips, where the journey takes significantly longer than the suggested route would need. Even the raw data alone should prove interesting, for example for looking at seasonal variability in bike station use or identifying public events.
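One way the sightseeing comparison could be sketched is to divide the length of the suggested route by the journey duration and flag trips with an unusually low implied speed. Here the route length uses the haversine great-circle formula (in metres) rather than the degree-based proxy above, and the 1.5 m/s threshold is a hypothetical value chosen purely for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def implied_speed(lats, lons, duration_s):
    """Suggested route length divided by journey duration, in m/s."""
    length = sum(haversine_m(lats[i], lons[i], lats[i + 1], lons[i + 1])
                 for i in range(len(lats) - 1))
    return length / duration_s


def looks_like_sightseeing(lats, lons, duration_s, min_speed=1.5):
    # Hypothetical threshold: flag trips whose implied speed along the
    # suggested route is well below a typical cycling pace
    return implied_speed(lats, lons, duration_s) < min_speed
```

Aggregating the flagged trips by start station would then give a rough ranking of candidate sightseeing hotspots.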
Powered by TfL Open Data. Contains OS data © Crown copyright and database rights 2016 and Geomni UK Map data © and database rights. Shapefile data was provided by Geofunk.