A number of questions have come up recently about how to use the Socrata API with Python, an awesome programming language frequently used for data analysis. It is also the language of choice for a couple of libraries I’ve been meaning to check out: Pandas and Bokeh.
No, not the endangered species that has bamboo-munched its way into our hearts and the Japanese lens blur that makes portraits so beautiful, but the Python Data Analysis Library and the Bokeh visualization tool. Together, they represent a powerful set of tools that make it easy to retrieve, analyze, and visualize open data.
First, a caveat. I'm not a Python developer by trade, and it's not unlikely I'll mess something up along the way. If you find something I've done wrong, feel free to file an issue with a suggestion or submit a pull request and we'll be glad to accept it.
This script is based on the excellent unemployment data example provided with Bokeh.
First, you’ll need to install a few Python packages with pip:
pip install pandas
pip install bokeh
We’re going to be analyzing this dataset of Los Angeles Police Department Calls for Service. For fun, let’s see which days of the week get the most noise disturbance calls for “Parties”, and see if we can identify some popular holidays.
First, we’ll structure a query that uses SoQL to aggregate the dataset so that we don’t need to pull down all of the details of the millions of calls the LAPD has received. We’ll do a few things: filter on a call_type_code of “507P”, the code for a noise violation call on a party, then truncate the dispatch_date field to the day and get the count of noise violations per day:
https://data.lacity.org/resource/mgue-vbsx.json?
$group=date
&call_type_code=507P
&$select=date_trunc_ymd(dispatch_date) AS date, count(*)
&$order=date
Note: Unfortunately the LAPD has since taken this dataset down, but we've left the query here as an example.
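If you'd rather not hand-assemble that query string, here is a minimal sketch that builds it with Python's standard library. The endpoint and parameters are copied from the query above; the variable names are just my own choices for illustration.

```python
from urllib.parse import urlencode

# Endpoint and parameters copied from the query above.
BASE_URL = "https://data.lacity.org/resource/mgue-vbsx.json"

params = {
    "$group": "date",
    "call_type_code": "507P",
    "$select": "date_trunc_ymd(dispatch_date) AS date, count(*)",
    "$order": "date",
}

# urlencode percent-encodes the "$", spaces, and parentheses for us.
query_url = BASE_URL + "?" + urlencode(params)
print(query_url)
```

Letting urlencode handle the escaping avoids subtle problems with the spaces and parentheses in the $select clause.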
Pandas makes it super easy to read data from a JSON API, so we can just read our data directly using the read_json function:
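Since the dataset is no longer online, the sketch below feeds read_json a few made-up rows shaped the way I'd expect the query result to look (a date and a count per day). The sample values and the "count" column name are assumptions for illustration; with a live endpoint you would pass the query URL to read_json instead.

```python
from io import StringIO

import pandas as pd

# Made-up rows for illustration only, shaped like the aggregated query result.
sample_json = StringIO("""[
  {"date": "2014-07-04T00:00:00.000", "count": "31"},
  {"date": "2014-07-05T00:00:00.000", "count": "18"},
  {"date": "2014-07-06T00:00:00.000", "count": "7"}
]""")

# With a live dataset this would be pd.read_json(query_url).
data = pd.read_json(sample_json)

# Make the column types explicit rather than relying on inference.
data["date"] = pd.to_datetime(data["date"])
data["count"] = data["count"].astype(int)

print(data)
```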
Next we augment our data with a day_of_week index so we can then create a pivot table to build up a grid of our weekly data. We’ll also create our weeks and days range and domain arrays:
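Here is one way that step might look. It is a sketch rather than the exact code from the Gist: the sample rows are made up, and the "week" column (each date snapped back to its Monday) and the name "grid" are my own assumptions.

```python
import pandas as pd

# Made-up daily counts for illustration only.
data = pd.DataFrame({
    "date": pd.to_datetime(["2014-06-30", "2014-07-04", "2014-07-05", "2014-07-12"]),
    "count": [3, 31, 18, 12],
})

# Monday is 0 and Sunday is 6.
data["day_of_week"] = data["date"].dt.dayofweek

# Label each row with the Monday that starts its week.
week_start = data["date"] - pd.to_timedelta(data["day_of_week"], unit="D")
data["week"] = week_start.dt.strftime("%Y-%m-%d")

# One row per week, one column per weekday, zero where there were no calls.
grid = data.pivot_table(index="week", columns="day_of_week",
                        values="count", aggfunc="sum", fill_value=0)
grid = grid.reindex(columns=range(7), fill_value=0)

# Range and domain arrays for the plot axes.
weeks = list(grid.index)
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

print(grid)
```

Reindexing the columns guarantees all seven weekdays are present even if some weekday never appears in the data.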
Now we’ll format the data for Bokeh, creating parallel arrays of our axis values and our data values. We’ll also create an array of color values based on the number of parties for that day:
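A rough sketch of that formatting step follows. The grid values are made up, the five-color blue palette is just a choice I made for illustration, and the linear bucketing of counts into colors is one simple option among many.

```python
import pandas as pd

# Made-up weekly grid for illustration: rows are weeks, columns are weekdays 0-6.
weeks = ["2014-06-30", "2014-07-07"]
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
grid = pd.DataFrame(
    [[3, 0, 1, 2, 31, 18, 7],
     [2, 1, 0, 4, 15, 12, 5]],
    index=weeks, columns=range(7),
)

# Light-to-dark palette (an illustrative choice, not from the original script).
palette = ["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"]
max_count = int(grid.values.max())

# Parallel arrays: one entry per (week, day) cell.
week_col, day_col, count_col, color_col = [], [], [], []
for week in weeks:
    for i, day in enumerate(days):
        n = int(grid.loc[week, i])
        week_col.append(week)
        day_col.append(day)
        count_col.append(n)
        # Scale the count linearly onto a palette index.
        color_col.append(palette[n * len(palette) // (max_count + 1)])

print(list(zip(week_col, day_col, count_col, color_col))[:3])
```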
Finally, we’ll pass everything to the Bokeh rect plot, which will create our visualization. We’ll provide it our source data, X and Y ranges, and some plot configuration details:
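The plotting call might look something like the sketch below. The Bokeh API has changed quite a bit across releases, so treat this as an approximation written against a recent version, not the exact code from the Gist. The sample cells, title, and tooltip fields are assumptions for illustration.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

weeks = ["2014-06-30", "2014-07-07"]
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# A few made-up cells for illustration; the real script would have one per day.
source = ColumnDataSource(data=dict(
    week=["2014-06-30", "2014-06-30", "2014-07-07"],
    day=["Fri", "Sat", "Sat"],
    count=[31, 18, 12],
    color=["#08519c", "#6baed6", "#bdd7e7"],
))

# Categorical ranges: weeks along X, weekdays down Y (Monday at the top).
p = figure(
    title="LAPD party noise calls per day",
    x_range=weeks,
    y_range=list(reversed(days)),
    tools="hover,save",
    tooltips=[("week of", "@week"), ("day", "@day"), ("calls", "@count")],
)

# One unit-sized rectangle per (week, day) cell, filled from the color column.
p.rect(x="week", y="day", width=1, height=1, source=source,
       fill_color="color", line_color=None)

# Calling bokeh.plotting.show(p) would write the HTML and open it in a browser.
```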
When you’re done, you simply run the script with Python, and it generates the visualization and opens it in your web browser. The results are pretty cool. You can clearly see that the weekends are the busiest nights for the party patrol, but you can also spot popular holidays like New Year’s Eve, Independence Day, and Labor Day. You can find the full source code in this Gist.