Aleksey Nozdryn-Plotnicki

blog, portfolio, etc.

Ontario Sunshine List Open Scraper

Today I am announcing my open Ontario Sunshine List Scraper released under the MIT License.

You can download the data directly here. I will likely move this later.

Anyone can go here to the GitHub repo and download the Python code that scrapes the Ontario Public Sector Salary Disclosure data into a machine-readable format.

Today’s version of the code has two key limitations:

  • Only the initial disclosure is scraped. Addenda are not scraped or processed.
  • The 2015 disclosure has not yet been published, so it is not included.

Please feel free to fork the repo and build ahead.

Anyone can create their own CSV like this:

import ontario_sunshine_list as osl

col = osl.Collector()'/home/aleksey/data/sunshine/')

scr = osl.Scraper()
df ='/home/aleksey/data/sunshine/')

cle = osl.Cleaner()
df =

df.to_csv('/home/aleksey/data.csv', encoding='utf-8')

"Charter" Schools Have Worse Immunization Rates in California

In California, if your Kindergarten has the word “Charter” in its name, that says a lot about its measles vaccination rate.

  • “Charter” kindergartens are almost twice as likely to have a measles vaccination rate below the 92-94% needed for herd immunity.
  • “Charter” kindergartens have an average measles immunization rate of 83.8% compared to 91.8% in other schools.
  • Personal Beliefs Exemptions were 10.8% in “Charter” schools compared to 3% in all others.

Credit: Pattern first identified via visual inspection of Vaccination Rates for Every Kindergarten in California from the New York Times.

Data: Available on the California Health and Human Services Open Data Portal. Specifically the School Immunizations in Kindergarten 2014-15 dataset, here

Regressing My Gas Log

What mileage does your vehicle get? Well that depends. City or highway? Well what is your fuel efficiency in each? Today I geeked it up and did a regression on my gas log to determine just this. Yes, I keep a gas log. It pleases my inner scientist and it helps me watch for sudden changes in fuel efficiency.

Raw data, fill-ups:

kms | litres | efficiency | est_mix_city | est_mix_highway
--- | --- | --- | --- | ---
420.8 | 46.514 | 11.05370722 | 0.5 | 0.5
576.4 | 59.53 | 10.32789729 | 0 | 1
505.2 | 51.602 | 10.2141726 | 0 | 1
522 | 54.273 | 10.39712644 | 0 | 1
305.3 | 32.433 | 10.62332132 | 0 | 1
1111.4 | 108.578 | 10.73541625 | 0.2 | 0.8
508.1 | 53.112 | 10.45306042 | 0.1 | 0.9
496.4 | 52.4393 | 10.56392023 | 0.1 | 0.9
442.1 | 44.5392 | 10.07446279 | 0 | 1
393.4 | 43.239 | 10.9911032 | 0.1 | 0.9
429.2 | 45.345 | 10.56500466 | 0 | 1
476.7 | 58.217 | 12.21250262 | 0.85 | 0.15

In Google Docs it is a simple thing to run =SLOPE(efficiency rows, est_mix_city rows) and =INTERCEPT(efficiency rows, est_mix_city rows), giving me a slope of 1.967 and an intercept of 10.381.

The simple interpretation is then:

  • With a city mix of 0, and therefore 100% highway, I would get 10.381 + 1.967 × 0 = 10.381 L/100 km.
  • With a city mix of 1, and therefore 100% city, I would get 10.381 + 1.967 × 1 = 12.348 L/100 km.
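For anyone who would rather check this outside a spreadsheet, here is a minimal Python sketch of the same straight-line fit. It assumes the efficiency and est_mix_city columns are taken exactly as listed in the table; `fit_line` is just an illustrative helper name, not part of any library.

```python
# City-mix fraction and efficiency (L/100 km), copied as-is from the fill-up table.
city_mix = [0.5, 0, 0, 0, 0, 0.2, 0.1, 0.1, 0, 0.1, 0, 0.85]
efficiency = [
    11.05370722, 10.32789729, 10.2141726, 10.39712644,
    10.62332132, 10.73541625, 10.45306042, 10.56392023,
    10.07446279, 10.9911032, 10.56500466, 12.21250262,
]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(city_mix, efficiency)
highway = intercept          # city mix of 0
city = intercept + slope     # city mix of 1

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"highway={highway:.3f} L/100 km, city={city:.3f} L/100 km")
```

The slope is the extra fuel per 100 km when going from all-highway to all-city driving, and the intercept is the all-highway estimate.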

Before you make any snide comments, I drive a two-ton 20-year-old 4x4 diesel van. You’ll notice mostly not in the city. It’s good at some things, not so good at others.

Yes, there are probably better techniques to get an estimate, ones that perhaps weight bigger fill-ups more heavily or account for the small number of data points. But whatever, this was a great way to get to a first estimate.
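As a rough sketch of what weighting bigger fill-ups might look like, here is a weighted least-squares variant in Python. The choice of distance driven (kms) as the weight is an assumption made for illustration, as is the helper name `fit_weighted_line`; the efficiency column is again used exactly as listed in the table.

```python
# Distance, city-mix fraction and efficiency (L/100 km), copied from the fill-up table.
kms = [420.8, 576.4, 505.2, 522, 305.3, 1111.4,
       508.1, 496.4, 442.1, 393.4, 429.2, 476.7]
city_mix = [0.5, 0, 0, 0, 0, 0.2, 0.1, 0.1, 0, 0.1, 0, 0.85]
efficiency = [
    11.05370722, 10.32789729, 10.2141726, 10.39712644,
    10.62332132, 10.73541625, 10.45306042, 10.56392023,
    10.07446279, 10.9911032, 10.56500466, 12.21250262,
]


def fit_weighted_line(xs, ys, ws):
    """Weighted least squares for y = intercept + slope * x.

    Each point's squared residual is multiplied by its weight, so
    fill-ups with larger weights pull the line harder.
    """
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Assumption: weight each fill-up by the distance it covers.
w_slope, w_intercept = fit_weighted_line(city_mix, efficiency, kms)
print(f"weighted highway estimate: {w_intercept:.3f} L/100 km")
print(f"weighted city estimate:    {w_intercept + w_slope:.3f} L/100 km")
```

With equal weights this reduces to the plain regression, so any difference between the two estimates comes purely from the weighting. It does nothing about the small number of data points.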

(Aside: It’s interesting that I am happy to use the term “mileage” to describe a statistic that I measure in L/100 km.)


Visualizing Confirmed Exoplanets

I’ve just published a post over at the NGRAIN blog, Visualize confirmed exoplanets

I used NGRAIN’s Constructor SDK to put together a simple visualization of the distribution of confirmed exoplanets relative to Earth.

By no means a feature-rich interactive visualization, it’s simple, powerful, and interesting.

Gephi Layout Plugin: Random 3D Layout

Yesterday, I released a simple plugin, Random 3D Layout, to the Gephi Marketplace here. It’s very simple, and I developed it as part of my work at NGRAIN.

For anyone working in 3D in Gephi, this will be a useful and simple initialization layout plugin. It works well with the Force Atlas 3D plugin. Without it, if you initialize with a 2D layout instead, the 3D results can sometimes be Milky-Way-shaped rather than roughly spherical as you would expect, and so fail to take full advantage of the third dimension.

Augmented Reality Demands 3D Data Visualization

I’ve just written a post over at the NGRAIN blog, Augmented Reality demands 3D Data Visualization

As Augmented Reality (AR) continues to gain strong momentum, it is clear that data and information now have a physical 3D context, both conceptually and at the point of display. In the past, we were content with excellent 2-dimensional solutions for 2-dimensional displays, free to ignore 3D data viz as fraught with peril and difficult to execute. At a minimum, we must now wrestle with the challenge of presenting 2D visualizations in a 3D world. And, if we limit ourselves to that, we will fail.

On Looking Beyond Two Dimensions

I’ve just written a post over at the NGRAIN blog, On looking beyond two dimensions.

If you don’t look at your data it can deceive you. That was true in two dimensions and remains true in higher dimensions. If we live in a world of bivariate visualization we will miss important complex patterns because we are not looking for them. The third dimension must be considered when using visualization in analysis just as it must be rejected as a means to jazz up uninspiring pie charts.

Data Visualization Lead at NGRAIN

It’s worth announcing that last month I took a position as Data Visualization Lead at NGRAIN in Vancouver, Canada.

NGRAIN’s vision is to see beyond reality, and to help people accelerate decisions by interacting with the world’s data in 3D. To make that concrete, we are currently developing cutting edge Augmented Reality applications for the industrial enterprise. 3D data visualization is notoriously difficult to execute, but in Augmented Reality we will no longer have a choice but to confront it, as our reality is 3D. Bringing my expertise to this problem will be an exciting challenge and will require great care.

What you see here will continue to be my own views and not those of NGRAIN. I will, however, likely be publishing on NGRAIN’s blog and cross-promoting here.

My Ontario Sunshine List Work in the CBC News

Just some shameless self-promotion. You can find my work cited in the CBC News from today.

Kazi Stastna writes Sunshine List 2014: Ontario’s list drives salaries up, not down and includes:

Ontario pathologists did just that and saw their Sunshine List salaries increase by 20 to 25 per cent between 2011 and 2012, compared with the 2.2 per cent average for the list as a whole, according to an analysis done by data blogger Aleksey Nozdryn-Plotnicki. … Nozdryn-Plotnicki found that this inflationary effect is greatest at the upper echelons of the Sunshine List, with the salaries of the 1,000 highest-paid workers rising 7.2 per cent between 2011 and 2012, compared with 2.2 per cent for the bottom half of the list.

They are, of course, referring to my work:

For those outside of Canada: the CBC, the Canadian Broadcasting Corporation, is a public entity and a major player in the Canadian news media market. It’s my personal number one source for Canadian news.