Ontario Sunshine List Open Scraper
2015-03-07

Today I am announcing my open-source Ontario Sunshine List scraper, released under the MIT License.

You can download the data directly here. I will likely move the file to a different location later.

Anyone can go here to the GitHub repo and download the Python code, which scrapes the Ontario Public Sector Salary Disclosure data into a machine-readable format.

Today's version of the code has two key limitations:

  • Only the initial disclosure is scraped. Addenda are not scraped or processed.
  • The 2015 disclosure has not yet been published, so it is not included.

Please feel free to fork the repo and build on it.

Anyone can create their own CSV like this:

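import ontario_sunshine_list as osl

# Collect the raw disclosure files into a local directory
col = osl.Collector()
col.run('/home/aleksey/data/sunshine/')

# Scrape the collected files into a single table
scr = osl.Scraper()
df = scr.run('/home/aleksey/data/sunshine/')

# Clean the scraped table
cle = osl.Cleaner()
df = cle.run(df)

# Write the cleaned table to a UTF-8 encoded CSV
df.to_csv('/home/aleksey/data.csv', encoding='utf-8')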
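
Once the CSV exists, it can be read back without any of the scraper code. The following is only a minimal sketch, assuming Python 3 and assuming the file was written with the default to_csv settings, which store the table's index as a first column with an empty header. The helper name load_rows is hypothetical and not part of the repo, and no column names are assumed.

```python
import csv


def load_rows(path):
    """Read a CSV written by DataFrame.to_csv into a list of dicts."""
    # newline='' lets the csv module handle embedded line breaks itself.
    with open(path, encoding='utf-8', newline='') as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        # Assumption: the default to_csv index column has an empty header.
        # Drop it if present; rows without it are left untouched.
        row.pop('', None)
    return rows
```

Calling load_rows('/home/aleksey/data.csv') would then return one dictionary per disclosed record, keyed by whatever column headers the file actually contains. All values come back as strings, so any numeric columns would need converting before doing arithmetic on them.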