Current publishing of Ontario "Sunshine List" not good enough

Standing where we are in 2013, Ontario's Public Sector Salary Disclosure Act of 1996 seems ahead of its time in terms of open government data. Seventeen consecutive lists have been published, naming all public sector employees who earned more than $100,000 in a year. But what was once a bold step forward in public accountability is now falling behind.

By today's standards, publishing an intimidatingly long list of close to 100,000 names and salaries across 100 or so HTML or PDF pages does not constitute disclosure. Sure, it's great if you want to look up how much your boss makes or to keep an eye on the salaries of TVO presenters, but after that it falters. Making data possible to access and making it easy to access are different things. If the data were published in print, but not made available online, would that be acceptable? Was it in 1996?

Any data journalist who wants to work with the data must first scrape it from those hundred pages, which requires either some technical skill and some time, or brute force and quite a lot of time. Even simple questions like “How many names are on the list?” and “What is the average salary?” have to wait until this scraping is done.
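To give a sense of what that scraping involves, here is a minimal sketch in Python using only the standard library. The HTML snippet, the column order and the names in it are invented for illustration; the real disclosure pages may be laid out differently, and a real scraper would also have to download each of the hundred or so pages first.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows use <th>, so they end up empty
                self.rows.append(self._row)
            self._row = None


# Invented sample page; the column layout is an assumption.
SAMPLE_PAGE = """
<table>
  <tr><th>Employer</th><th>Surname</th><th>Position</th><th>Salary Paid</th></tr>
  <tr><td>Example Hospital</td><td>Doe</td><td>Pathologist</td><td>$250,000.00</td></tr>
  <tr><td>Example College</td><td>Roe</td><td>President</td><td>$150,000.00</td></tr>
</table>
"""


def parse_salary(text):
    """Turn a string like '$250,000.00' into a float."""
    return float(text.replace("$", "").replace(",", ""))


def scrape(pages):
    """Extract one record per table row from a list of HTML page strings."""
    records = []
    for page in pages:
        parser = TableRowParser()
        parser.feed(page)
        for row in parser.rows:
            if len(row) != 4:  # skip rows that do not match the assumed layout
                continue
            employer, surname, position, salary = row
            records.append({
                "employer": employer,
                "surname": surname,
                "position": position,
                "salary": parse_salary(salary),
            })
    return records


records = scrape([SAMPLE_PAGE])
print(len(records), "names scraped")
```

Even this toy version needs a parser class, a salary clean-up step and a guess about the table layout, all before a single question about the data can be asked.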

What about the general public? Even if they've never heard of scraping a web page, they may still have natural questions like: How many people from each employer are on the list? How much money are CEOs making on average this year? Is that more than last year? How many people on the list are pathologists?

Easy change

At a minimum, the entire list should be made available for download in a single file in CSV and/or XLS format. This would remove barriers and save time for any data journalist wanting to access the information. This should be trivially easy to do, because by the looks of the URLs, the 2013 (for 2012) disclosure is already stored in a database.
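For comparison, here is a rough sketch of how little work those simple questions would take if a single CSV file existed. The column names (employer, surname, given_name, position, salary_paid) and the sample rows are assumptions made up for this example, not the government's actual format.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Invented sample data; the column names are hypothetical.
SAMPLE_CSV = """employer,surname,given_name,position,salary_paid
Example Hospital,Doe,Jane,Pathologist,250000.00
Example Hospital,Poe,Alex,Chief Executive Officer,350000.00
Example College,Roe,Sam,President,150000.00
"""


def summarise(csv_file):
    """Answer a few basic questions about a salary disclosure CSV."""
    rows = list(csv.DictReader(csv_file))
    salaries = [float(row["salary_paid"]) for row in rows]
    return {
        # How many names are on the list?
        "names": len(rows),
        # What is the average salary?
        "average_salary": mean(salaries),
        # How many people from each employer are on the list?
        "per_employer": Counter(row["employer"] for row in rows),
        # How many people on the list are pathologists?
        "pathologists": sum(
            "pathologist" in row["position"].lower() for row in rows
        ),
    }


summary = summarise(io.StringIO(SAMPLE_CSV))
print(summary["names"], "names, average salary", summary["average_salary"])
```

With a real download, the in-memory sample would be swapped for something like open("sunshine_list.csv", newline=""), where the filename is again hypothetical.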

Empower the ecosystem

A single downloadable file would not only give citizens and journalists better access to the data; it would also enable data visualisation and interaction practitioners to create tools that let the entire public explore the information.