Open data

This is my #ioe12 open data post. This topic’s featured video from Tim Berners-Lee created a nice bridge between hypertext and open data.

The role of standards

HTML and HTTP were developed to fill the need for a standard communication platform for sharing documents, a way for different computers and devices to display and link documents consistently. Life without standards is kinda like having to install drivers or proprietary readers on every different machine.

Berners-Lee describes a future where data would get the same treatment. Besides simply residing online, data should be accompanied by its relationship or meaning, so that the user could drill down (or sideways) into databases.

… but people have to adopt it!

People have to make the same paradigm shift that happened when hypertext was introduced. Back then, people would not put their documents online, and had little idea why it was even relevant.

Now the challenge is to get them to put their data online as well, and to make the same shift in mindset.

Raw data now!

To most people, data is very boring. I like Berners-Lee’s metaphor of boring brown boxes that can lead to beautiful content like Hans Rosling’s TED talk.

People don’t realize their data has value, because they have not figured out ways to make it relevant to them, in their disciplinary context. But once others have access to well-formatted data for their projects, they can come up with pretty amazing stuff. I think the easiest example is geo-data. Let’s say you wanted to compare high school seniors’ SAT scores with socio-economic status and location. You could pull that data in, compare, and color-code a map, exposing academically at-risk neighborhoods in your state. The value of data is not fully realized until it’s linked.
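To make the SAT example a bit more concrete, here is a minimal sketch of what "linking" two datasets could look like. Everything in it is an assumption for illustration: the neighborhood names, the numbers, the column names and the cutoffs are made up, and a real project would pull the files from whoever publishes them rather than from inline strings.

```python
import csv
import io

# Made-up sample data, for illustration only (not real scores or incomes).
SAT_CSV = """neighborhood,avg_sat
Northside,1180
Riverside,940
Hilltop,1050
"""

INCOME_CSV = """neighborhood,median_income
Northside,72000
Riverside,31000
Hilltop,48000
"""


def load(text, value_field):
    """Read a CSV string into a {neighborhood: numeric value} dictionary."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["neighborhood"]: float(row[value_field]) for row in reader}


def flag_at_risk(sat, income, sat_cutoff=1000, income_cutoff=40000):
    """Link both datasets on neighborhood; flag low-score, low-income areas.

    The cutoffs are arbitrary placeholders, not real thresholds.
    """
    shared = sat.keys() & income.keys()  # neighborhoods present in both files
    return sorted(
        name for name in shared
        if sat[name] < sat_cutoff and income[name] < income_cutoff
    )


sat = load(SAT_CSV, "avg_sat")
income = load(INCOME_CSV, "median_income")
at_risk = flag_at_risk(sat, income)
print(at_risk)
```

The shared "neighborhood" column is what makes the link possible. If the two publishers named or coded their locations differently, that join is where most of the work would go.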

Citizens have to ask for data to be released; they can’t just hope that others will release it. Asking has two effects: it shows the owners of the data that there is a need for it, and it makes them examine the steps required to release it.

Many agencies claim that they want to build the perfect website to showcase the data. Although this is a noble effort, data can be valuable without a nice graphical user interface, so let’s release the data first and come up with the web part afterwards.

Protected data means lost potential

Protected data can’t reach the global potential that linked data has. From the list of barriers on the Open Data Wikipedia page, I think the one I’m most concerned with is the “restriction of robots to websites” one. Robots, or bots, are the way information is indexed. If bots are not allowed inside a certain database, the “findability” of the data inside of it will significantly decrease.

Raw data in itself is very hard to understand for humans. It’s mostly columns and rows of text and numbers, table after table, layer after layer. Computers are the best way to crunch those data into something humans can understand, so restricting access to the bots that are supposed to create the potential relationships of data is counter-productive.
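As a rough illustration of how such a restriction works, the sketch below uses Python’s standard `urllib.robotparser` module to read a robots.txt file the way a well-behaved bot would. The rules, the `ExampleBot` name and the example.org URLs are hypothetical, picked only to show the effect of a single `Disallow` line.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that closes a whole data directory to every bot.
ROBOTS_TXT = """User-agent: *
Disallow: /data/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before fetching; "ExampleBot" is a made-up name.
can_index_data = parser.can_fetch("ExampleBot", "https://example.org/data/schools.csv")
can_index_about = parser.can_fetch("ExampleBot", "https://example.org/about.html")

print(can_index_data, can_index_about)
```

With those rules, the page about the agency stays indexable while the dataset itself is off limits to the bot.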

Open data licensing

To remove barriers, it seems the Open Data Commons project suggests two main licenses for databases: the Public Domain Dedication and License (PDDL) and the Open Database License (ODbL). This seems like a decent approach that moves away from the compatibility problems the Creative Commons licensing scheme has.
The ODbL is similar to the GNU GPL: a kind of copyleft, where adapted versions of the database need to be shared back under the same terms.

