I'm reminded of this recent post by James Fee, talking about geodata, but I think it applies to the general case:
"... The question was APIs or downloads...
Personally, I believe [data] is one of the best
ways for citizens to keep track of their government
(local to federal) ... APIs tend to deliver what
their “owners” want them to do. Raw data means
everyone has an opportunity to check each other’s
work. Of course, raw data can be manipulated as
well, but it is harder to obscure."
I couldn't agree more. APIs are great, but are not the key to open government, for two reasons:
1. They don't provide simple and easy access to raw information for non-technical individuals.
There is no such thing as a good 'general use' API[1]. APIs are appropriate for specific, service-based transactions that involve some level of processing.
2. Historical data access is poorly served by APIs.
APIs shouldn't exist for querying historical datasets if the dataset is not already available in a static format. Release the data, then build an API if there is demand (or better, let the private sector do it for you).
3. Bonus reason: government agencies suck at building APIs.
They're not good at determining what is genuinely high value to end users; they tend to prefer visible projects that can justify budget increases over genuinely useful but less easily communicated ones (cf. the US national highway system and pork-barrel politics); and there is an entire industry of enterprise companies heavily invested in keeping it this way.
TL;DR Release the data, let users build the APIs. Everyone wins.
Bootnotes:
[1] I lie. That's exactly what publishing raw data at stable URLs on a website achieves.
Consider that an API might provide access to "raw" data directly. I'd ask not for data over APIs, but for more universally useful properties such as stability, currency, consistency, etc.
The exciting thing about APIs isn't what they directly do for non-technical individuals.
Instead, a good API makes government into a platform for free (and paid) services to be built to deliver that data in innovative ways. Examples of this are starting to appear in places like Chicago, which has opened up a lot of data access for things including transit (bus tracking, etc.) and a lot more. Giving hackers platforms to innovate will definitely yield better results than just throwing gobs of data at the general public. (Never mind that not all raw data is created equal, or that raw data also requires savvy people to distill.)
It also means that it's potentially going to be easier for one unit of government to interact with (or at least query) another. That may be big as well.
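To make the "platform" point concrete, here is a minimal sketch of the kind of service a third party could build on top of an open transit feed. The base URL, query parameters, and JSON field names are hypothetical, not a real Chicago API; a real integration would follow the agency's published documentation.

```python
# Hypothetical consumer of an open transit-data API. Everything about
# the endpoint (domain, path, parameters, JSON schema) is made up for
# illustration.
import json
from urllib.request import urlopen

BASE = "https://transit.example.gov/api/v1"  # hypothetical endpoint

def predictions_url(route: str, stop_id: str) -> str:
    """Build the query URL for arrival predictions at one stop."""
    return f"{BASE}/predictions?route={route}&stop={stop_id}"

def next_arrivals(route: str, stop_id: str) -> list[str]:
    """Fetch the predicted arrival times for a given route and stop."""
    with urlopen(predictions_url(route, stop_id)) as resp:
        payload = json.load(resp)
    # Each prediction record is assumed to carry an arrival estimate.
    return [p["arrival_time"] for p in payload.get("predictions", [])]
```

A bus-tracking site, an SMS alert service, or another agency's dashboard could all sit on top of the same feed without the government building any of them.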
I think this is more about changing the mindset of the typical workers at the federal agencies (hundreds of thousands of whom are not doing IT related tasks as their primary function). Those of us who work for these agencies as programmers & IT people have a big job to do.
Only recently have the govies started thinking about how their data can be useful to the general public. In the past everything had been stovepiped and guarded with peoples' live(lihood)s. Hopefully this makes them think about data integrity throughout the life of that data.
Think about how data used to be provided to the public. A bunch of government folks had to collect data, make sense of it themselves, and put it together in a report destined for Congress. It's waaaaay different to just provide that raw data to the public. Rather, I think what we'll see is more sanitized data sets, after they've been internally analyzed and vetted (probably multiple times). Not exactly transparent.
But I hope one day, after many iterations of API building, we'll get to a point where the data truly is transparent.
As is obvious to most on HN, requiring an API (as opposed to a CSV file release schedule, etc.) is fairly meaningless, and most definitely not a presidential-caliber dictate. Some agencies' data might be far better suited to publication as a CSV file posted on a web page, for example.
If a president could have a meaningful impact on this sort of thing, it would be in setting a high bar for the quality of information released by agencies. Any sort of requirement of this kind is completely absent from the announcement.
So rather than being about transparency as it's being touted, the announcement is a celebration of high tech obfuscation. Soon the same sort of insulting, opaque, useless information spouted by officials in press conferences will be available via HTTP. This is at best a neutral day for democracy.
Seems like the executive mandate is an "80% now" solution instead of waiting for perfection. At least they're thinking about the dissemination of information.
Technically, having a bunch of HTTP resources that return data of type text/csv in response to GET requests is a perfectly valid API. It's also easy to build: just put those CSV files in a directory and have Apache serve it up.
How useful this is depends on how clear the data is, how well they document things, how sane their document formats are -- in other words, it depends on things that are much harder to mandate than just "have an API". I'll predict in advance that most of the APIs here will be pretty half-assed.
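As a sketch of just how little that takes, here is the "directory of CSV files" API in a few lines of Python's standard library. The data directory name is an assumption; any directory of static files would do.

```python
# Minimal "CSV-over-HTTP" API: serve a directory of CSV files over GET,
# with the text/csv MIME type set explicitly. No framework required.
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

class CSVHandler(SimpleHTTPRequestHandler):
    # Ensure .csv files are labeled text/csv rather than a generic type.
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map,
                      ".csv": "text/csv"}

def make_server(directory: str = "data", port: int = 8000) -> HTTPServer:
    """GET /<name>.csv returns the raw file -- that's the whole API."""
    handler = partial(CSVHandler, directory=directory)
    return HTTPServer(("", port), handler)

# To publish: make_server("data").serve_forever()
```

Stable URLs plus static files gets you most of the way; everything beyond that (documentation, formats, freshness) is the hard part the mandate can't conjure.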
I disagree. Certainly some agencies may have data that ought to be examined holistically via CSV (at least for researchers?), but having data available via API is better than the status quo.
While this might not set a "high bar for the quality of information," the President's effort shows a level of commitment to both technological streamlining of government agencies and to transparency and is, at the very least, a step in the right direction.
I'm not disagreeing with your point, but I wonder if you actually read the memo from the CIO, or are just speaking ex cathedra about the requirements being given?
>As is obvious to most on HN, requiring an API (as opposed to a CSV file release schedule, etc.) is fairly meaningless, and most definitely not a presidential-caliber dictate. Some agencies' data might be far better suited to publication in a CSV and posted on a web page, for example.
And for those cases, the agency can still justify why the CSV is better. There are always ways to get around the rules, see Section 508 rules for handicapped users.
>If a president could have a meaningful impact on this sort of thing, it would be in setting a high bar for the quality of information released by agencies. Any sort of requirement of this kind is completely absent from the announcement.
This is a step in the right direction, once the data is more accessible, the "users" (developers) can request for better data. The /DigitalStrategy page requirement is really good in my opinion and will make things simpler instead of the mishmash of sites buried in menus and behind authentication walls.
>So rather than being about transparency as it's being touted, the announcement is a celebration of high tech obfuscation. Soon the same sort of insulting, opaque, useless information spouted by officials in press conferences will be available via HTTP. This is at best a neutral day for democracy.
Whoever said this was about transparency in government? It's not about transparency per se. This is more about making information easier to access and find. In many cases agencies already have APIs, web services, and data dumps, but they're really buried. How is making them more visible neutral?
I think this should be judged against the status quo as a positive development rather than against an abstract ideal as a flawed concept. Having seen too many clients stuck in analysis paralysis or blocked by political/turf issues while trying to develop corporate-wide standards (protocols, object models, etc.), I'm just happy to see online access to public/government data advance in any way.
If we had to wait for higher-level, coordinating standards first, progress might never come.
Meanwhile, just yesterday the House Committee on Appropriations voted to [indefinitely delay][1] making legislative data available in machine-readable (XML) format. It's a repeat of a move taken in 2008 to "make a plan to make a plan" that never really goes anywhere. In other words, it's not gonna happen for a long time yet.
Don't expect someone to do something that goes against their self-interest. Especially if they can do a song and dance and make people forget that nothing is getting done.
"...Within 90 days of the date of this memorandum, create a page on its website, located at www.[agency].gov/digitalstrategy, to publicly report progress in meeting the requirements of the Strategy in a machine-readable format....
...implement the requirements of the Strategy within 12 months of the date of this memorandum and comply with the timeframes for specific actions specified therein"
3 months to get a "machine-readable" status report on implementing an API?
Then, complete the implementation in 12 months?
If it takes 3 months for an agency to get a status report up, how long will it take them to implement said API? Government work, sheesh....
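For what it's worth, the "machine-readable" bar is easy to state in code. Here is a sketch of how an outside observer might audit the /digitalstrategy pages; the set of acceptable content types, and the assumption that agencies serve the report at exactly that URL, are mine, not the memo's.

```python
# Sketch of auditing an agency's /digitalstrategy progress report:
# fetch the page and classify whether its Content-Type counts as
# machine-readable. The accepted types are an assumption.
from urllib.request import urlopen

MACHINE_READABLE = {"application/json", "application/xml", "text/xml"}

def is_machine_readable(content_type: str) -> bool:
    """Classify a Content-Type header value, ignoring parameters."""
    return content_type.split(";")[0].strip().lower() in MACHINE_READABLE

def audit(agency_domain: str) -> bool:
    """True if www.<agency>.gov/digitalstrategy serves a structured format."""
    url = f"https://www.{agency_domain}.gov/digitalstrategy"
    with urlopen(url) as resp:
        return is_machine_readable(resp.headers.get("Content-Type", ""))
```

An HTML page of prose at that URL would pass the letter of "create a page" while failing the machine-readable test, which is exactly the gap the skeptics here are pointing at.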
>3 months to get a "machine-readable" status report on implementing an API?
2 weeks for the director of each agency to delegate someone to be responsible for this. 2 weeks for said responsible person to figure out what an API is. 6 weeks for them to go around to everybody in the agency asking "are you doing any APIs yet?". 3 weeks to take the feedback and turn it into a semi-coherent report.
If anything, I think 3 months is optimistic.
They need to be sure they 1) are exposing everything they need to be exposing, 2) are hiding everything they need to be hiding, 3) are ready to handle potentially significant load, and 4) are robust against attack. Each of these is a little harder and/or a little more important in government than in a start-up (your start-up is probably less likely to get an informant killed because they posted the wrong thing than, say, DOJ), and 1 and 2 especially involve more than technical work - processes need to be set up to actually get the data where it needs to go in the first place. I'd love it if it could be faster, but I don't think 15 months is at all absurd.
They'll simply report they haven't been able to implement the report, someone will get a slightly lower evaluation score and life.gov will go on as usual.
My best guess is that in the initial 3 months they won't be doing just the page; they will be outlining the efforts needed to comply with the executive order over the next 12 months, buying any hardware they must buy, and delegating work to any branches they might need to.
I'd like to see you hire an entire department's worth of people, wait for them to devise an API for your agency, code it, provision server space and servers, and deploy in 12 months!
What will this bring? Well, the US govt has X agencies. The result of this decree will be that, within 12 months, all of the public will get to enjoy the thrills of having X incompatible web APIs, one unique one per agency.
This reminds me of the push here in NYC for all of the city agencies to open their data via an API. It's gotten better over time, but when the initiative first took flight, it was terrible. Some of the APIs flat out did not work, and the ones that did often returned all sorts of malformed, non-normalized data. It was a nightmare to work with. I'm curious if the government can do better.
This is kind of interesting because maybe it shows how far the thinking is from technology right now. I can't wait to integrate FBI files into my web app, and maybe I can bypass 'authorized e-file providers' to file my taxes. Maybe I can download daily spy satellite imagery. My point is that what is already meant to be available is probably already available.
Decision makers are often excited about technology but don't really get the ground level experience. They want to do all the things...on a roadmap...with milestones. Mobile has to be involved in some way.
This will likely go down the same way the original IPv6 mandate went down: it will be postponed, and then likely postponed again when nobody's met it.
The issue is far more complicated than the comments in here are giving it credit for. Don't get me wrong, there's going to be delay as the PHBs wrap their heads around what an API even is, but they'll have the directive routed to their CIOs before that, and the CIOs will understand the requirement, and how impossible it is.
The biggest issue is that the data isn't really owned by the government entities. I mean, the data is theirs, but it's locked up in their vendor-provided tools and/or their custom, vendor-built products. If they're using Oracle AquaLogic (or whatever it is now) to host the majority of their portal content, they're dependent on Oracle to either come in and show them how to implement the feature (which is a significant service-dollar cost), or they're going to have to wait until Oracle builds the ability for API exposure into the product, if it doesn't exist yet.
If they've got custom-built portals, they'll need to consult with the vendors who wrote them or maintain them now and get them to add that in. That means that they'll have to modify the contract originally bid for the project, which is going to eat up a couple months of the timeline alone. Then they'll have to figure out what sort of things actually make it into the API, how to segment sensitive data reliably, get it through ISSO testing, etc. It's almost impossible for a project of any significance.
On top of that, they'll have to do it with a budget they don't have, and with resources allocated elsewhere. The only way the government really gets anything done is by committing large amounts of resources to it in an uninterrupted fashion. They don't have the capacity to be agile, and to some extent, that's by design.
I assume that this will lead to some discussion on API standards, as multiple agencies simultaneously realize the implications.
15 years ago I contracted with several large federal agencies. Back then, we were pushing for the same thing. It never flew.
I imagine after 15 years they may have a chance at this, but I would caution those of you who have never worked in huge government IT shops to take this with a grain of salt. The situation is so bad in many places that Congress has been passing laws making it illegal for the federal systems not to behave in a certain way. And still things are broken. We passed the point of desperation many years ago.
Big IT in general is broken, and government IT is the most dysfunctional of any IT on the planet. I remain hopeful that this executive order can accomplish something, but I'm not holding my breath on it. Hopeful is one thing. Excited like this guy is? Not at all. Maybe in another 15 years. Maybe.
Exposes a single REST endpoint. POST /UploadAllUserData.
Meh, they're evil, it's probably SOAP and you have to discover it.
edit: Sorry guys! Didn't realize a tongue-in-cheek comment about the NSA having an API was so super-serious! It's like Oprah in here giving out downvotes anymore. "It's a free downvote for you and you and you."
In enterprises where departmental data has been opened up through APIs, like Wells Fargo and Amazon, there have been tremendous benefits.
In the case of Amazon, this was achieved by CEO fiat, and strongly tied to employee evaluation. (To the point where employees in groups that failed to do so would have been evaluated right out of the company.) I wonder if POTUS has this kind of power over the federal bureaucracy.
Also, I would wonder if this is to be done securely.
As a programmer/contractor for DHS, I'd love to hear what you all think would be a useful set of APIs for DHS to make public. It's all fine and dandy to say 'oh yeah we have an API' but it needs to be something useful. So what would you want to see? Financial/Budget type data? Performance metrics across the different Components within the Department? What?
How about TSA budget, # of TSA employees, # of terror plots stopped, some ratios between those things, # of nail clippers confiscated per employee, etc.
In rudimentary terms, I suppose it's at least a step in the right direction. It doesn't necessarily suggest the US government is going to immediately embrace openly disseminating its data, but it's still a step in the right direction. Third-party services will most likely proliferate quite rapidly.
It'll be interesting if they decide to use NIEM (National Information Exchange Model) as a way to transfer information to the public as well. https://www.niem.gov/Pages/default.aspx
It's hard to imagine some federal agencies being able to do much of anything within 90 days. But, I look forward to poking around with some of these APIs.
[1]: http://sunlightfoundation.com/blog/2012/06/01/bulk-access-de...