_bnmd's comments

_bnmd | 4 years ago | on: US Air Force chief software officer quits

My experience is admittedly entirely from the IC side before I came to Platform One last year. And yes, when it came to ground processing and dissemination of geointelligence products, we absolutely did prove we were doing what we were supposed to be doing, to an extremely exacting standard. I'm talking exact specs on false positive rates of cloud detections. Written explanations of pixel-by-pixel differences from the last release's test keys, traced to an RFC. Verification that something worked equally well over cities, jungles, deserts, ocean, mountains. Sometimes getting dinged for ridiculous attention-to-detail stuff, like accurate multi-tile pixel registration of a mosaic collection over Mount Everest, just to prove it still worked in the worst possible spot on the planet for our topographic earth models.
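To give a flavor of what that verification discipline looks like, here's a toy sketch (purely hypothetical code, nothing like the real tooling) of a pixel-level regression gate: every diff between the new release's output and the golden test key must trace to a documented change, or the release fails.

```python
# Hypothetical pixel-level regression check. Rasters are plain lists of
# ints for illustration; all names here are made up.

def pixel_diffs(golden, candidate):
    """Return every (index, golden_value, candidate_value) mismatch."""
    assert len(golden) == len(candidate), "raster size changed"
    return [(i, g, c) for i, (g, c) in enumerate(zip(golden, candidate)) if g != c]

def check_release(golden, candidate, documented_diffs):
    """A release passes only if every pixel difference is traceable to a
    documented change; returns the list of undocumented diffs (empty = pass)."""
    return [d for d in pixel_diffs(golden, candidate) if d[0] not in documented_diffs]

golden    = [0, 10, 10, 255, 0]
candidate = [0, 10, 12, 255, 0]
print(check_release(golden, candidate, documented_diffs={2}))   # -> []
print(check_release(golden, candidate, documented_diffs=set())) # -> [(2, 10, 12)]
```

The point is the shape of the gate, not the implementation: an unexplained pixel change anywhere in the raster blocks the release.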

When you see people above asking "what are the examples where we're a decade or more ahead of the commercial world," satellite imagery is one of those places. I don't know if you recall Trump accidentally declassifying a collection on Twitter a few years back, and imagery experts being skeptical that it could actually have been taken from space given the resolution. That is not even remotely the tip of the iceberg of what we can do. I would not have believed what we can do until I saw it.

So I actually am used to being in an environment run by technical experts in physics and image science who know exactly what they want and write down clear, specific, and exacting requirements, and I'm used to actually meeting those requirements.

But you know what? We had the infrastructure to do it. A complete clone of the operational system with a cloned production data flow, putting the next release through exactly the same load, so we could detect the differences and any bugs immediately. Actual continuous integration, because there was somewhere we could integrate to. Then I come to Platform One and it's, "Here's an AWS account and some buzzword tech products that say they enable GitOps and CI. Figure out how to use them and stand up your own servers from scratch."

And you know what? I'm a confident person. I fully believe I can do that. But not very quickly. And if my five-person team is supposed to do design, development, test, operations, and maintenance all by ourselves, it is never going to happen quickly. Process can't save you from underprovisioning of resources. But apparently the government is just not willing to spend money anymore. It's hard to see why; it's not like the federal budget is getting any smaller. I have no idea where that money is going. At least some of it used to go toward radically superior technology that the public is skeptical of because it's all classified and they never hear about it, but trust me, it's there.

_bnmd | 4 years ago | on: US Air Force chief software officer quits

The only truly intractable problem I don't see a way around is the interagency nature of some of the requirements. In many cases, they aren't really requirements in the sense of being formally written down anywhere. It's worse in the IC than in the DoD, but present in both, where enterprise services meant to interoperate with systems run by other components have to go through a trial period after releasing to some sort of near ops environment. Every single involved stakeholder has to sign off on this.

Who can solve this? There is no common authority. Theoretically, that is what the DNI was supposed to be for, but they don't have any kind of IT expertise.

_bnmd | 4 years ago | on: US Air Force chief software officer quits

Honestly, best of luck, but it's mostly to do with PKI. The DoD PKI office won't issue code signing certificates for NPE (non-person entity) users, so artifact provenance assurance either can't be automated, which means it can't be scaled, or we're forced to use self-signed certificates. Combine that with small teams where everyone is forced to have admin privileges, because no one actually owns the infrastructure and provides timely support for change management requests, and way too many systems are left incredibly open to insider threat.
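To make the self-signed workaround concrete, here's a hypothetical sketch with plain openssl (the subject name and filenames are made up). The signature verifies fine, but nothing chains the certificate to a trusted root, which is exactly the provenance gap: any insider with pipeline access could mint an identical-looking cert.

```shell
# A notional CI pipeline signing a build artifact with a self-signed cert,
# because no CA will issue a code signing cert to the pipeline's NPE account.
echo "build output" > artifact.tar

# Self-signed cert for the hypothetical pipeline service account
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=pipeline-npe" -keyout key.pem -out cert.pem 2>/dev/null

# Sign the artifact, then verify against the cert's own public key
openssl dgst -sha256 -sign key.pem -out artifact.sig artifact.tar
openssl x509 -pubkey -noout -in cert.pem > pub.pem
openssl dgst -sha256 -verify pub.pem -signature artifact.sig artifact.tar
# prints "Verified OK" -- but nothing ties CN=pipeline-npe to a trusted CA
```

Verification proves the artifact wasn't modified after signing; it proves nothing about who held the key, because the trust anchor is the same self-signed cert.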

_bnmd | 4 years ago | on: US Air Force chief software officer quits

The issue with FIPS compliance is with Anchore. It's actually fixed in 10.2, but that version is not yet in Iron Bank, which is delaying deployment to IL5.

The issue with Iron Bank itself is a lot more structural. We report it every time a specific container breaks, and it eventually gets fixed on a report-by-report basis, but the base issue is they're doing two things completely wrong:

1) The process requires disconnected builds in the container build stage to avoid pulling in dependencies from the Internet. Combined with the container hardening teams' lack of expertise in how to actually build software, this results in them usually pulling down the upstream official container and naively copying the desired executable from it into a UBI base image. That happens to work for dynamically linked executables when upstream and UBI have the same glibc version, usually right after a UBI release. Then it breaks spectacularly, and when the issue is explained to Iron Bank engineers, they don't understand it and simply call it fixed when UBI releases again and glibc happens to align with upstream.

2) They push functional changes to tags. You can pull some <image>:8.4 one day, and the next day the exact same tag will have different environment variables set, different paths, executables removed (notably, the jq image used to include aws-cli for some reason, which it shouldn't have, but once it did, you can't just remove it). Normally the fix for this is to pin to a SHA instead of the tag, but Harbor doesn't hold onto the SHAs once you've republished to the same tag five times, and Iron Bank is continuously rebuilding those tags. We've reported it, and Iron Bank technical leadership is at least aware this is a problem, but figuring out how to fix it has never been a priority for them.
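Point 1 above is, as far as I understand it, the classic glibc symbol-versioning failure. A toy sketch (all names and version numbers hypothetical) of why a binary copied between images only runs when the target base image's glibc is new enough:

```python
# A dynamically linked executable records the minimum GLIBC symbol versions
# it needs (visible with `objdump -T binary | grep GLIBC`). The target base
# image must provide at least those versions or the loader fails at runtime.

def parse_version(v: str) -> tuple:
    return tuple(int(x) for x in v.split("."))

def binary_runs_on(required_glibc_versions, base_image_glibc) -> bool:
    """True only if the base image's glibc satisfies every requirement."""
    base = parse_version(base_image_glibc)
    return all(parse_version(req) <= base for req in required_glibc_versions)

# Hypothetical: upstream built against glibc 2.34, UBI still ships 2.28
print(binary_runs_on(["2.17", "2.34"], "2.28"))  # -> False: loader error
print(binary_runs_on(["2.17", "2.34"], "2.34"))  # -> True: "fixed" by chance
```

This is why the naive copy "works" right after a UBI release and then rots: nothing in the hardening process actually checks the requirement, so alignment is luck.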
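Point 2 above is the standard mutable-tag problem. A toy model (pure illustration, not Harbor's actual behavior) of why a tag you tested against can silently move, while a content-addressed digest cannot:

```python
# A registry maps mutable tags to immutable content-addressed digests.
# Re-publishing to the same tag moves the tag; the old digest is unchanged
# but, per the complaint above, may be garbage-collected once orphaned.
import hashlib

registry = {}  # tag -> digest; names and contents are illustrative

def push(tag: str, image_bytes: bytes) -> str:
    digest = "sha256:" + hashlib.sha256(image_bytes).hexdigest()
    registry[tag] = digest
    return digest

pinned = push("jq:8.4", b"jq + aws-cli")  # the image you tested against
moved  = push("jq:8.4", b"jq only")       # the tag is silently rebuilt

print(registry["jq:8.4"] == pinned)  # -> False: the tag now points elsewhere
print(pinned == moved)               # -> False: digests never collide on change
```

Pulling `image@sha256:...` instead of `image:8.4` sidesteps the mutation entirely, which is exactly why it hurts that the old digests get purged.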

_bnmd | 4 years ago | on: US Air Force chief software officer quits

Military radio transmissions are encrypted using hardware-loaded pre-shared keys that never go over a network and are stored in guarded arms rooms behind many fences and thick concrete walls on military installations. Keys are rotated at least daily, often twice daily. Granted, I guess I wouldn't know if China actually managed to break US military encryption, but as far as I do know, our encryption has never been broken. "Classified" information would effectively be useless if it had, since we have no choice but to use the same backbone fiber lines as everyone else to internetwork (or radio, which is totally wide open no matter what), and end-to-end encryption is the only assurance we have that this information is not being intercepted.
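For illustration only, and emphatically not the actual military scheme: the general shape of rotating per-day traffic keys derived from a pre-shared key that itself never crosses a network, sketched with stdlib HMAC.

```python
# Toy key-rotation sketch. The PSK stands in for material loaded from a
# hardware fill device; deriving per-period keys means compromise of one
# day's traffic key exposes neither the PSK nor other days' keys.
import hashlib
import hmac

def daily_key(psk: bytes, date: str) -> bytes:
    """Derive a 32-byte traffic key for a given rotation period."""
    return hmac.new(psk, date.encode(), hashlib.sha256).digest()

psk = b"loaded-from-fill-device"  # hypothetical; real keys live in hardware
k1 = daily_key(psk, "2021-09-03")
k2 = daily_key(psk, "2021-09-04")
print(k1 != k2)  # -> True: each period gets an unrelated-looking key
```

The one-way derivation is the point: possession of k1 gives an attacker no path back to the PSK or forward to k2.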

_bnmd | 4 years ago | on: US Air Force chief software officer quits

I'm not at all trying to crap on Etsy. I'm just saying the industry leaders in these stories, deploying to prod 100,000 times a day, are able to do that because they own applications that can break without it being a big deal. HBO Max, Prime Video, and Netflix pretty consistently just crash every now and again and reboot. Whatever. It comes back up and you can watch your show. Bizarre bugs are at worst minor annoyances. You can't do the same thing with a weapons platform control module. You need to test the hell out of it and know for certain it works, in every possible edge case. Releases to production are heavily gated for a very good reason.

_bnmd | 4 years ago | on: US Air Force chief software officer quits

Yeah. I ain't saying who I am, but I'll anonymously corroborate this. Nic had a lot of good ideas, but he micromanaged the hell out of low-level product decisions and forced us to use specific broken products that made me swear the vendors must have naked photos of him somewhere. He threw new requirements out of left field in the middle of demos, requirements that weren't written down anywhere, but for whatever reason we had to change course mid-development and do it anyway.

I'm sitting here right now on a Friday afternoon, while the Air Force is on a four-day weekend ahead of Labor Day, trying to deploy to a broken-ass application his DevSecOps reference architecture forces everyone to use. It doesn't work, because it relies on a checksum algorithm disabled by FIPS-compliance hardening in this environment, which we have absolutely no control over. The biggest impediment to even getting this far was another vendor enterprise service he forced us to use, which was broken until July. We were stuck just waiting on a bug fix. And, of course, we have to use Iron Bank container images for everything, but Iron Bank container images are themselves perpetually broken. They do security hardening, but no functionality testing, and their practice of pushing breaking changes to the same tags can break you in production unexpectedly. And you can't pin to the actual SHA, because Harbor only holds onto five orphaned SHAs at a time once they no longer correspond to a tag.
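For anyone who hasn't hit this failure mode: a minimal sketch (hypothetical function, not the actual product) of what "a checksum algorithm disabled by FIPS hardening" looks like in practice. Python's hashlib exposes the same distinction via `usedforsecurity` (3.9+): on a FIPS-hardened host, OpenSSL refuses MD5 outright unless it's flagged as a non-security use, so anything hard-wired to MD5 just throws.

```python
# Hypothetical checksum helper showing the FIPS-safe pattern: use a
# FIPS-approved algorithm when hardening is on, and explicitly mark MD5
# as non-security elsewhere (legacy tools often want MD5 checksums).
import hashlib

def artifact_checksum(data: bytes, fips_mode: bool = False) -> str:
    if fips_mode:
        # SHA-256 is FIPS-approved and always available
        return hashlib.sha256(data).hexdigest()
    # usedforsecurity=False (Python 3.9+) declares this a plain checksum,
    # which keeps MD5 usable even under a FIPS-configured OpenSSL
    return hashlib.md5(data, usedforsecurity=False).hexdigest()
```

An application that instead calls `hashlib.md5(data)` unconditionally is exactly the kind that works everywhere in dev and then dies the moment it lands in a FIPS-hardened environment nobody on the team controls.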

He's touting a lot of accomplishments here that only count as accomplishments because pushing broken functionality and calling it done is a very easy way to "deliver" faster than a normal DoD program that has to actually prove it does what it says it does.

_bnmd | 4 years ago | on: US Air Force chief software officer quits

Two notes on this:

1) If the military gets it right with anything, it's encryption. This isn't connecting to the aircraft over the Internet using Verisign PKI. You're not gonna man-in-the-middle inject your own code into the update. The only attack vector is the software supply chain itself, but that is already an attack vector regardless of how the software gets loaded.

2) Part of the purpose of being able to do something like this is to push new software capabilities to platforms that can't be brought back to manually do it at all, like satellites in orbit. A software update that doesn't require you to launch a new rocket into space can save billions.

_bnmd | 4 years ago | on: US Air Force chief software officer quits

I feel Nic's pain. Here is the original article about the talk he gave before leaving: https://www.airforcemag.com/air-force-leadership-chief-softw...

> One of Chaillan’s main concerns is incorporating security into software development, a practice known among IT professionals as DevSecOps. With a lack of basic IT infrastructure, implementing DevSecOps has proven difficult, he said. What’s more, there has been some resistance among those used to the more traditional approach of considering security after development and operations.

We're standing up basically everything ourselves from scratch. The mandate was basically "we have a critical need for a new capability. Here is an AWS account and five developers, so make it happen." That's it. So everything from standing up CI/CD pipelines, to building out a cluster, to configuring storage and networking, to writing and testing the application code, to maintaining environments and deployments, is falling on us, with no support.

I'm not going to say what the product is for reasons of OPSEC, but it is inherently a product that has extremely high security needs. Yet in the rush to be able to tell some high-ranking people we have put an "MVP" in production, we've skimped in every which way it is possible to skimp. I am aware of so many holes in the system, but Air Force pen testers didn't find them, so our product manager is being pushed to go forward and we'll worry about security later.

To my mind, this is absolutely unacceptable for a critical defense system, but nobody is asking my opinion. We keep being told we'll lose funding and get the plug pulled if we don't hit some important milestone by some exact date. By being "agile," we can deliver a broken, insecure "MVP" and "iterate" on it until we have a real product that actually meets its requirements.

You can't do this crap with defense systems. This isn't Etsy. Deploying broken shit has far different implications than when all the exemplars from the DevOps Handbook do it in order to find all their bugs in prod and turn their users into beta testers.

_bnmd | 4 years ago | on: US Air Force chief software officer quits

It's not explained, but I know exactly what he means. We mandate that development teams and integrated product teams use agile methods, but the procurement process itself is inherently not agile. Contracts come with fixed dollar amounts, milestone delivery dates, and requirements that need at least signoff from senior agency officials to change, and possibly acts of Congress. Further, because your "customer" is always an acquisition office rather than the actual users of your system, developers can't receive, solicit, or respond to direct feedback from users. That's a core tenet of agile development, without which it's hard to see how it can ever work.