Zanzibar: Consistent, Global Authorization System

[+] argd678|6 years ago|reply

The distinguishing feature I see compared to other systems is the ACL ordering and consistency, which is indeed difficult to do at scale. Looks like Spanner is doing most the heavy lifting, great use case for the database.

[+] usaar333|6 years ago|reply

Well even more broadly it is how generalizable it is, while still providing ordering guarantees (though not necessarily perfect ones.. see my long sibling post)

Using Windows style ACEs for ACLs is also perfectly scalable and consistent, (and more performant) so long as users don't end up in too many groups and objects only inherit ACLs from objects on the same shard. It's just no where as generalizable as Zanzibar which allows much more complex dependencies.

There's always tradeoffs! But this is the best system I've seen for the general ACL evaluation against non recently updated objects.

[+] cryptonector|6 years ago|reply

If you don't have negative ACL entries then ordering is not important.

[+] iampims|6 years ago|reply

“Zanzibar scales to trillions of access control lists and millions of authorization requests per second to support services used by billions of people. It has maintained 95th-percentile latency of less than 10 milliseconds and availability of greater than 99.999% over 3 years of production use”

Impressive!

[+] wallflower|6 years ago|reply

> There's also a story behind that project name. That is not the original project name. The original project name was force-removed by my SVP. Once my hands are free again, I can explain

https://mobile.twitter.com/LeaKissner/status/113663143751427...

[+] pronoiac|6 years ago|reply

https://twitter.com/LeaKissner/status/1136691523104280576

> Zanzibar was not the original name of the system. It was originally called "Spice". I have read Dune more times than I can count and an access control system is designed so that people can safely share things, so the project motto was "the shares must flow"

[+] delroth|6 years ago|reply

https://mobile.twitter.com/LeaKissner/status/113669152310428...

[+] koboll|6 years ago|reply

Yeah, that's kinda understandable. Although I'm not sure how the current name fits the project any better.

[+] dsagent|6 years ago|reply

I was hoping it was a Metal Gear solid reference.

https://metalgear.fandom.com/wiki/Zanzibar_Land

[+] victor106|6 years ago|reply

What do other large (non-google scale) to medium companies use for authorization? Can anyone recommend open source (preferably) or close source products?

[+] usaar333|6 years ago|reply

Excellent paper. As someone who has worked with filesystems and ACLs, but never touched Spanner before, I have some questions for any Googler who has played with Zanzibar. (in part because full-on client systems examples are limited)

A check my understanding: Zanzibar is being optimized to handle zookies that are a bit stale (say 10s) old. In this case, the indexing systems (such as Leapord) can be used to vastly accelerate query evaluation.

Questions I have (possibly missed explanations in the paper):

1. If I understand the zookie time (call it T) evaluation correctly, access questions for a given user are effectively "did a user have access to a document at or after T"? How in practice is this done with a check() API? The client/Zanzibar can certainly use the snapshots/indexes to give a True answer, but if the snapshot evaluation is false, is live data used (and if so by the client or Zanzibar itself?)? (e.g. how is the case handled of a user U just gaining access to a group G that is a member of some resource R?)

2. Related to #1, when is a user actually guaranteed to lose access to a document (at a version they previously had access to?) E.g. if a user has access to document D via group G and user is evicted from G, the protocol seems to inherently allow user to forever access D unless D is updated. In practice, is there some system (or application control) that will eventually block U from accessing D?

3. Is check latency going to be very high for documents that are being modified in real time (so zookie time is approximately now or close to now) that have complex group structures? (e.g. a document nested 6 levels deep in a folder where I have access to the folder via a group)? That is, there's nothing Zanzibar can do but "pointer chase", resulting in a large number of serial ACL checks?

4. How do clients consistently update ACLs alongside their "reverse edges"? For instance, the Zanzibar API allows me to view the members of a group (READ), but how do I consistently view which groups a user is a member of? (Leapord can cache this, but I'm not sure if this is available to clients and regardless it doesn't seem to be able to answer the question for "now" - only for a time earlier than indexed time).

Or for a more simple example, if I drag a document into a folder, how is the Zanzibar entry that D is a child of F made consistent with F's views of its children?

E.g. can you do a distributed transaction with ACL changes and client data stored in spanner?

5. It looks like the Watch API is effectively pushing updates whenever the READ(obj) would change, not the EXPAND(object). Is this correct? How are EXPAND() changes tracked by clients? Is this even possible? (e.g. if G is a member of some resource R and U is added to G, how can a client determine U now has access to R?)

[+] ruomingpang|6 years ago|reply

Excellent questions. We have encountered all of them in practice and have solutions for most of them, e.g., (4) requires an ACL-aware search index.

Unfortunately we don't have enough space to explain them in the paper. Please consider coming to Usenix. :-)

[+] sa46|6 years ago|reply

Used to be a Googler and worked on an ACL model built on top of Zanzibar. I didn't work directly on Zanzibar so listen to ruomingpang over me.

> 3. There's nothing Zanzibar can do but "pointer chase", resulting in a large number of serial ACL checks?

Zanzibar enforced a max depth and would fail if the pointer-chasing traversed too deeply. Zanzibar would also fail if it traversed too many nodes.

> 4. How do clients consistently update ACLs alongside their "reverse edges"?

One of the recommended solutions was to store your full ACL (which includes a Zookie) in the same Spanner row of whatever it protected. So, if your ACL is for books, you might have:

    CREATE TABLE books (
      book SERIAL PRIMARY KEY
      acl ZanzibarAcl
    );

Alternately, you could opt to only store the current zookie instead of the full ACL. Then the check becomes:

1. Fetch Zookie from Spanner

2. Call Zanzibar.Check with the zookie

> but how do I consistently view which groups a user is a member of?

I remember this as a large source of pain to implement. Zanzibar didn't support this use-case directly. As rpang mentioned in a sibling comment, you need an ACL-aware index. Essentially, the algorithm is:

1. Call Zanzibar.Check on all groups the user might be a part of.

There's a bunch of clever tricks you can use to prune the search space that I don't know the details of.

[+] sb8244|6 years ago|reply

How would you deal with questions like "provide all content accessible to a user" in a system like this? Would you watch and replicate to your own database?

[+] ruomingpang|6 years ago|reply

You will need an ACL-aware index, which is one of the main use cases of Zanzibar.

[+] eximius|6 years ago|reply

Semi-off topic: What is the latest and greatest in authorization mechanisms lately?

I like capability-based at on OS level, but sadly I'm not doing anything that interesting. For things like webapps, is there anything better than ACLs or Role-based. Or at least any literature talking about them? Probably overkill for the application I work on, but it'd be nice to take inspiration from best practices.

[+] ubercow|6 years ago|reply

Semi-off topic but is there a curated "best of" list for systems papers like this that anyone knows about, from Google or otherwise?

[+] bretthardin|6 years ago|reply

If you find this, please let me know.

[+] zeeed|6 years ago|reply

I got stuck on the first line in the abstract:

> Determining whether online users are authorized to access digital objects is central to preserving privacy.

Can someone dissect that sentence and explain why that is? I honestly fail to make the connection.

[+] Dowwie|6 years ago|reply

Replace "digital object" with "a PDF of your checking account transactions for 2018". You want to control who can do what with that PDF. Your privacy is at stake.

[+] unknown|6 years ago|reply

[deleted]

[+] cryptonector|6 years ago|reply

This reminds me I need to get my authz paper published, and now sooner than later...

I've built an authz system that is built around labeled security and RBAC concepts. Basically:

  - resource owners label resources
  - the labels are really names for ACLs in a directory
  - the ACL entries grant roles to users/groups
  - roles are sets of verbs

There are unlimited verbs, and unlimited roles. There are no negative ACL entries, which means they are sets -- entry order doesn't matter. The whole thing resembles NTFS/ZFS ACLs, but without negative ACL entries, and with indirection via naming the ACLs.

ACL data gets summarized and converted to a form that makes access control evaluation fast to compute. This data then gets distributed to where it's needed.

The API consists mainly of:

  - check(subject, verb, label) -> boolean
  - query(subject, verb, label) -> list of grants
    (supports wildcarding)
  - list(subject) -> list of grants
  - grant(user-or-group, role, label)
  - revoke(user-or-group, role, label)
  - interfaces for creating verbs, roles, and labels,
    and adding/removing verbs from roles.

Note that access granting/revocation is done using roles, while access checking is done using verbs.

What's really cool about this system is that because it is simple it is composable. If you model certain attributes of subjects (e.g., whether they are on-premises, remote, in a public cloud, ...) as special subjects, then you can compose multiple check() calls to get ABAC, CORS/on-behalf-of/impersonation, MAC and DAC, SAML/OAuth-style authorization, and more. When I started all I wanted was a labeled security system. It was only later that compositions came up.

Because we built a summarized authz data distribution system first, all the systems that have data will continue to have it even in an outage -- an outage becomes just longer than usual update latencies.

check() performance is very fast, on the order of 10us to 15us, with no global locks, and this could probably be made faster.

check() essentially look's up the subject's group memberships (with the group transitive closure expanded) and the {verb, label}'s direct grantees, and checks if the intersection is empty (access denied) or not (access granted). In the common case (the grantee list is short) this requires N log M comparisons, and in the worst case (the two lists are comparable in size) it requires O(N) comparisons. This means check() performance is naturally very fast when using local authz data. Using a REST service adds latency, naturally, but the REST service itself can be backended with summarized authz data, making it fast. Using local data makes the system reliable and reliably fast.

query() does more work, but essentially amounts to a union of the subject's direct grants and a join of the subject's groups and the groups' direct grants.

special entities like "ANYONE" (akin to Authenticated Users in Windows) and "ANONYMOUS" also exist, naturally, and can be granted. These are treated like groups in the summarized authz data. We also have a "SELF" special entity which allows one to express grants to any subject who is the same as the one running the process that calls check().

[+] galaxyLogic|6 years ago|reply

Cool. Keep us posted

[+] 1023bytes|6 years ago|reply

Why is it called Zanzibar though? I'm kind of intrigued

[+] pronoiac|6 years ago|reply

There's another thread about naming it - https://twitter.com/LeaKissner/status/1136691523104280576

The original name was Spice, which was nixed from a higher-up; they went to Zanzibar, one of the Spice Islands.

[+] GMLOOKO|6 years ago|reply

A

[+] sonnyblarney|6 years ago|reply

What's interesting to me here is not the ACL thing, it's how in a way 'straight forward' this all seems to be.

It's the large architecture of a fairly basic system, done I supposed 'professionally'.

I'm curious to know how this works organizationally. What kind of architects involved because this system would have to interact with any number of others, so how do they do requirements gathering? Do they just 'have experience' and 'know what needs to be done' or is this something socialized with 'all the other teams'?

And how many chefs in that kitchen once the preparation starts? Because there's clearly a lot of pieces. Do they have just a few folks wire it out and then check with others? Who reviews designs for such a big thing?

Or was all of this developed organically, over time?

[+] delroth|6 years ago|reply

Zanzibar is basically the brainchild of a Bigtable Tech Lead + a Principal Engineer from Google's security and privacy team [1]. This led to a very sound and robust original design for the system. But it also greatly evolved over time as the system scaled up and got new clients with new requirements and new workloads.

[1] https://twitter.com/LeaKissner/status/1136626971566149633

[+] the-rc|6 years ago|reply

Especially at Google, you first see the same problem appearing and getting solved in multiple products, then someone tries to come up with a more generic solution that works for most projects and, just as importantly, can serve more traffic than the existing solutions. Having to rewrite things on a regular basis because of growth is painful, but can also be a blessing in disguise.

Who that someone is who works on the generic solution, can vary. Sometimes it's one or more of the teams already mentioned. Sometimes, like in this case, it's someone with expertise in related areas that takes the initiative. And a project of this scope invariably gets reviewed on a regular basis by senior engineers, all the way to Urs (who leads all of technical infrastructure). Shared technologies require not just headcount to design and write the systems, but also to operate them (by SREs when they're large enough), so you need to get upper management involved as well.

[+] usaar333|6 years ago|reply

The system is actually pretty complicated and nonobvious once you consider its caching layers, heavy reliance on spanner, assumption that ACL read times can be stale, and the various assumptions and limitations in the namespace controls.

The underlying model of role based access control (and viewing groups as just other resources with ACLs) is already well known.

[+] shereadsthenews|6 years ago|reply

That’s how you design at this scale: keep it simple, don’t be a jackass. If the result looks complicated from the outside, you blew it.

[+] colesantiago|6 years ago|reply

I love reading about Google's systems, but I wish I could work on those problems at scale, that is my dream really. I wonder what more systems Google has that we don't know about.

I know Borg has become what we know as k8s but surely there must be more things that Google has made internally that are not open source.

Curious about this and would like to know more about it from anyone in the trenches at Google.

[+] rifung|6 years ago|reply

> I love reading about Google's systems, but I wish I could work on those problems at scale, that is my dream really. I wonder what more systems Google has that we don't know about.

I work for Google and I used to have this exact thought too. I think the reality is not quite as rosy, though far from bad!

You have to realize that there are hundreds of people who work on systems like this, and as a consequence, your day to day work is more or less the same as what you would do on systems of a smaller scale.

Before I joined Google I always wondered what things they did differently and what magical knowledge Googlers must have possessed. After joining I realized that while on average the engineers are definitely more capable than other places I've worked, there's no special wisdom and instead they just have more powerful primitives/tools to work with.

Of course, maybe I am mistaken and just don't know of the magic?

[+] gregorygoc|6 years ago|reply

The harsh truth of working at Google is that in the end you are moving protobufs from one place to another. They have the most talented people in the world but those people still have to do some boring engineering work.

[+] anonygler|6 years ago|reply

The most impressive part about Google is how its emphasis on internal standards has allowed it to build some really impressive stuff.

Eg, You can do a sql join on any dataset, in any datacenter. You can turn any query into a hosted visualization.

Every test invocation is streamed to a central server and results can be shared with a url.

There’s more, but those are my two favorites.

[+] idlewords|6 years ago|reply

Wait until you get a glimpse into the exciting world of real-time ad bidding. It's every engineer's dream!

[+] unknown|6 years ago|reply

[deleted]

[+] malicioususer11|6 years ago|reply

[deleted]

[+] nippoo|6 years ago|reply

As a side-note: 95th percentile latency statistics are pretty meaningless at this scale. With a million requests per second, a 95th percentile latency of 10ms still means that 50,000 requests per second are slower than that.

[+] demarq|6 years ago|reply

Not sure how I feel about adopting a countries name for a project.

Or more to the point I'm not sure how I would feel if every time I searched my countries name on the web this Google project appears rather than my actual country.

i.e Zanzibar is a national identity not just a "spice" island

[+] NameOfTeam|6 years ago|reply

To be clear, Zanzibar is neither a country nor a national identity. It’s a semi-autonomous region of Tanzania.

[+] ocdtrekkie|6 years ago|reply

Countries who have the Amazon rainforest within their borders are still a little annoyed about the company. https://mashable.com/article/amazon-domain-name-icann-approv...

[+] stingraycharles|6 years ago|reply

Am I alone in thinking that 99.999% measured availability for a service so completely in the critical path for almost everything is relatively low?

Phrased another way, when it is not availability, do end users experience service disruption, and if not, how is that mitigated?

[+] gtirloni|6 years ago|reply

I think you might be alone. 5.26 minutes of down time per year is beyond excellent for any moderately complex system.

[+] jacques_chester|6 years ago|reply

You may have missed how they're defining it:

> We define availability as the fraction of “qualified” RPCs the service answers successfully within latency thresholds: 5 seconds for a Safe request, and 15 seconds for a Recent request as leader re-election in Spanner may take up to 10 seconds. ... To compute availability, we aggregate success ratios over 90-day windows averaged across clusters. Figure 5 shows Zanzibar’s availability as measured by these probers. Availability has remained above 99.999% over the past 3 years ofoperation at Google. In other words, for every quarter, Zanzibar has less than 2 minutes of global downtime and fewer than 13 minutes when the global error ratio exceeds 10%.

Basically, they're counting by number of requests. That's fairly typical for Google, who in their SRE book point out that measuring only total outages is a poor indicator of actual user experiences. Imagine if you had an electric company that had frequent brownouts and rolling blackouts but bragged about never having a total blackout. You'd be fairly unimpressed.

Google SREs also make the point that beyond five nines, your efforts are rendered moot by reliability issues you cannot control. Mostly network issues. If you have 99.99999% reliability but the mobile data network only has 99.99%, you've wasted a lot of money on something most folks will never notice.

[+] bradleyjg|6 years ago|reply

Overall uptime isn't the only stat that matters here, the distribution of downtime matters too. One 15 minute outage in three years is a lot worse than 900 1 second outages over that same time period. One second blips are a part of the web, we click refresh and move on--not even knowing who's fault it was.

[+] GauntletWizard|6 years ago|reply

It says greater than 5 nines, and it's usually much greater - in usual times, these core services are usually at six or seven nines as measured client side. But it doesn't take long at three nines to destroy your five nine SLA.

The other portion is client side retry logic. It's incredibly easy for developers to mark a lookup with a retry policy and timer, and one of the reasons that that latency is so low is so that even if there's a timeout, the pageview can succeed. The application code doesn't see the error at all if the retry is successful, it just takes longer. The retry code is very good and it's already known at the first rpc call where the retry should go - the connection pool maintains connections to multiple independent servers.

[+] xyzzy_plugh|6 years ago|reply

You might not be alone but five nines is pretty good.

I seen many internal facing teams across many companies have SLOs of four nines or less. Five is pretty rare.

[+] lclarkmichalek|6 years ago|reply

It kinda depends what availability means? That .001% unavailability might be degraded service, might be .001% of clients having a bad time across the entire year, might be 'acts of god' (i.e. broken CPUs and the like). This kind of service is also usually fairly low down on the stack, and higher level applications can usually degrade gracefully. If they couldn't, complex applications such as Google would fail to operate; there's always _something_ broken.

129 comments