I also view glue code as growing with the square (n^2) of the number of connections (n) between two conceptually connected but implementation-disconnected systems.
But that is hardly the worst of it.
The two-system model generalizes to much worse complexity (n^m) in the number of connections (n) for solutions that require tight linkage of multiple systems (m >= 2).
In practice, we all avoid m>2 wherever we can, and simply slog along less and less efficiently with m=2 as we try to add new value (or maintain value in the presence of necessary changes).
So the general law is mostly apparent in the broad absence of conceptually simple m>2 solutions, as opposed to their existence as quagmires.
Simple thought experiment to illustrate m>2 solutions:
“Captain! If we just reprogram the nanobot solar-system explorer cannon to spread the bots on a tight Bézier trajectory matching our enemy's detected firing pattern, adjust their transmission frequencies to pass through Romulan dark hull steel and resonate with their traditional diptholamite crystal bathroom fixtures, we can apply a combination frequency-comb filter and inverse fractal hash to completely uncloak their vessels in real time!”
“But Chief Engineer, there is but one problem!! While conceptually simple, your elegant solution will take one hundred years to implement! None of those systems were designed together!”
Kaboom. Hissss… Silence
I call the problem of rapidly integrating foreign systems into conceptually simple novel solutions “The Star Trek Rapid Integration Problem.”
Apparently this problem has been solved by the Federation of Planets in our future.
In the future, Star Trek equips its starships with casks of Phlebotium, which is then applied generously to both systems in such situations, allowing them to interact seamlessly:
It seems like there must be some group theory formulation of this problem, where each "gluing" can be seen as a different symmetry of a software system.
> the real problems stem more from schemas than data formats
Oh, how the turntables turn. Ontologies are hard, and always fail in some way[1]. There are strategies like entity resolution[2] that attempt to take items from dataset A and items from dataset B and figure out if they refer to the same thing, but they are expensive and time-consuming and even when they work, only approximate.
Anyone who has ever done a data migration project of any complexity knows that there must be provisions for dealing with data quality issues, especially if the effort includes reconciling data from multiple sources.
We did a great many of them over the decades; I found the MongoDB ones the most horrible (even compared to vague '80s and '90s plain-text formats, which usually at least had some documentation of what the developer was storing and how). In general, we found the most reliable and, once you are used to it, ultimately the fastest approach is creating types for everything, plus converters from one type to the other. We tried other (sloppier) ways before, and they always produced poor quality. We have noticed that making (very exact) types for everything when doing complex conversions works well. Most notably, we try to avoid stringly-typed data and add a lot of validation in the deserializers. You can run conversions ten years later on new batches and still get understandable errors.
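A minimal sketch of that approach (the record shape, field names, and error messages are invented for illustration): one exact type per side, a deserializer that validates eagerly, and a converter that refuses stringly data with an understandable error.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LegacyCustomer:
    name: str
    birth_year: str        # stored as text in the old system

@dataclass(frozen=True)
class Customer:            # exact target type; no stringly fields
    name: str
    birth_year: int

def deserialize_legacy(record: dict) -> LegacyCustomer:
    # Validate eagerly so a bad batch fails with a clear message,
    # even when run on new data years later.
    for field in ("name", "birth_year"):
        if field not in record:
            raise ValueError(f"record {record!r}: missing field {field!r}")
    return LegacyCustomer(name=record["name"], birth_year=record["birth_year"])

def convert(old: LegacyCustomer) -> Customer:
    if not old.birth_year.isdigit():
        raise ValueError(f"{old.name!r}: birth_year {old.birth_year!r} is not numeric")
    return Customer(name=old.name, birth_year=int(old.birth_year))

good = convert(deserialize_legacy({"name": "Ada", "birth_year": "1815"}))
```

The payoff is that every failure mode names the offending record and field, instead of surfacing as a `KeyError` three systems downstream.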
> There are strategies like entity resolution[2] that attempt to take items from dataset A and items from dataset B and figure out if they refer to the same thing
The more code I have written, the more convinced I become that "general" solutions almost never pay for themselves when you compare the implementation and/or performance complexity required to make them work.
It's almost always better to "just write the damn code" and solve the exact problem you actually have.
I’m not sure I buy the N² argument being made here.
If we’re talking about communications between components within a software system then yes, for N components, there are roughly N² possible connections between them. But isn’t managing that complexity so we don’t connect everything to everything Software Design 101? If you design your system using structure like layers and pipelines then you’re effectively reducing that O(N²) connectedness to something more like O(N).
If we’re talking about having multiple choices for how to implement each component, such as when we have several libraries available plus the option of writing something from scratch, then given N choices for the first component and M choices for the second, there are NM possible connections we might have to implement. However, we probably only need one of those connections at any given time. If we need to modify or replace either implementation later, the most important thing for flexibility and future-proofing is that the data models and communications protocols are well specified, ideally following established standards that are likely to be followed by the other alternatives as well. Again, though, separating interface from implementation is Software Design 101.
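That "Software Design 101" separation can be sketched concretely: code against a small, well-specified interface so that any of the N×M implementation pairings reduces to one connection (the `Compressor` protocol and both implementations here are hypothetical).

```python
import zlib
from typing import Protocol

class Compressor(Protocol):
    # The agreed interface; implementations are interchangeable.
    def compress(self, data: bytes) -> bytes: ...

class ZlibCompressor:
    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data)

class NullCompressor:
    # e.g. a stand-in while evaluating alternatives
    def compress(self, data: bytes) -> bytes:
        return data

def archive(payload: bytes, compressor: Compressor) -> bytes:
    # Only one connection to implement: the interface, not each library.
    return compressor.compress(payload)
```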
So it seems to me that if either of those interpretations was intended, we’re talking about a problem that has largely been solved for a long time. Perhaps part of the reason it doesn’t always feel that way today is that there is so much emphasis on doing everything in “agile” fashion now. Sometimes that agility comes at the expense of establishing a well-considered software architecture and long-lived standards for interoperability. The “glue code” in software systems is where we pay any interest on those forms of technical debt, so while there will always be some need to marshal data around a large system with many components, there will be much, much more code needed if those forms of complexity aren’t well managed.
I don’t think either article is talking about every component of the system, just the things that do need to be aware of one another. Marcel Weihers original article says:
> Glue is quadratic. If you have N features that interact with each other, you have O(N²) pieces of glue to get them to talk to each other.
And this one says:
> it’s “quadratic”: the glue code is proportional to the square of the number of things you need to glue
Even the most well-designed architecture will not let you only glue things pairwise, and you don’t need a very high N to run into challenges with N^2.
Came in here to say much the same. I also see, more and more, that we get "pre-glued" things that we then glue a few more things to. After forty-odd years of coding, I think the amount of glue we use is about the same: it goes up and down a bit, but there seems to be a natural limit to the complexity we can deal with.
The simplest solution to having N data types and N^2 conversions between them is to have a canonical data type and 2N conversions between that one type and the rest.
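A hedged sketch of that hub-and-spoke idea (the formats and field names are made up): each format gets one converter to and one from the canonical shape, and any-to-any conversion is just composition.

```python
# Canonical intermediate: every format converts only to/from this shape.
Canonical = dict  # e.g. {"name": str, "age": int}

def from_csv_row(row: str) -> Canonical:
    name, age = row.split(",")
    return {"name": name, "age": int(age)}

def to_csv_row(c: Canonical) -> str:
    return f'{c["name"]},{c["age"]}'

def from_tsv_row(row: str) -> Canonical:
    name, age = row.split("\t")
    return {"name": name, "age": int(age)}

def to_tsv_row(c: Canonical) -> str:
    return f'{c["name"]}\t{c["age"]}'

# N^2 possible conversion paths, but only 2N converters to write:
def csv_to_tsv(row: str) -> str:
    return to_tsv_row(from_csv_row(row))
```

The catch, as the ontology discussion above suggests, is that the canonical type has to be expressive enough for every format, which is its own hard problem.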
The real problem is that all that glue code is proprietary, and companies cannot cooperate on a large scale to pool their resources and share their software.
> Glue isn’t some kind of computational waste; it’s what holds our systems together. Glue development is software development.
My take is: glue code is just “obvious” code, as opposed to the less obvious code.
The problem is that it’s not really obvious. That’s how you get “write-only” code. What’s obvious to one person isn’t to another, or even to that same person later on. This is why Java, a very verbose (“full of glue”) language, is so popular.
People say “an ideal programming language would have no glue code, it would all be inferred by the compiler”. And it’s true that a good language and libraries can remove a lot of trivial boilerplate. But not all of it, because some of that “boilerplate” isn’t so trivial.
Like, casting floats to integers. The cast is glue code. C and other languages truncate the float, so it gets floored if positive and ceiled if negative. But do you want that, or do you want to round the float or ceil it always? Or is casting the float to an integer an indicator that you’re using the wrong variable?
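The four choices diverge exactly where the comment says they do, on negative values; Python makes each one explicit:

```python
import math

x = -2.7
int(x)         # C-style truncation toward zero: -2
math.floor(x)  # floor: -3
math.ceil(x)   # ceil: -2
round(x)       # round to nearest: -3
```

None of these is "the" cast; which one is correct is precisely the glue decision.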
—-
I think in the future, we will create AI to write a lot of "glue" code fast, like GitHub Copilot. Also, proven-equivalent (or heuristically 99%-predicted-equivalent) large-scale refactors. But even in the ideal scenario where we eliminate low-level glue code, we'll just consider higher-level code "glue code". Because all code is glue code: it converts "what I want" into "what the computer does".
About forty years ago, I did a weekend consulting gig at a lab in San Diego where they had a massive N^2 connection problem of data processing code vs. data formats. I wrote a FORTRAN program to enumerate all combinations and generate a lot of very baroque code. My customer was not happy with the solution until I showed them how simple it was to modify the generating code and regenerate the glue code. Then they liked the approach. If I had written this a few years later, I would have used Lisp, and things would have been even simpler.
One thing that has evolved over the years is Swagger. It is a defined way to document APIs: the endpoint, return type, returned data structure, arguments, behavior, etc.
Microsoft adopted this as a way to connect different APIs in their Logic Apps (think of workflow definitions, like an advanced version of IFTTT, n8n, ...).
It certainly doesn't remove the need for glue code but it reduces the amount of it!
On the flip side, you give up control of the program flow by using their framework, which handles all the connections between APIs.
All Swagger really is is a tool for formalizing HTTP/JSON interfaces. And that's great! Formalized interfaces are really the best way to reduce the need for glue code and post-processing, since the parties on both sides can agree on the exact target they're shooting for.
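For flavor, a minimal OpenAPI (Swagger's successor format) fragment; the endpoint and schema are invented:

```yaml
openapi: "3.0.3"
info:
  title: Orders        # hypothetical API
  version: "1.0"
paths:
  /orders/{id}:
    get:
      parameters:
        - name: id
          in: path
          required: true
          schema: {type: integer}
      responses:
        "200":
          description: The order, in an agreed, machine-checkable shape.
          content:
            application/json:
              schema:
                type: object
                properties:
                  id: {type: integer}
                  status: {type: string}
```

Both client and server can generate types from this one document, which is the glue reduction being described.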
The last few sentences of the article capture how I feel about the whole thing:
> Programming is ultimately about gluing things together, whether they’re microservices or programming libraries. Glue isn’t some kind of computational waste; it’s what holds our systems together. Glue development is software development.
Unless you're programming in assembly, aren't you always essentially writing glue code? Your language has some number of predefined assembly instructions which you can call in different ways and connect to fulfill your needs.
I think there's a real, meaningful difference between glue code and business logic. Even if your business logic is created by composing low-level operations that are hidden by your PL, the total reduction of everything to "glue" is not helpful.
Nope. The author's view of glue code is that most of it exists to make the data output by function A a usable input for function B. This varies from the trivial (moving an integer from A to B: a piece of cake, as long as A and B both have the same-sized integer with the same endianness, and we've defined what to do if A outputs a 64-bit value while B expects 32 bits and A sends something that doesn't fit) to the nightmarish (a large, poorly documented binary blob with inconsistent protocols), with lots of stuff in between. Programming is easy. Data is hard.
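Even that "trivial" integer case carries two real glue decisions, which a sketch using Python's `struct` makes explicit (the function names are hypothetical):

```python
import struct

def send_u32_be(value: int) -> bytes:
    # Glue decision 1: what happens when A's 64-bit value
    # doesn't fit in B's 32-bit field?
    if not 0 <= value <= 0xFFFFFFFF:
        raise OverflowError(f"{value} does not fit in an unsigned 32-bit field")
    # Glue decision 2: pin down the byte order explicitly (big-endian here).
    return struct.pack(">I", value)

def recv_u32_be(data: bytes) -> int:
    return struct.unpack(">I", data)[0]
```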
In Haskell, monads are all about composition of types. While the glue code doesn't go away, it stays neatly in the bind operator's definition for a given monad, which makes the rest of the code compose very easily. So no, it doesn't eliminate it, but I think it makes it a lot more manageable.
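A toy illustration of that point, transliterated into Python rather than Haskell (the `Maybe` class here is a hypothetical sketch, not a real library): the "did the previous step succeed?" glue lives in `bind`, once, and the rest of the pipeline composes cleanly.

```python
class Maybe:
    def __init__(self, value=None, ok=True):
        self.value, self.ok = value, ok

    def bind(self, f):
        # All the success/failure glue lives here, in one place.
        return f(self.value) if self.ok else self

def parse_int(s):
    try:
        return Maybe(int(s))
    except ValueError:
        return Maybe(ok=False)

def reciprocal(x):
    return Maybe(1 / x) if x != 0 else Maybe(ok=False)

good = Maybe("4").bind(parse_int).bind(reciprocal)    # carries 0.25
bad = Maybe("zero").bind(parse_int).bind(reciprocal)  # failure short-circuits
```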
https://tvtropes.org/pmwiki/pmwiki.php/Main/AppliedPhlebotin...
[1] https://oc.ac.ge/file.php/16/_1_Shirky_2005_Ontology_is_Over...
[2] https://doi.org/10.1145/3418896
This is the underlying problem: the perception that programs need to become more all-encompassing. It's the ultimate feature creep.
Your software is just a tiny amount of code (relative to the whole system) making up the control plane between all the pieces.
I think the author probably meant that to personally write less code you need to leverage more external code (libraries and network API's).
Write less code -> Leverage other people's code -> More integration with different systems
There is a long list of interface-definition and schema technologies aimed at exactly this:
- gRPC, protocol buffers
- OpenAPI, Swagger, JSON Schema
- Apache Thrift
- Apache Avro
- Cap'n Proto
- IDL, Microsoft IDL, Web IDL
- etc