This is a bit of a mess to read but I don't see anything particularly novel. There is a lot of bad software out there, though, and what the author describes sounds a lot closer to good software. No idea what this has to do with Waterloo though.
A few relevant quotes:
"Bad programmers worry about the code. Good programmers worry about data structures and their relationships." -- Linus Torvalds
"Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they’ll be obvious." -- Fred Brooks, The Mythical Man-Month (1975)
It's no coincidence that ADTs and data-driven design are commonplace today -- I would even argue they are so commonplace that most programmers are not even aware they are stylistic choices.
Many, many folks get caught up chasing complexity for all the wrong reasons. Define your data, stake your boundaries and just write the damn code.
Maybe not novel, but I think the author is describing an "A-Ha" moment that most programmers have to go through at some point. I think most new programmers don't think about data, they think about goals. "I want this sprite to move from here to there, and when you press a button a laser comes out of the gun" or "I want to make a webpage that lets you create polls for Twitter". They think about the end results and just mess around with data structures until they accomplish their goals. This style of programming is very brittle and breaks as soon as you change your goals. Moving to a mental model where you think about data first, and that your goals are an expression of that underlying data leads to simpler code and more robust architectures that can be expanded as needed.
> This is a bit of a mess to read but I don't see anything particularly novel.
It's okay that someone blogs about an old idea and puts a new name on it. It's still a good thing even if it's obvious to some. The author came to their own understanding of this concept in a particular way and decided to share that way with other people. You don't have to knee-jerk critique them for it.
I worked with a bunch of smarter-than-me UW grads after graduating.
My “how to write large systems” takeaway from that early point in my career was to focus on the interfaces between various parts. What I’d never thought about until now is that that is a very data-centric viewpoint.
- What system has what data?
- In what shape?
- What shape does the next system need its data in?
- Are the interfaces between these orthogonal? Shallow? Easy to grok? Tight (as opposed to leaky)?
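Those questions translate almost directly into type declarations. A minimal sketch (TypeScript, with invented names) of what staking out the boundary between two systems can look like:

```typescript
// What the upstream system has, in its shape.
interface BillingRecord {
  customerId: string;
  amountCents: number;
}

// What the downstream system needs its data in.
interface InvoiceLine {
  customer: string;
  amount: number; // dollars
}

// The boundary is one small, orthogonal function. It is the only code
// that knows both shapes, which keeps the interface shallow and non-leaky.
function toInvoiceLine(rec: BillingRecord): InvoiceLine {
  return { customer: rec.customerId, amount: rec.amountCents / 100 };
}
```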
Sounds a lot like the style I learned from working through HTDP[1]. Define data types, then pass the data around through functions. You work through it in a dynamic language, but you keep track of all your data types, writing them above your function definitions and making sure they line up, kind of like a manual, self-imposed type-checking system.
One thing that amazed me while working through HTDP was that almost all my bugs came from not understanding the data types correctly, or messing something up with the process of manual type-checking. Once I understood the data structures I was trying to pass around and compute with, the bugs almost always melted away.
Now I program basically everything in a language with type checking (mostly TypeScript), with data types and type definitions as the foundation. I'm amazed to see how 95% of the pain, complexity, and bugs have just melted away.
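For anyone curious what that workflow looks like, here's a minimal sketch of my own (TypeScript, invented example): the data definition comes first, and the function's structure follows the structure of the data, with the compiler now doing the "manual type-checking" for you.

```typescript
// Data definition first, HtDP-style: a Shape is either a Circle or a Rect.
type Shape =
  | { kind: "circle"; radius: number }
  | { kind: "rect"; width: number; height: number };

// The function's shape mirrors the data's shape: one case per variant.
// Forget a variant, or get a field wrong, and it fails to compile.
function area(s: Shape): number {
  switch (s.kind) {
    case "circle":
      return Math.PI * s.radius * s.radius;
    case "rect":
      return s.width * s.height;
  }
}
```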
Funny how school lessons are not applied in the real world. I worked at a Waterloo unicorn, and we had a single refund function that either authorized or denied a refund, with dozens of parameters in the function header.
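To illustrate the failure mode (invented names, not the actual system): with dozens of positional parameters, swapping two same-typed arguments compiles fine and fails silently, whereas a single typed request object makes the data visible and checkable.

```typescript
// Invented sketch. The problem case looks like:
//   processRefund(orderId, customerId, amountCents, /* ...dozens more */)

// Modeling the refund request as data gives every field a name and a type.
interface RefundRequest {
  orderId: string;
  customerId: string;
  amountCents: number;
  reason: "damaged" | "wrong_item" | "other";
}

// Toy policy, purely illustrative.
function authorizeRefund(req: RefundRequest): boolean {
  return req.amountCents <= 10_000 && req.reason !== "other";
}
```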
I'd like to do more of this, but every time I use TypeScript in React/Next.js I end up hitting breakpoints in transpiled JS code instead of the TS source, especially in my tests and when running single files, and I've tried so many debug configurations.
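For what it's worth, stepping through original TS generally requires source maps to be both generated and preserved end to end. A common tsconfig baseline (exact settings vary by bundler and test runner, so treat this as a starting point, not a fix-all):

```json
{
  "compilerOptions": {
    "sourceMap": true,
    "inlineSources": true
  }
}
```

If the debugger still lands in JS, the test runner or bundler may be transpiling with its own settings (e.g. a separate tsconfig for tests), so it's worth checking that the same flags reach whichever tool actually does the transpilation.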
I went to the University of Waterloo in the mid 2000s and never heard this kind of philosophy.
Still, having quite a bit of experience under my belt, I don’t think it’s actually going to be uniquely helpful at writing code. As I’ve gained more mastery, I’ve started to think more abstractly about the system as a whole. Sure, data flow is one aspect you should consider. The mechanical aspects of code are necessary to consider too (eg what makes code maintainable and robust against mistakes). But how all the different pieces cooperate to create a complex system that’s solving your problem, that’s the way to get some real insights. Thinking about the system, you can start to think about how to change requirements rather than just trying to solve within some external constraints. Being able to move seamlessly up and down the abstraction stack is hugely important.
I agree 100% that focusing on the code is completely misguided. But so is focusing on data. Data by itself is useless. It’s what you can do with the data and how you can use it. Just shuttling bits around is by itself pointless unless all you’re building is basic data viz. And ultimately, this by itself is only one approach. For example, AI systems depend on data cleaning today. That’s not at all about how you shuttle data around, nor will that perspective help you. ML systems depend on more scientifically rigorous approaches. A data perspective might help you optimize the performance of those systems, but that’s a smaller aspect of what AI systems are trying to solve (not unimportant, but smaller than the entire thing itself). Smaller perspectives aren’t bad, but they limit the space you can play in (which may be the goal sometimes, but keep that in mind).
All that being said, a systems-level perspective is also limiting. You’re sometimes not going to have the domain expertise to actually solve some problems by yourself. You want to take on lots of different perspectives and have a good sense of which perspective best fits a given situation. And sometimes, you may not have the ability to take a certain perspective. That’s where colleagues can help to complement your weaknesses.
My dad studied CS at UW in the 70s. He told me about using a language called “WATFIV” (pronounced “Wat Five”), which stood for “Waterloo FORTRAN Four.”
Here I was years ago wondering whether the Yourdon formalism or the Gane and Sarson formalism was better for doing your data flow analysis.
It turns out doing dataflow analysis is just pretty much scorned by the programming community so it was moot.
People just want to start coding and get that immediate dopamine hit of positive feedback. The answer to which formalism was better was "Agile" where you don't need to plan or even understand data flow (because it will emerge spontaneously); just write code between Ritalin hits.
One result of this is that there are no good (free) tools out there to support either dataflow analysis formalism. I get blank looks from coworkers when I ask to see the dataflow analysis for their systems.
Of course, I picked up the dataflow analysis thing working at a Waterloo startup as my first post-graduation job. The university I went to focused on data structures and algorithms (the how of software) rather than dataflow (the what of software). My first job taught me that data structures and algorithms are necessary but not sufficient.
Not just another discussion on programming style, but program development, with a focus on data.
I like the idea of having models and enforcing them. For example, testing that three different API endpoints of a service match each other's idea of their objects. This is a sanity check when we verify the frontend state.
If we could separate scraping from modeling constraints, we could potentially collect data separately from the verification step. Then we aren't left waiting for UI DOM stuff when we verify the model. The latter can happen separately, and extremely quickly.
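A sketch of what that separation could look like (TypeScript, names invented): the model is declared once, and a small verify step runs against already-collected payloads, with no DOM or UI in the loop.

```typescript
// The model is declared once; every endpoint payload is checked against it.
interface Poll {
  id: string;
  question: string;
  votes: number;
}

// Runs against already-scraped JSON, so verification doesn't wait on the UI
// and can happen separately, and extremely quickly.
// (Field checks abbreviated; a real version would cover every field.)
function parsePollDetail(json: unknown): Poll {
  const p = json as Poll;
  if (
    typeof p.id !== "string" ||
    typeof p.question !== "string" ||
    typeof p.votes !== "number"
  ) {
    throw new Error("payload does not match Poll model");
  }
  return p;
}
```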
At my last job I supported a system that was largely built by one person over the preceding 20+ years. This was roughly his philosophy as well.
He told me, paraphrased since this was 20 years ago, "Whenever I'm designing a new feature, I always look at what needs to come out at the end, then I can figure out what needs to go in at the start and how it has to flow through."
BMath (CS) 2002 here. This description is spot on the way I think about development, and is a bit of a superpower to be able to do well. I'm not totally sure it's a UW-ism though. I can certainly recall a couple of very formative Tompa courses where he impressed the importance of taking a data-first view of design, and I think we had a stronger bias towards data structures than most other schools whose grads I've worked with. But overall I think that sentiment grew weaker in my upper years, when a more conventional algorithms approach took over.
I will say though, that I've also noticed the contrast before with MIT grads, who tend to have a very strong LISP bent to their styles. It's true that each school has their own unique flavour, and much like accents it may just be that you don't notice your own.
0. There is no magic anywhere. Anywhere down a stack, in a system or in code, they are all just bits of code. The behavior lives somewhere. (Genchi genbutsu.)
1. Getting from point A to point B is dataflow analysis. One can even deconstruct the RTL micro- and macrocode designs of a CPU or GPU this way. Input, process, output, and feedback encapsulate the represented behavior, be it a shell pipe, streaming IO class, Kafka, firewall, audio effect generator, or microcontroller.
Static compilers try to be efficient dataflow analysts, gathering as much liveness and constraint information as possible to apply optimization transformations. It's interesting that static optimization passes are usually implemented as middleware patterns that stack.
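The "passes that stack" point is easy to sketch (toy example, not a real compiler): each pass is a function from program to program, and the optimizer is just their composition.

```typescript
// A "program" here is just a list of numbers standing in for instructions.
type Program = number[];
type Pass = (p: Program) => Program;

// Two toy passes: drop "dead" zero instructions, and collapse adjacent repeats.
const dropZeros: Pass = (p) => p.filter((x) => x !== 0);
const dedupe: Pass = (p) => p.filter((x, i) => i === 0 || x !== p[i - 1]);

// Middleware-style stacking: the optimizer is the composition of its passes.
const compose = (...passes: Pass[]): Pass =>
  (prog) => passes.reduce((acc, pass) => pass(acc), prog);
```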
Waterloo uses the HTDP book to teach freshmen introductory programming and CS. Now, I am sure, there are many students who take CS135 with no knowledge of what programming is. They are taught a functional language without state or mutation.
My question is: how do they fare when they have to use imperative languages later in the CS program, with messy for loops, mutation, and memory allocation? Is it easier because they did CS135 first, or harder?
To be frank, I don't think imperative language use is going away anytime soon, so they need to learn the best of both worlds; hence my asking.
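For concreteness, the jump is smaller than it looks once the data is fixed; here's the same computation in both styles (TypeScript standing in for the Racket-then-imperative sequence):

```typescript
// Functional, CS135-style: no mutation, the structure follows the list.
function sumFunctional(xs: number[]): number {
  return xs.length === 0 ? 0 : xs[0] + sumFunctional(xs.slice(1));
}

// Imperative: same data, same answer, with an explicit loop and mutation.
function sumImperative(xs: number[]): number {
  let total = 0;
  for (const x of xs) {
    total += x;
  }
  return total;
}
```

The data definition (a list of numbers) is identical; only the mechanics of consuming it change.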
This was touched on a bit in the article -- a problem that I've always had with OOP (as in class & method syntax) languages. I'm always worrying about the functions (i.e. code) too much. There is this tendency to abstract from the data prematurely, ending up with many small data silos that talk to each other without real understanding of what's happening in the global picture. That's often not a great way to design data structures. Of course, syntax is just syntax, and in most languages the OOP part is only optional, but for me this has been a real effect. I've had more success just constraining myself to procedural.
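Concretely, the constraint I mean looks something like this (toy example): one plain data structure that keeps the global picture visible, with procedures transforming it, instead of state scattered across many small objects.

```typescript
// One plain, global-picture data structure...
interface World {
  score: number;
  lives: number;
}

// ...and procedures that transform it, with no state hidden inside objects.
function loseLife(w: World): World {
  return { ...w, lives: w.lives - 1 };
}

function addScore(w: World, points: number): World {
  return { ...w, score: w.score + points };
}
```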
This is a bit unrelated, but does anyone know how Waterloo became one of the best schools for Math/CS in Canada? I'm from Canada myself and almost went to Waterloo, and it's always been a bit weird to me that Waterloo (a school that isn't really known for anything else, doesn't have a long history, and is located in a random town unconnected to large firms, banks, and/or other universities, unlike the Bay Area) has become so good for CS. Does anyone know the history behind it?
You have it a bit upside down. The 'Bay Area' doesn't get good students because of the Valley; it's the other way around. The Valley was made by good students from Stanford and Berkeley.
Unis have always been a bit out of the way, they are not 'sponsored by banks' they were sponsored by Churches and congregations, then the elite.
Waterloo was Waterloo College, a Lutheran Seminary, and grew out from there.
It was successful probably because it was very much focused on tech, unlike most other schools, and didn't appeal to multi-generational families but to 'anyone'. The local Mennonites are also extremely good students; you don't hear about them, but they get good grades.
It's a great tech school, but one of the ugliest, most sparse and uninspiring campuses imaginable. If we think of a traditional uni aesthetically, like an 'Ivy campus' or 'Oxford', UW is like one of those 1960s concrete-block, Soviet-utilitarian places. I mean, it could be worse.
There are a number of policy decisions they made to sacrifice the wealth of the school and its professors and to care deeply about conflicts of interest. For example, for both professors and students, any patents you get or inventions you make are your property. Professors are actively allowed to fight textbook fees, e.g. by teaching from their own material, and they’re often prohibited from benefiting if it’s their own book (I think they can give it away at cost or marginal markup). The students are not particularly affluent, so there’s a good hacker culture going on (necessity breeds creativity).

Engineering exams have a formal exam bank (can’t remember if student-run or university-sponsored) that gives you all historical exams for a subject. This ensures professors can’t just keep reusing the same material, which would otherwise help students cheat by getting previous years’ exams (vs actually learning the material). There’s a focus on a mix of individual study/evaluation and group work. There’s also the famous co-op program they pioneered, which everyone is trying to mimic, connecting them to industry. In the CS and math departments, I think they do a good job training for international competitions to get that prestige up. They also had really talented educators who cared about getting kids to have fun in the first year (attrition rates would be a lot worse if the brutalism started early).
At this point it’s a reinforcing flywheel just like it is with MIT, Stanford, and Berkeley. I think they went with a different route though. They give minimal scholarships and afaik they don’t go out of their way to recruit wunderkids.
I suspect there isn’t any single answer / magic secret. They just built a good culture centered around teaching kids STEM effectively, kids and parents recognized it quickly enough which created natural competition to get in until it became a flywheel effect.
It's a giant undergrad school (36,000 undergrads and barely 6,000 postgrads). There are more undergrads at Waterloo than at MIT, Stanford, CMU, Caltech, Harvard, Princeton and Yale... combined! And they do coop, meaning their "break" semester alternates from summer to fall to winter.
From what alumni told me, undergrads are incentivized to apply everywhere for internships as part of their courses and especially during the off-cycles (winter) for internships when they are effectively the only ones looking. Coop also means someone who can't convince an employer to pay for them won't graduate, so there's a nice selection bias.
Reminds me of being taught hand-cut recursive descent compilers, back in the day. The process was: define what info is required and passed around, then the code becomes (almost) trivial.
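In that spirit, a toy hand-cut recursive-descent parser (my sketch, not the course material): once you've decided what information is passed around (here, just a token list and a cursor), the code is almost mechanical.

```typescript
// Grammar: expr := num ("+" num)*
// The "info passed around" is just the token list and a cursor position.
function parseExpr(tokens: string[]): number {
  let pos = 0;
  const num = (): number => Number(tokens[pos++]);
  let value = num();
  while (tokens[pos] === "+") {
    pos++; // consume "+"
    value += num();
  }
  return value;
}
```

Each grammar rule becomes one function with the same signature, so extending the grammar is just adding functions in the same shape.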
Yeah, the majority of programming is data plumbing. Getting it from Point A in Format 1 to Point B in Format 2, and maybe changing a couple of values along the way (which might involve a detour to Point C in Format 3 so you can use that fancy GPU to work on a bunch of them at once or whatever). The important parts of a coding philosophy are, IMO, how easy it is to understand and how easy it is to maintain.
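That plumbing view fits in a few lines (illustrative formats, invented names): Format 1 in, Format 2 out, with the value changes in one visible place.

```typescript
// Point A, Format 1: temperatures keyed by city, in Celsius.
type FormatOne = Record<string, number>;

// Point B, Format 2: a list of labeled readings, in Fahrenheit.
interface Reading {
  city: string;
  tempF: number;
}

// The whole "program" is one well-named transformation.
function plumb(input: FormatOne): Reading[] {
  return Object.entries(input).map(([city, tempC]) => ({
    city,
    tempF: tempC * 9 / 5 + 32, // the "couple of values changed along the way"
  }));
}
```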
Another beauty of this kind of thinking is that it’s really empowering if you don’t have any computer science background. Coming from somewhere like physics, grasping how computers are actually simple opens very direct paths for getting things done.
This isn't about code but about architecture. It's generally true that if you architect the system well then the code will be relatively easy to write, but it helps to know how to code when you go architect.
However, that only further proves your point. Those tables were a mess, and the underlying logic was even worse.
Me neither.
A lot of these ideas seem to come from SICP, which originated at MIT, from earlier research at the AI lab and at Xerox PARC.
Great article.
>Shallow?
Too many deep-inheritance TypeScript libraries being built today, I find!
[1]: https://htdp.org
Now that is Waterloo Style.
I used it professionally for years in the 1980s. It wasn't just a teaching language.
See also:
- Data-Oriented Programming. Yehonathan Sharvit. Manning, 2022.
- Principles of Program Design. M. A. Jackson. 1975.
- SAM Pattern. Jean-Jacques Dubray. https://sam.js.org/
This still sticks with me and serves me well.
Isn't this very data-centric in nature?
I find it tricky keeping 'data-1st' code in my frontend (which is eating more & more of the backend as the years pass).
The large firm involved in the origin of UW was Electrohome, one of the top consumer electronics manufacturers of its time.