igodard's comments
igodard | 8 years ago | on: Mill CPU Inter-Process Communication
Fine granularity is expensive, which is why the monolithic kernels offer only one granularity: the process. If you have 100,000 graph nodes and want to pass all of them except one, you will have to pay for the privilege in any protection model. The Mill just lets you pay less.
igodard | 8 years ago | on: Mill CPU Inter-Process Communication
Like any cache, the optimal PLB size is determined by the working set. In the typical code we are seeing, the program has a couple of open files, half a dozen mmaps where the heap grew itself, and portal blocks for assorted libraries. The working sets are much smaller than a conventional TLB, and with SAS we have several cycles available in parallel with the caches.
The upshot is that a PLB can be large, cool, and slow. As for the range compares, the PLB permits the same sort of address sub-setting as is done in mixed-size TLBs. Think about how many bits in a typical address range actually differ between the lower and upper bounds.
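A rough sketch of that observation, for the special case of a naturally aligned power-of-two region (the addresses and the 64 KiB size are invented; real Mill regions are byte-granular, so this only illustrates the mixed-size-TLB-style subset):

```python
def differing_bits(lo, hi):
    """Count the low-order bits that can vary inside [lo, hi)."""
    return (lo ^ (hi - 1)).bit_length()

def in_region(addr, lo, size_bits):
    """Range check for an aligned power-of-two region: a single prefix
    compare, just like a mixed-size TLB entry, not two full-width compares."""
    return (addr >> size_bits) == (lo >> size_bits)

# A made-up 64 KiB region: its bounds differ only in the low 16 bits.
lo, hi = 0x7F3A00000000, 0x7F3A00010000
```

Since the bounds share all but 16 bits, the entry needs only a prefix match plus a 16-bit mask rather than two 48-bit magnitude compares.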
igodard | 8 years ago | on: Mill CPU Inter-Process Communication
The conversion does increase the latency of getting the value of x. If there's nothing else to do then the tool chain will insert explicit nops to wait for the expression. The same stalls will exist on other architectures for the same code, just not visibly in the code. It happens that making the nops explicit is faster than a stall; you can idle through a nop with no added overhead, but you can't restart a stall instantaneously.
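A toy cycle model of that claim; the 2-cycle restart penalty is invented for illustration, not a Mill or competitor figure:

```python
def nop_cost(wait_cycles):
    """An explicit no-op costs exactly the cycles it occupies."""
    return wait_cycles

def stall_cost(wait_cycles, restart_penalty=2):
    """A hardware stall costs the wait plus a restart penalty, because the
    pipeline cannot resume instantaneously (penalty value is hypothetical)."""
    return wait_cycles + restart_penalty
```

Under any positive restart penalty, idling through explicit no-ops is cheaper than stalling for the same wait.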
igodard | 8 years ago | on: Mill CPU Inter-Process Communication
Of course, modern peripherals don't look like that, so there will be adaptors. IBM 360 channels and CDC6600 PPs also haven't been architecturally revisited in a while.
igodard | 8 years ago | on: Mill CPU Inter-Process Communication
And yes, Mill Computing, Inc. is not how real companies are run. Is that a bug or a feature?
igodard | 8 years ago | on: Mill CPU Inter-Process Communication
Most of what you'd like to see are things we'd like to see too. At the beginning we decided to bootstrap rather than follow the usual funding model, at least to the point at which we could demonstrate what we had to people who would understand it in detail. We chose to bootstrap in large part because most of us were old enough to have had actual experience with other business models. Yes, it has taken far longer to get this far than we wanted, but we have gotten this far.
About evaluation: it has been our experience that the more senior/skilled a hardware (and software) guy is, the more they fall in love with the Mill. You don't hear much of that - we want the tech to be judged on its merits, not on some luminary's say-so. And of course those senior guys tend to work for potential competitors and don't want to say much publicly.
But you are right: the proof will be running code, and we're starting to do that. We'll be doing more talks like the switches talk, with actual code comparisons. Eventually we will put our tool chain and sim on the cloud for you to play with. Patience, waiting is.
igodard | 8 years ago | on: Mill CPU Inter-Process Communication
The difference between the two models is visible when you pass a graph structure across a protection boundary. With caps it is easy to pass the whole graph, and hard to pass only one node. With grants it is vice versa.
igodard | 8 years ago | on: Mill CPU Inter-Process Communication
The Mill is hardware and architecture, not policy. If you want to use such an OS then you are free to do so. The Mill is designed to efficiently support micro-kernel OSs. Note: micro-kernel, not no-kernel. There always will be a Resource Service that owns the machine. It will be a couple hundred LOC, small enough to be correct by eyeball or proof. Contrast your choice of monolith.
The OS is not involved in allocating spillets. Spillet space is a large statically-allocated matrix in the address space. It is not allocated in memory, only in the address space. As soon as you allocate a turf id and a thread id you have implicitly allocated a spillet. Only on spillet overflow is allocation necessary. Whether allocating turf or thread ids requires OS involvement depends on the policies and models chosen by the OS designer.
When first created, the spillet data lives only in backless cache - no memory is allocated. Only if the spillet lives long enough to get evicted from cache is actual memory allocated, using the Backless Memory mechanism described in our Memory talk. The root spillets of apps will live that long; transient spillets from portal calls will likely live only in cache. Consequently truly secure IPC/RPC using Mill portals has overhead, app and system combined, on the same order as an ordinary function call.
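The implicit allocation above can be sketched as pure address arithmetic. All the layout parameters here are invented round numbers, not actual Mill values:

```python
# Invented layout parameters -- not actual Mill sizes or addresses.
SPILLET_BASE = 0x010000000000  # start of the spillet matrix in the address space
SPILLET_SIZE = 4096            # bytes of address space per (turf, thread) pair
MAX_THREADS  = 1 << 20         # one matrix column per thread id

def spillet_address(turf_id, thread_id):
    """A spillet's slot is a pure function of the two ids: allocating a
    turf id and a thread id implicitly names it, so no allocator runs,
    and (with backless cache) no memory is touched until eviction."""
    return SPILLET_BASE + (turf_id * MAX_THREADS + thread_id) * SPILLET_SIZE
```

The point of the sketch is that the matrix consumes only address space; rows and columns that are never evicted from cache never consume DRAM.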
> They don't provide nearly enough ways to transitively grant permissions. Using the mechanisms discussed in the talk, it doesn't seem like you can implement a simple asynchronous queue of units of work to perform, each having their own permissions.
There is a "session" notion that addresses such things. Unfortunately the talks are far enough into the details that they must contain background and introduction slides for the viewers who have not already seen (and retained) all the other talks. This limits the amount of new material that can be covered in a single talk, and sessions didn't make the cut this time. We'll get to them.
> The mechanism to support fork() is a total kludge.
Agreed; there seems to be a Law of Conservation of Kludgery. We had as a minimum requirement that the architecture must support Unix shell. The only real problem is fork(). Would that we could issue an edict banning it.
igodard | 8 years ago | on: Mill CPU Inter-Process Communication
The guys at Cambridge have running caps systems that store the extra info in outboard data structures. We judge that the overhead is too great for commercial success. Customers buy benchmarks, and there are no security benchmarks.
'Tis true 'tis, 'tis pity. 'Tis pity 'tis, 'tis true.
igodard | 8 years ago | on: The Mill CPU Architecture: Switches [video]
Prefetch chaining is to get code out of DRAM, and it runs DRAM-latency ahead of execution. Separately, fetch chaining is to get the code up to the L1, and it runs L2/L3-latency ahead of execution. Load chaining gets the lines from the L1 to the L0 micro-caches and the decoders, and runs decode-latency (3 cycles on the Mill) ahead of execution.
The Mill stages instructions like this because the further ahead in time an action is the more likely that the action is down a path that execution will not take. We don't prefetch to the L1 because we don't have enough confidence that we will need the code to be willing to spam a scarce resource like the L1. But avoiding a full DRAM hit is important too, so we stage the fetching. It doesn't matter at all in small codes that spend all their time in a five-line loop, but that's not the only kind of codes there are :-)
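The staging rule above boils down to: each chain must run far enough ahead of execution to hide the miss latency of the level it feeds. In this sketch, only the 3-cycle decode figure comes from the text; the DRAM and L2/L3 latencies are invented round numbers:

```python
# Illustrative latencies in cycles. Only the 3-cycle decode figure is
# from the text; the other numbers are invented for illustration.
LATENCY = {"DRAM": 200, "L2/L3": 20, "decode": 3}

# Which level's miss each chain exists to hide.
FEEDS = {"prefetch": "DRAM", "fetch": "L2/L3", "load": "decode"}

def lead_cycles(chain):
    """How far ahead of execution a chain must run to hide its level's miss."""
    return LATENCY[FEEDS[chain]]
```

The ordering is the point: the further ahead a chain runs, the cheaper the storage it targets must be, because the prediction it acts on is less likely to be right.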
igodard | 8 years ago | on: The Mill CPU Architecture: Switches [video]
Table reload is part of our support for the micro-process coding style. It is essentially irrelevant for typical long running benchmarks, especially those that discard the first billion instructions or so to "warm up the caches".
Table reload provides faster response for cold code (either new processes or running processes at a program phase boundary) than simply letting the predictor accumulate history experience. There are heuristics that decide whether execution has entered a new phase and should table-load; the table is not reloaded on every miss. Like any heuristics, these may be better or worse for a given code.
The historical prediction information is in the load module file and is mapped into DRAM at process load time, just like the code and static data sections. Table-load is memory-to-predictor in hardware and is no more difficult than any of the other memory-to-cache-like-structure loading that all cores use, such as loading translation-table entries to a TLB.
While a newly-compiled load module file gets a prediction table from the compiler, purely as being better than nothing, the memory image from the file is continually updated during execution based on execution experience. When the process terminates, this newly-augmented history is retained in the file, so a subsequent run of the same load module is in effect self-profiling to take advantage of actual execution history. Of course, programs behave differently from run to run and the saved profile experience may be inapt for the next run; there are heuristics that try to cope with that too, although we have insufficient experience as yet to know how well those work. However, we are reasonably confident that even inapt history will be better than the random predictions made by a conventional predictor on cold code.
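The run-to-run flow can be caricatured with a majority predictor whose per-site counts persist between runs. The predictor and the shape of the saved history here are invented stand-ins for the real load-module prediction table:

```python
import collections

def run(trace, history=None):
    """Execute a (site, taken) branch trace with a toy majority predictor.
    Returns the hit count and the updated history -- the 'prediction
    table' that would be written back to the load module at exit."""
    counts = collections.defaultdict(lambda: [0, 0])
    for site, pair in (history or {}).items():
        counts[site] = list(pair)          # seed from the previous run
    hits = 0
    for site, taken in trace:
        predicted = counts[site][1] > counts[site][0]  # cold sites: not-taken
        hits += predicted == taken
        counts[site][taken] += 1           # learn from actual behavior
    return hits, dict(counts)

# Nine taken loop iterations, then the exit.
trace = [("loop", True)] * 9 + [("loop", False)]
cold_hits, history = run(trace)       # first run: no history, misses cold
warm_hits, _ = run(trace, history)    # second run: seeded, predicts from cycle one
```

The warm run out-predicts the cold one on the very first iteration, which is the self-profiling effect described above: even crude persisted history beats a cold predictor's guesses.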
As always in the Mill, we welcome in the Forum (millcomputing.com/forum) posts of the form "I don't understand how <feature> works - don't you have trouble with <problem>?". Unfortunately, time and audience constraints don't let us go as deep into the details in our talks as we'd like, but the details are available for you. If, after you have understood what a feature is for and how it works, you still see a problem that we have overlooked (as has happened a lot over the years; part of the reason it's been years) then we'd really welcome your ideas about what to do about it, too.
igodard | 9 years ago | on: Mill Computing in 2017
Security in architecture is the history of a race to the bottom, driven by newbie customers not knowing there was such a thing and the economics of chip-making. We may hope that there are fewer newbies now. That leaves economics. To a large extent the Mill has been an effort to make old ideas economically viable to today's customers.