igodard's comments

igodard | 9 years ago | on: Mill Computing in 2017

Those issues are pretty easy.

For spilling: if a desired argument is less than one belt-length away then we directly reference it; between one and two away we reorder; and more than two we spill.
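As a sketch, that three-way policy can be written down roughly like this. The names, thresholds, and "age in drops" framing are my illustration of the rule as stated; the specializer's real data structures are not public.

```python
def place_operand(age, belt_length):
    """Decide how to access a value by its age (number of belt drops
    since it was produced). Hypothetical sketch of the three-way policy:
    directly reference, reorder, or spill."""
    if age < belt_length:
        return "reference"   # still on the belt: reference it directly
    elif age < 2 * belt_length:
        return "reorder"     # one to two belt-lengths away: reorder
    else:
        return "spill"       # further than that: spill to scratchpad

# For a Tin-like 8-position belt:
assert place_operand(3, 8) == "reference"
assert place_operand(11, 8) == "reorder"
assert place_operand(20, 8) == "spill"
```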

We place a load as soon as the address arguments are statically available. The compiler doesn't have to deal with aliasing, which is a large and bug-ridden part of compiling for other targets.

Lazy-error (aka NaR) means that there are no error dependencies in arithmetic expressions. Current compilers for other targets simply ignore such dependencies, relying on the "undefined behavior" rule. Mill is designed to not produce nasal demons :-)
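A toy model of the lazy-error idea: speculable arithmetic propagates a NaR instead of faulting, and the fault surfaces only if the value reaches a non-speculable operation such as a store. The class and function names here are my own stand-ins, not Mill semantics in detail.

```python
class NaR:
    """Hypothetical stand-in for a Mill Not-a-Result metadata flag."""
    def __repr__(self):
        return "NaR"

def add(a, b):
    # Speculable ops propagate NaR instead of faulting.
    if isinstance(a, NaR) or isinstance(b, NaR):
        return NaR()
    return a + b

def store(value):
    # Non-speculable ops (store, branch) are where a NaR finally faults.
    if isinstance(value, NaR):
        raise RuntimeError("deferred fault: NaR reached a store")
    return value

x = add(add(NaR(), 3), 4)   # no fault here: the error just flows along
assert isinstance(x, NaR)
```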

igodard | 9 years ago | on: Mill Computing in 2017

Optimal register coloring is NP-hard, but no compiler does that; heuristics are no worse than quadratic and give near-optimal results.

The Mill specializer part that schedules ops is linear, while the part that assigns belt numbers and inserts spills is NlogN in the number of ops in an EBB because it does some sorts.

igodard | 9 years ago | on: Mill Computing in 2017

Yes - by those with established businesses. I can't think of a startup that succeeded with an initial incremental product, at least since Amdahl. It's also hard to do an increment: the Intel teams are not dumb, and many of them could build, and have built, far better processors than Intel has - but not while keeping compatibility and Intel's ROI and marketing and pricing structure. TransMeta tried - RIP.

There's also personal strategy involved. If we had done a better X86 we would have needed huge dollops of money to crack the front door of the market - c.f. TransMeta - and lost ownership of the company. By going for the disruptive approach we still own all of it - and funding rounds now are at a valuation that will keep us making our own mistakes, not someone else's. That matters to me, enough to go without paycheck for a decade. YMMV.

igodard | 9 years ago | on: Mill Computing in 2017

Judge us on the tech, not on me. To be honest, if you cannot already understand what we have put out to the public well enough to know that you want to work on it, then you are probably still too junior to be really useful as we are now. We can't afford the cost of ramping people up if they are not already there.

As we grow there will be more room for beginners, but not yet. Mind, that's beginners as in concept understanding, not beginners as in age or degrees. I'm a dropout, and this year we added an intern in Tunisia who's still finishing his exams. We let people self-select, we don't try to persuade them. And we don't pay them, so only the convinced join.

In a way, at the top of engineering things work much more like they do in the arts: you are judged by your portfolio, not by your background or education.

igodard | 9 years ago | on: Mill Computing in 2017

See https://millcomputing.com/#JoinUs

NDA and sweat-equity agreement required; you get fully vested long-term options monthly, expected to become actual stock monthly after the next round.

To help in a FOSS context: we are not yet ready to put out an SDK to the FOSS community, or to anyone really. That will wait until after our cloud environment is up - and a few more patents have been filed (the ISA exposes things we want to protect).

In the meantime, the most help would be to support existing microkernel operating system efforts such as L4 (https://en.wikipedia.org/wiki/L4_microkernel_family). The Mill will have a big impact on conventional OSs like Linux, but those suffer from built-in assumptions that open security holes and make the OS a dog that the Mill can only train a little. The big win is in microkernels, which can take advantage of the Mill's tight security and ultra-fast context switching.

Or support languages that have central micro-thread concepts, such as Go https://en.wikipedia.org/wiki/Go_(programming_language), for the same reason.

But whatever you do, don't tune the microkernels or microthreads for a conventional core; instead, do it right, and we'll be along to help eventually :-)

igodard | 9 years ago | on: Mill Computing in 2017

All but #4. Our business model is to sell chips, not licenses. Why? Intel's quarterly dividend is bigger than ARM's annual revenue.

Licensing is a backup plan.

igodard | 9 years ago | on: Mill Computing in 2017

Our industry is a surprisingly small pond, at the top anyway. No one would invest in the Mill based on my formal bio "College dropout; never took a CS course in his life" :-)

Instead they invest based on our technology, most of which (and eventually all) we make publicly available. You may not be able to judge it, but any potential partner has people who can. One of the things that encouraged us on our long road is that the more senior and more skilled the reviewer, the more they loved the Mill. Quotes like "This thing is the screwiest thing I've ever seen, but it would work - and I could build it".

igodard | 9 years ago | on: Mill Computing in 2017

Both. Both full- and part-timers are on sweat-equity, no cash, although we will be switching to cash-optional with the next funding round.

igodard | 9 years ago | on: Mill Computing in 2017

GenAsm for the factorial function (conAsm in a different reply, above):

define external function @fact w (in w) locals($3, %9, &0, ^6) {
label $0:
    %1 = sub(%0, 1) ^0;
    %2 = gtrsb(%1, 1) ^1;
    br(%2, $1, $2);
label $1 dominators($0) predecessors($0, $1): // loop=$1 header
    %3 = phi(w, %5, $1, %0, $0);
    %4 = phi(w, %6, $1, %1, $0);
    %5 = mul(%3, %4) ^2;
    %6 = sub(%4, 1) ^3;
    %7 = gtrsb(%6, 1) ^4;
    br(%7, $1, $2);
label $2 dominators($0) predecessors($0, $1):
    %8 = phi(w, %0, $0, %5, $1);
    retn(%8) ^5;
};
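For reference, the SSA above reads back into an ordinary countdown loop. Here is my reconstruction in Python (not Mill tool output); the comments map each statement back to the genAsm values.

```python
def fact(n):
    """Python rendering of the @fact genAsm above (my reconstruction)."""
    k = n - 1                # %1 = sub(%0, 1)
    if not (k > 1):          # %2 = gtrsb(%1, 1); br(%2, $1, $2)
        return n             # %8 takes %0 along the $0 -> $2 edge
    acc = n                  # %3's initial value via phi(.., %0, $0)
    while True:              # label $1
        acc = acc * k        # %5 = mul(%3, %4)
        k = k - 1            # %6 = sub(%4, 1)
        if not (k > 1):      # %7 = gtrsb(%6, 1); br(%7, $1, $2)
            return acc       # %8 takes %5 along the $1 -> $2 edge

assert fact(5) == 120
```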

Higher-end Mill models have more and bigger things to encode, and so a given program will occupy more bytes than it will on a lower-end member. Thus a belt reference on a Gold is 6 bits, but only 3 on a Tin.

igodard | 9 years ago | on: Mill Computing in 2017

ConAsm (model-dependent assembler) for "Silver" model:

F("fact") %0;
        sub(b0 %0, 1) %1;
        gtrsb(b0 %1, 1) %2,
          retnfl(b2 %0),
          inner("fact$1_1", b1 %1, b2 %0);
L("fact$1_1") %3 %4;
        mul(b1 %3, b0 %4) %5;
        sub(b0 %4, 1) %6;
        gtrsb(b0 %6, 1) %7,
          retnfl(b1 %5),
          br("fact$1_1", b2 %6, b1 %5);
Note that integer multiply is specified as 3 cycles in this model, so the recurrence in the loop takes three cycles. Other ops are one cycle. Each semicolon is an instruction (and cycle) boundary; ops separated by commas issue and execute together.

igodard | 9 years ago | on: Mill Computing in 2017

Perhaps you are thinking that the FPGA is a product? It's not; it's an RTL validator. Moving chip RTL from FPGA to product silicon is a well understood step that is almost routine in the industry. Time was you would do initial RTL development work in silicon, but modern FPGAs are big enough to hold a whole CPU core. Today you wouldn't develop directly on the silicon without an FPGA step, even if you own a fab; it's just too expensive to debug.

igodard | 9 years ago | on: Mill Computing in 2017

Evolutionary development you can start at once, because you are building on what you had before; think an x86 generation. Evolution works if you already dominate a market and only need to run a little faster than your competitor. Evolution can be scheduled; tick-tock.

A newcomer can't sell yet another me-too, even with evolutionary improvement. Instead the newcomer has to rethink and create from first principles if it is to have any chance in the market. Rethinks can't be scheduled; they take as long as they take. Ours has taken longer than I'd hoped, but adding resources like more people to the project would have just slowed us down, or forced us to market with a broken product.

The Mill rethink stage is over, and we now can have reasonable schedules and put in more resources; that's why we are going out for a significant funding round this year, our first over $10M.

igodard | 9 years ago | on: Mill Computing in 2017

Exactly. As a new entrant, it is impossible for us to immediately enter the mass markets dominated by the majors. Consequently we adopted a strategy of targeting an increasing set of niche markets that have been poorly served by the majors and their products, and then going after larger markets as we grow in resources.

This market strategy dictates the ability to produce many specialized products with small sales for each. That can't be done in the million-monkeys development approach used by the majors, which is why the majors neglect these markets. So we adopted a specification-based design strategy.

In turn, the design strategy dictates the development strategy: first the specification tools; then the assembler and simulator so we could try out various designs in small test cases and improve the architecture; then the tool chain so we could measure real quantities of code and confirm the ISA and macro-architecture. Then, and only then, write what manual RTL is left that the generators working from the specifications can't handle. The combined RTL must be verified, and it is much easier and cheaper to do that in an FPGA than with fab turns. As the message says, the FPGA is next.

Lastly, we will pick a particular market and specify the ideal product for it, run the spec through the process we have been so long building, and the fab gives us something in a flat pack.

Which won't work, of course. The first time, anyway :-)

igodard | 10 years ago | on: LLVM Meets the Truly Alien: Mill CPU Architecture [video]

(Mill team) The streams are not streams of instructions; they are streams of half-instructions. The Mill is a wide-issue machine, like a VLIW or EPIC; each instruction can contain many operations, which all issue together. Each instruction is split roughly in half, with some of the ops in one half and some in the other. On the Mill, the split is based on the kind of operation and the info that it needs to encode: all memory and control flow ops on one side, all arithmetic on the other, although other divisions are possible.

Each half is grouped with the same half of other instructions to make a stream, and the two streams decode together, step by step, so each cycle one instruction, comprising both halves, decodes and issues.

The result is to double the available top level instruction cache, and cut in half the work that has to be handled by each of the two decoders.

igodard | 11 years ago | on: Is the Mill CPU real? (2014)

Some responses to comments here:

We do not have an FPGA implementation, although we are working on it.

The reason getting to product is slow is that we are by choice a bootstrap startup, with three full-timers and a dozen or so part-timers. Compare a CPU project at a major such as Intel or ARM, with hundreds of full-time engineers, big budgets, and five years of development before the product is even announced - all for minor modifications to an established and well understood design.

The Mill architecture is notable primarily in what it takes out of the CPU, rather than what it puts in.

Patents are a backbreaking amount of work. To economize in filing costs our attorneys have consolidated what we expected to be separate patents into single filings - no less work, just fewer/bigger and hence cheaper. So far the twenty filings that have gone in represent around two thirds of the core technology. And if you think that 80-odd pages of patentese is fun then you are more kinky than should be let out in public.

"Architecture astronaut" is a cute term, not restricted to architecture; I have more than enough experience with those full of ideas - for someone else to do. We have done a lot of architecture over the decade, but a fair amount of implementation work too. For a live demo of the result of some of our work, see http://millcomputing.com/docs/specification/

While I have worked in academia, I am more comfortable in an environment making real things for real people (or trying to) rather than sheepskins.

The purpose of our talks is to get professional validation of the design in a way we can afford. It is not to sell you something; it will be quite a while before we have something to sell, and you are not our customer.

We have a private placement in process and welcome Reg-D qualified investors. If you do not know what that is then you probably aren't. For details: http://millcomputing.com/investor-list. We are not cruising Sand Hill Road. We estimate ~$6M to an FPGA, and $25M to a product. Heavy semi is serious industry.

In early days I did my first compiler for a Bob Barton-designed machine, the Burroughs B6500. That compiler is still in use forty five years later. I wish Barton were still here today; the Mill owes a great deal to his design philosophy.

Ivan

igodard | 11 years ago | on: Is the Mill CPU real? (2014)

The original of this discussion was a blog post on kevmod.com. I posted the following comment on that blog, repeated here verbatim as possibly of general interest: ++++++++++++++++++++++++++++++++++++++++++++++++++++ Your skepticism is completely justified. The Mill may never reach market – we are a startup, and most startups fail; it's a fact of life. Although we’ve survived for over a decade, which is pretty good for startups these days.

But it sounds like you are less skeptical about Mill Computing the company than about Mill the technology and architecture. There are fewer grounds to doubt the latter. As fast as we have been able to get the patents filed (I seem to have been doing nothing else for the last two years. I hate patents) we have been completely opening the kimono and showing the technical community, in detail, how each part works. Why? Because we wanted outside validation before wasting another decade on something that was fatally flawed in some way we had overlooked.

If there were any part of the public Mill that one could point at and say “See? that won’t work, because …” then the web would have been all over us. But you know? Skepticism we get, aplenty. What we don’t get is informed skepticism. In fact, the more senior and skilled the commenter, the more they fall in love with the design. Like Andy Glew said one time (and if you don’t know who that is then you are not in the CPU business) – “Yeah, it’ll work, just the way he says it will”.

Sometimes people complain that our presentations are insufficiently detailed to fairly evaluate. Guilty as charged; they are oriented for a high level audience interested in the subject, but not for the specialist. However, if you ask for details on our forum (millcomputing.com/forum/themill) or the comp.arch newsgroup, as hundreds have, you will get all the details you want until they flood out your ears and collect in puddles on the floor.

In these days of internet time, when idea to market is measured in days or weeks, it’s easy to forget that not all the economy works that way. Building steel mills, cement plants, and yes, CPU silicon takes a long time and a lot of money. We have deliberately swapped money for time: we are a bootstrap startup, not looking for VC funding. There’s good and bad in that choice: a decade without a paycheck is not easy, but today we own it – all of it – and feel we got a good deal.

The proof of the Mill pudding will be when there’s a product with pins on the bottom, and that won’t happen for some years yet. We learned in our first presentation not to make projections of what the eventual chip will have for numbers. Yes, we have guesstimates internally, but we’re quite sure those will be off by a factor of two. The problem is that we have no clue which direction they will be off.

If you have the technical chops to understand a CPU design from first principles then please dig as deep as you can into our stuff and tell us – and the world – what you find. Otherwise you will just have to join us as we wait and work and see. We’ve never said anything different.

Ivan

igodard | 11 years ago | on: Software Pipelining on the Mill CPU [video]

Mill multicore has fully sequentially consistent cache coherency; there are no barrier operations. Sorry, how it's done is still NYF (Not Yet Filed). We expect a talk on the subject this fall.