chisophugis's comments

chisophugis | 9 years ago | on: LLD is included in the upcoming LLVM 4.0 release

> But maybe, they require you to create special versions of object files where even references internal to each library are referenced there as if they live in a different object file? Is that even possible?

The extra information that is needed for an ELF linker (any ELF linker; nothing LLD specific) to operate on functions and global data objects in a fine-grained manner is enabled by -ffunction-sections/-fdata-sections.

For more information, see https://news.ycombinator.com/item?id=13673589

chisophugis | 9 years ago | on: LLD is included in the upcoming LLVM 4.0 release

> Not sure exactly what you mean by this. If you give up determinism, it can be O(changes) - except for time spent statting the input files which, at least in theory, should be possible to avoid by getting the info from the build system somehow. I can understand if LLD doesn't want to trade off determinism, but I personally think it should :)

Not quite. For example, a change in the symbols in a single object file can cause different archive members to be fetched for archives later on the command line. A link can be constructed where that would be O(all inputs) changes due to a change in a single file.

Even though a practical link won't hit that pathological case, you still have to do the appropriate checking to ensure that it doesn't happen, which is an annoying transitive-closure/reachability type problem. (If you need a refresher on archive semantics, see the description here: http://llvm.org/devmtg/2016-03/Presentations/EuroLLVM%202016... Even with the ELF LLD using the Windows link.exe archive semantics (which are in practice compatible with traditional Unix archive semantics), the problem still remains.)

In practice, with the current archive semantics, any change to symbol resolution would likely be best served by bailing out from an incremental link in order to ensure correct output.

Note: some common things that one does during development actually do change the symbol table. E.g. printf debugging is going to add calls to printf where there were none (and I think "better printf debugging" is one of the main use cases for faster link times). Or if you use C++ streams: while printf-debugging you may have had `output_stream << "foo: " << foo << "\n"` where `foo` is a string, but if you change it to also output `bar`, which is an int, you're still changing the symbol table of the object file (due to different overloads).

> Interesting. What is this mode? How does it work if it's not incremental and it doesn't read the symbols at all?

Compared to the default, mostly it just skips string merging, which is what the linker spends most of its time on otherwise for typical debug links (debug info contains tons of identical strings; e.g. file names of common headers). [1]

To clarify, there are two separate things:

- the fastest mode, which is mostly about skipping string merging. It's just like the default linking mode; it just skips some optional stuff that is expensive.

- the part of the linker profile that the linker spends most of its time doing in its fastest mode (memcpy + relocate); for example, I've measured this as 60% of the profile. This happens after symbol resolution and some preprocessing of the relocations.

Sorry for any confusion.

[1] The linker has "-O<n>" flags (totally different from the "-O<n>" family of flags passed to the compiler). Basically, higher -O numbers (from -O0 to -O3, confusingly just like the compiler) cause the linker to do more "fancy stuff" like string deduplication, string tail merging, and identical code folding. Mostly these things just reduce binary size somewhat, at a fairly significant link-time cost vs. "just spit out a working binary".

chisophugis | 9 years ago | on: LLD is included in the upcoming LLVM 4.0 release

I agree. It's definitely possible. It's just that the actual benefit falls far short of reducing link time to "O(changes in the input)", and it would introduce significant complexity into the linker (and keeping LLD simple and easy to follow is a high priority). It's definitely an open research area.

> That said, it's definitely possible to get some speedup from incrementality while keeping the output deterministic; you'd have to move symbols around, which of course requires relocating everything that points to them, but (with the help of a cache file that stores where the relocations ended up in the output binary) this could probably be performed significantly more quickly than re-reading all the .o files and doing name lookups. But admittedly this would significantly reduce the benefit.

Yeah. It's not clear if that would be better in practice than a conservative padding scheme + a patching-based approach.

"move symbols around, which of course requires relocating everything that points to them" sounds a lot like what the linker already spends most of its time doing (in its fastest mode).

In its fastest mode, LLD actually spends most of its time memcpy'ing into the output file and applying relocations. This happens after symbol resolution and does not touch the input .o files except to read the data being copied into the output file. The information needed for applying the relocations is read with a bare minimum of pointer chasing (only 2 serially dependent cache misses last I looked) and does not do any hash table lookup into the symbol table nor does it look at any symbol name string.

chisophugis | 9 years ago | on: LLD is included in the upcoming LLVM 4.0 release

> While conventional linkers work at the compilation-unit level (one source file, usually), placing that whole source file's functions adjacently in memory [1], an atom-based linker is able to take the smallest linkable units (individual functions, each static/global variable...), and arrange those optimally.

This isn't quite right. It's just that traditionally (i.e. when not using -ffunction-sections/-fdata-sections) when compilers output an ELF relocatable object, they group all the functions into a single "smallest linkable unit" (a single .text section) and so the linker can't actually do any reordering because the information that the functions are distinct has been lost.

I describe this in more detail here: http://lists.llvm.org/pipermail/llvm-dev/2016-December/10814... This was actually one of the key technical issues that prevented the Atom design from being used for ELF and COFF.

""" the ELF and COFF notion of "section" is a strict superset of its [the Atom LLD's] core "atom" abstraction (an indivisible chunk of data with a 1:1 correspondence with a symbol name). Therefore the entire design was impossible to use for those formats. In ELF and COFF, a section is decoupled from symbols, and there can be arbitrarily many symbols pointing to arbitrary parts of a section (whereas in MachO, "section" means something different; the analogous concept to an ELF/COFF section is basically "the thing that you get when you split a MachO section on symbol boundaries, assuming that the compiler/assembler hasn't relaxed relocations across those boundaries" which is not as powerful as ELF/COFF's notion). This resulted in severe contortions that ultimately made it untenable to keep working in that codebase. """

chisophugis | 9 years ago | on: LLD is included in the upcoming LLVM 4.0 release

Every incremental linking technique I'm aware of involves overwriting the output file and does not guarantee that identical input files and command line lead to identical (bit-exact) output files.

Incremental linking is not so easy under that constraint, since the output depends on the previous output file (which may not even be there).

(and considering the previous output file to be an "input file" follows the letter of the requirement but not the spirit; the idea is that the program invocation is a "pure function" of the inputs, which enables caching and eliminates a source of unpredictable behavior)

We have had to reject certain parallelization strategies in LLD as well, because even though the result would always be a semantically identical executable, it would not be bit-identical. See e.g. the discussions surrounding parallel string merging:

https://reviews.llvm.org/D27146 <-- fastest technique, but non-deterministic output

https://reviews.llvm.org/D27152 <-- slower but deterministic technique

https://reviews.llvm.org/D27155 <-- really cool technique that relies on a linearly probed hash table (and sorting just runs of full buckets instead of the entire array) to guarantee deterministic output despite concurrent hash table insertion

chisophugis | 9 years ago | on: LLD is included in the upcoming LLVM 4.0 release

No, it is just that nobody is currently working on it. Last I talked with the Apple folks they are just busy with other stuff.

Patches are definitely welcome for MachO improvements in LLD (as always in LLVM!). You should be aware though that the Apple folks feel strongly that the original "atom" linker design is the one that they want to use. If you want to start a MachO linker based on the COFF/ELF design (which has been very successful) you will want to ping the llvm-dev mailing list first to talk with the Apple folks (CC Lang Hames and Jim Grosbach).

chisophugis | 11 years ago | on: Popular Myths about C++, Part 3

What Bjarne doesn't mention is the enormous difference in code size between qsort and std::sort. The flexibility of having the compiler generate a sorting routine from std::sort is convenient but enormously redundant in many cases. In LLVM, we have array_pod_sort which is just a thin wrapper around qsort in order to avoid the code bloat of std::sort: http://llvm.org/docs/doxygen/html/namespacellvm.html#ae5788f...

For example, the following generates about 2KB of instructions (and will for basically every new type you want to sort):

    #include <algorithm>

    struct SomeStruct { int X; };

    void foo(SomeStruct *SS, int NSS) {
      std::sort(SS, SS + NSS,
                [](SomeStruct LHS, SomeStruct RHS) { return LHS.X > RHS.X; });
    }

A qsort equivalent will only emit code for the comparator which is just a handful of instructions.

C++ templates may be type safe and all, but at the end of the day they spew duplicated code just as much as those header-only macro-based C containers and algorithms; really more, because it's less painful to write templates than macros, so you do it more, and there is more stuff in the templates. So even though the specialized generated code might be faster in most cases (as Bjarne likes to tout), the overall hit on your code size (and i-cache) can be dreadful. Currently, avoiding this issue in C++ just requires diligence on the part of the coder (some optimizations like LLVM's mergefunc can help, but in general it is a pretty hard problem and compilers are not Sufficiently Smart (TM) yet).

chisophugis | 11 years ago | on: The Art of Insight in Science and Engineering

I spent about a week going through Mahajan's OCW course "the art of approximation" (it's quite short and reads quite quickly). It changed how I view problem solving and engineering estimation forever.

Grab "entire book" (about 130 pages) at http://ocw.mit.edu/courses/electrical-engineering-and-comput...

"The Art of Insight in Science and Engineering" looks like a refined and expanded version of the OCW course. Like 6.055J, I'm sure it can be read casually and in small pieces. Do yourself the favor of checking out 6.055J or "The Art of Insight in Science and Engineering" (although see the caveat below).

Every chapter of 6.055J was mind-blowing for me. The "tree" technique. Using dimensionless constants. The random walk model of errors (i.e. why approximations end up being pretty good usually). Easy cases. Etc. A lot of it is just becoming aware of techniques we already use subconsciously or awkwardly; hence we learn to systematically and effectively use them.

One small caveat: Some of the content unfortunately doesn't extend super well into software/programming. Many of the useful properties that enable reasoning about physical systems (units, conservation laws, symmetry, equilibrium points/linearization, continuity, etc.) are not applicable to software in general since "anything is possible" in software. The software is basically in its own universe that has its own laws completely determined by the hardware/VM/language/API designers. It communicates with the "real universe" through very narrow and controlled channels. Some things from the book are applicable to software, but presented in a way that is strange or not terribly useful for software (e.g. "proportional reasoning" is just the familiar big-O reasoning).

chisophugis | 11 years ago | on: Skunk Works Reveals Compact Fusion Reactor Details

FWIW, a single Boeing 747 engine does about 100MW. Such an engine can be considered to "fit on a truck" for some definition of truck.

Also consider that much of the size is due to the fan on the front (which is not part of the actual engine power plant), which makes the engine's cross-section transverse to the axis of rotation seem larger.

However, jet engines have the advantage of burning their fuel directly in the air that is flowing through the engine, which results in an extremely rapid transfer of heat to the air compared to other power plants (this is why 10 Boeing 747 engines == 1 GW coal/nuclear plant, which is "buildings" in size).

See slide 25 (page 13) of http://ronney.usc.edu/AME436S13/Lecture1files/AME436-S13-lec...

chisophugis | 11 years ago | on: Skunk Works Reveals Compact Fusion Reactor Details

Ballpark calculation:

400km/h ~ 100m/s

air intake cross section ~ 1m^2

==> ~ 100m^3/s of air going through radiator

density of air ~ 1kg/m^3

==> ~ 100kg/s of air going through radiator

specific heat of air ~ 1kJ/(kg * degC)

==> 100kJ/(s * degC) == 100kW/degC

temperature difference between air entering and leaving the radiator ~ 10degC

==> 1MW

So handling 2.5MW waste heat doesn't seem out of the question from this analysis.

As another ballpark "upper bound" analysis, consider that copper is one of the most thermally conductive materials, with thermal conductivity ~ 400 Wm/(m^2 degC), let's round that up to 1kWm/(m^2degC).

Let's assume that there is a thermal conducting surface of 1m^2 (i.e. the surfaces of the pipes that interface with the radiators). Let's assume that the copper is 1mm thick (= 1m/1000). Let's see what temperature difference would be needed to transfer 1MW of heat across:

1MW == 1kWm/(m^2degC) * 1m^2 * (1000/1m) * deltaT degC

==> deltaT ~ 1 degC

which again doesn't seem out of the question.

As another ballpark "upper bound", the convective heat transfer coefficient for forced air is ~ 100W/(m^2degC), which means that assuming that the radiator fins are 100 degC above the air temperature, in order for the fins to transfer 1MW of power to the air, you would need an area of

1MW == 100W/(m^2degC) * 100 degC * area m^2

==> area ~ 100m^2 of surface area in the radiator, which doesn't seem unreasonable (a radiator 1m^2 area * 10cm deep, made up of thin plates spaced 1mm apart has this total surface area).

So the Veyron dissipating ~1MW of waste heat at 400km/h doesn't fail any of these basic sanity tests.

More complicated though is the interaction between the stages considered here. For example, how do we interface our 1m^2 of copper with our 100m^2 of radiator? Making the radiator fins thin lets us pack more surface area into the same volume, but also makes it difficult to keep the "edges" of the fins at a sufficiently high temperature so that they pull their weight transferring heat to the air: since the "copper tubing" has significantly less surface area, it only contacts (and thus transfers heat to) the radiator fins "sparsely".

chisophugis | 11 years ago | on: Roadmap to Alpha Centauri

It's about more than just shutting down though. Biological complexity makes it infeasible to significantly engineer our biological bodies to e.g. make them smaller to reduce fuel costs, or make backups so that we can easily do potentially dangerous experiments involving ourselves (also reproducible experiments).

My prediction: humans will have to be 100% human technology to free ourselves from the Earth.

chisophugis | 11 years ago | on: Roadmap to Alpha Centauri

The biggest barrier to space travel is our biology. We must create non-biological humans that can shut off and wait for the entire trip.

chisophugis | 11 years ago | on: Wolfram Language Introduction for Programmers

Here are a couple of elementary examples that are useful:

    f[x_Integer] := x^2

Basically, the argument of f must match the pattern _Integer, which means that the argument is an integer. If the pattern does not match, the expression stays in an unevaluated form; you could have it throw an error by adding an extra definition f[_] := Assert[False] or whatever.

    f[x_] /; x>2 := x^2

Same, but in this case only if x>2.

Basic symbolic manipulation:

    Sin[x^2 + x + 2] /. x->3

Evaluate at x=3.

    θ^2/r /. {r -> Sqrt[x^2 + y^2], θ -> ArcTan[x, y]}

Convert from polar coordinates to Cartesian.

General programming:

    In[21]:= Cases[{1,2,3,4}, x_ /; x>2 && PrimeQ[x] -> x^2]
    Out[21]= {9}

Kind of like a list comprehension.

chisophugis | 12 years ago | on: SymPy – simplify

Appears to need a bit more robustification.

    >>> simplify(sqrt(x^2))
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/base/data/home/apps/s~sympy-live-hrd/43.373169527249054993/sympy/sympy/functions/elementary/miscellaneous.py", line 110, in sqrt
        return C.Pow(arg, S.Half)
      File "/base/data/home/apps/s~sympy-live-hrd/43.373169527249054993/sympy/sympy/core/cache.py", line 93, in wrapper
        r = func(*args, **kw_args)
      File "/base/data/home/apps/s~sympy-live-hrd/43.373169527249054993/sympy/sympy/core/power.py", line 119, in __new__
        obj = b._eval_power(e)
    AttributeError: 'Not' object has no attribute '_eval_power'

chisophugis | 15 years ago | on: Show hn: Photo based captcha

reCAPTCHA is bar none the way to go. Not only is it robust, but it also serves a useful purpose: it digitizes books by having users recognize text from scanned pages where OCR would perform poorly.

http://www.google.com/recaptcha

If your captcha doesn't serve a useful purpose besides being a captcha, it's going to be hard to beat reCAPTCHA.

chisophugis | 15 years ago | on: Ask HN: Review our startup's landing page

FWIW, where it says "Already featuring _several hundred_ items, new content will be added regularly.", the underlining gave me an instinctive urge to click on it, expecting some sort of impressive showcase of the items.

Also, the graphic at the top seems a tad blurry, especially the words. It would be a lot more powerful if the edges and lines were crisper.

chisophugis | 15 years ago | on: Freud explains why there aren't more women entrepreneurs

>Why must it be attractive for a female to be powerful in order for her to succeed?

There is an undeniable biological tendency to want to attract a partner. Hence, any human's behavior will tend (in varying amounts) toward striving to be attractive to a partner. This foments a tendency for men to seek "Success, wealth and power" while fomenting a tendency for women to seek "beauty and femininity" (to use the article's wording).
