edparcell's comments

edparcell | 5 years ago | on: Math.min(Math.max(num, min), max)

It is simpler if you use the notation [x]_a^b (i.e. with a subscript a and a superscript b) to mean x clipped to the range a to b, and skip writing +/- infinity if you don't intend clipping on one side.

Then you get a bunch of obvious identities like [x]^b = min(x, b) = [b]^x (x capped by b is the same as the smaller of x and b which is the same as b capped by x), [x]_a^b = [b]_a^x, and [x]_a^b = [[x]_a]^b. Putting these together you get [x]_a^b = [[x]_a]^b = min(max(x, a), b). But honestly it's just easier to stick to the notation most of the time.
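Translated into code, those identities are easy to check directly (a Python sketch; `clamp` is just a name for [x]_a^b):

```python
def clamp(x, a=float("-inf"), b=float("inf")):
    """[x]_a^b: x clipped to the range a to b."""
    return min(max(x, a), b)

# [x]^b = min(x, b) = [b]^x
assert clamp(3, b=5) == min(3, 5) == clamp(5, b=3)
# [x]_a^b = [b]_a^x
assert clamp(15, 0, 10) == clamp(10, 0, 15)
# [x]_a^b = [[x]_a]^b = min(max(x, a), b)
assert clamp(12, 0, 10) == clamp(clamp(12, a=0), b=10) == min(max(12, 0), 10)
```

The defaults of -infinity and +infinity correspond to omitting the subscript or superscript, so one-sided clipping comes for free.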

A better write-up, for everyone who doesn't like reading new math notations inline: https://imgur.com/gallery/593QEow (Imgur link with white background) https://quicklatex.com/cache3/71/ql_46c49ac709b3789482d0736d... (Original link - renders badly in Chrome due to PNG transparency)

edparcell | 5 years ago | on: Why companies lose their best innovators (2019)

100% agree. People now use “innovate” and “invent” interchangeably. Typically they use the fancier-sounding one because they want to impress people with their long words. They are not interchangeable, though. Invention is the initial spark to the first version. Innovation is the polishing process of the next n versions. The iPhone 1 is an invention, and every iPhone after that is an innovation.

Now, the iPhone 1 didn’t do very much, and often there is far more value in the innovation than there was in the original invention. But you don’t get the innovation without first inventing something that didn’t previously exist.

Sadly, using words incorrectly seeps into thinking, and affects reasoning. Because these words have been conflated, organizations are typically no longer able to reason about invention and innovation correctly, and are uninterested in inventing as a result. I would argue we see this in the lack of new underlying technological inventions after the 90s. It is like we have eaten our own seed corn. Very sad.

edparcell | 5 years ago | on: The most remarkable legacy system I have seen

Loman author here. Thank you very much for the mention. I'm amazed that I'd never heard of Athena or pixie graphs. Our intention with Loman was to create a library scoped to a single process - we looked at the possibility of creating a system responsible for executing much larger graphs on a real-time ongoing basis, but it felt like a larger project than we'd be able to execute well. It sounds like Athena was that, and it worked well, subject to being a culture shock for people coming into it?

edparcell | 5 years ago | on: Create diagrams with code using Graphviz

I'm a big fan of Graphviz. My old team created a library called Loman, which we open-sourced, which uses DAGs to represent calculations. Each node represents a part of the calculation and contains a value, similar to a cell in a spreadsheet, and Loman tracks what is stale as you update inputs. Loman includes built-in support for creating diagrams using Graphviz. In our quant research we have found that invaluable when revisiting old code, as it allows you to quickly see the structure and meaning of graphs with hundreds of nodes, containing thousands of lines of code.

We've found it quite useful for quant research, and in production it works nicely because you can serialize the entire computation graph, which gives an easy way to diagnose what failed and why across hundreds of interdependent computations. It's also useful for real-time displays, where you can bind market and UI inputs to nodes, and bind calculated nodes back to the UI - some things you want to recalculate frequently, whereas others are slow and need to happen infrequently in the background.

[1] Github: https://github.com/janushendersonassetallocation/loman

[2] Docs: https://loman.readthedocs.io/en/latest/

[3] Examples: https://github.com/janushendersonassetallocation/loman/tree/...
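For concreteness, the stale-tracking idea behind Loman can be sketched in a few lines (a toy illustration of the concept only - the names and API here are made up for the example, not Loman's actual interface; see the docs above for the real thing):

```python
# Toy data-node graph: nodes hold values, computed nodes hold functions of
# other nodes, and changing an input marks everything downstream as stale.
class MiniGraph:
    def __init__(self):
        self.funcs = {}     # computed node -> (function, dependency names)
        self.values = {}    # node -> last known value
        self.stale = set()  # nodes whose value needs recomputation

    def add_input(self, name, value):
        self.values[name] = value
        self._invalidate_downstream(name)

    def add_node(self, name, func, deps):
        self.funcs[name] = (func, deps)
        self.stale.add(name)

    def _invalidate_downstream(self, name):
        # Mark every node that (transitively) depends on `name` as stale.
        for node, (_, deps) in self.funcs.items():
            if name in deps and node not in self.stale:
                self.stale.add(node)
                self._invalidate_downstream(node)

    def compute(self, name):
        # Recompute only if stale or never computed; otherwise reuse the value.
        if name in self.funcs and (name in self.stale or name not in self.values):
            func, deps = self.funcs[name]
            self.values[name] = func(*(self.compute(d) for d in deps))
            self.stale.discard(name)
        return self.values[name]

g = MiniGraph()
g.add_input("price", 100.0)
g.add_node("vat", lambda p: p * 0.2, ["price"])
g.add_node("total", lambda p, v: p + v, ["price", "vat"])
print(g.compute("total"))   # 120.0
g.add_input("price", 50.0)  # marks vat and total stale
print(g.compute("total"))   # 60.0
```

Loman adds a lot on top of this - Graphviz visualization, serialization, error capture per node - but the core "spreadsheet calculation tree" behaviour is this stale-propagation loop.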

edparcell | 6 years ago | on: Metaflow, Netflix's Python framework for data science, is now open source

My team has a similar library called Loman, which we open-sourced. Instead of nodes representing tasks, they represent data, and the library keeps track of which nodes are up-to-date or stale as you provide new inputs or change how nodes are computed. Each node is either an input node with a provided value, or a computed node with a function to calculate its value. Think of it as a grown-up Excel calculation tree. We've found it quite useful for quant research, and in production it works nicely because you can serialize the entire computation graph, which gives an easy way to diagnose what failed and why across hundreds of interdependent computations. It's also useful for real-time displays, where you can bind market and UI inputs to nodes, and bind calculated nodes back to the UI - some things you want to recalculate frequently, whereas others are slow and need to happen infrequently in the background.

[1] Github: https://github.com/janushendersonassetallocation/loman

[2] Docs: https://loman.readthedocs.io/en/latest/

[3] Examples: https://github.com/janushendersonassetallocation/loman/tree/...

edparcell | 7 years ago | on: Lectures in Quantitative Economics as Python and Julia Notebooks

The ease of the calculation tree in Excel, versus having to keep track of which cells in a notebook you have updated, was a large part of why we built and open-sourced Loman [1]. It's a computation graph that keeps track of state as you update data or computation functions for nodes. It also ends up being useful for real-time interfaces, where you can just drop what you need at the top of a computation graph and recalculate what needs updating, and for batch processes, where you can serialize the entire graph for easy debugging of failures (there are always eventually failures). We also put together some examples relevant to finance [2].

[1] https://loman.readthedocs.io/en/latest/user/quickstart.html

[2] https://github.com/janushendersonassetallocation/loman/tree/...

edparcell | 7 years ago | on: Why Jupyter is data scientists’ computational notebook of choice

I had the same trouble with order dependence as notebooks got to a certain size, so my team and I created and open-sourced a library, Loman, to help with that. It allows you to interactively create a graph, where nodes represent inputs or functions, and then keeps track of state as you change or add inputs and intermediate functions and request recalculations. Our experience with this way of working has been broadly positive. As graphs get larger, it's easy to lift them into code files in libraries, while continuing to modify or extend them in notebooks. The graph structure and visualization make it easy to return to Loman graphs with up to low hundreds of nodes, which would otherwise make for a fearsome notebook. It also makes it easy to bolt Qt or Bokeh UIs onto them for interactive dashboards - just bind UI widgets and events to the inputs, and widgets to the outputs. Graphs can be serialized, which is useful for tracking exceptions in intermediate calculations when we put them in Airflow to run periodically, as you can see all the inputs to the failing calculation, and its upstreams.

[1] Github: https://github.com/janushendersonassetallocation/loman [2] Quickstart/Docs: https://loman.readthedocs.io/en/latest/user/quickstart.html

edparcell | 8 years ago | on: Esoteric programming paradigms

Thanks for the links. I took a look, and I think that the intention is quite different between the libraries. Our library would not directly apply to the Dining Philosophers Problem. Both libraries use graphs to represent dependencies between tasks, but they do so for different reasons, and to cover different uses. The Intel library does it with the intention of scheduling a given workload. Our library uses a directed acyclic graph to track state as either the data or function for given nodes of the graph are exogenously updated, either interactively during research, or from new incoming data in a real-time system. We cover where we think our library is useful in more depth in the introduction section of our documentation[1].

[1] http://loman.readthedocs.io/en/latest/user/intro.html

edparcell | 15 years ago | on: Ask HN: Please review my idea and holding page

Hi Jon,

Interesting idea. Seems like it'd be a good novelty gift for my family to get me for example.

I guess you are already thinking this way, but it seems fairly natural to offer a birthday card, and maybe a range of other geek products around this.

On the product itself, it might be good to do alphabetical sudokus also - for 16x16 sudokus this could lead to some interesting message possibilities perhaps? Also, are there any other puzzles that lend themselves to this sort of customization - wordsearch perhaps?

I guess those are the two ways I'd consider expanding on an appealing starting idea.

Good luck, and let us know when you launch.

Best, Ed.

edparcell | 15 years ago | on: Why CPUs Aren't Getting Any Faster

I think that one approach that may yield domain-specific improvements would be to add certain numerical routines into the x86 instruction set.

When I was working in finance as a quant, I was shocked by the amount of time code spent executing the exponential function - it is used heavily in discount curves and the like, which are the building blocks of much of financial mathematics. An efficient silicon implementation would have yielded a great improvement in speed.
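To illustrate the kind of hot loop I mean (the numbers, names and functions here are illustrative only, not real pricing code):

```python
import math

def discount_factor(rate, t):
    # Continuously compounded discount factor: DF(t) = exp(-rate * t).
    # This exp() call is the one that dominates in pricing loops.
    return math.exp(-rate * t)

def present_value(cashflows, rate):
    # cashflows: list of (time_in_years, amount) pairs
    return sum(amount * discount_factor(rate, t) for t, amount in cashflows)

# A toy 5-year annual-coupon bond at a flat 3% rate: one exp() per cashflow,
# multiplied by thousands of instruments and scenarios in a real risk run.
bond = [(t, 5.0) for t in range(1, 6)] + [(5, 100.0)]
pv = present_value(bond, 0.03)
```

Every cashflow on every instrument in every scenario triggers an exp(), so even a modest speedup of that one function compounds into a large saving across a risk run.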

edparcell | 15 years ago | on: Computers are fast

No need to brute-force. It's pretty easy to just eliminate possibilities:

The largest number must be larger than 7, else the largest possible sum of the three numbers is 21.

If the largest number is 8, then the other two must sum to 15, which is not possible: each of the other two must be 7 or less, so together they sum to at most 14.

So the largest number is 9, and the other two numbers sum to 14. The possibilities for those other two, with the smallest first are (5 and 9), (6 and 8), or (7 and 7). Clearly (5 and 9) would duplicate the original 9 (5+9+9), and (7 and 7) would duplicate the 7.

So the only possible solution is 6+8+9.
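Since the thread is about how fast computers are, it's also trivial to confirm the deduction by brute force (assuming, as in the reasoning above, the puzzle asks for three distinct digits from 1 to 9 summing to 23):

```python
from itertools import combinations

# All sets of three distinct digits 1-9 whose sum is 23.
solutions = [c for c in combinations(range(1, 10), 3) if sum(c) == 23]
print(solutions)  # [(6, 8, 9)]
```

Both routes land on the same unique answer; the elimination argument just gets there without checking all 84 combinations.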

edparcell | 15 years ago | on: Please review my API for HackerNews

How timely. I just started writing a library to scrape data from Hacker News because I wanted to put the posts I'd upvoted in the sidebar of my blog.

Link: http://blog.edparcell.com/how-i-added-my-hacker-news-saved-s...

Your API has advantages and disadvantages against this approach: On the upside, it provides a uniform way for all languages to access content from HN, which is really cool.

On the downside, all requests through your API have to flow through your server. This makes me uneasy for two reasons: first, that you could switch off your servers, especially if take-up is high and you are not being compensated sufficiently for running them; and second, that I'm uncomfortable authenticating to an intermediary.

edparcell | 15 years ago | on: Confirmed: Google Me coming this Fall

I misread that, thinking it was odd that Google would fall to copying Microsoft, and surprising they would copy Windows Me.

Reading about the product left me slightly less impressed.

Facebook already has social covered, and I've pretty much stopped using it. Buzz does not - Google don't get it, and they don't get that they don't get it. If Google Me was an upgrade to a desktop application, I probably wouldn't bother to install it. As it is, I'll look forward to having my cheese moved again.

edparcell | 15 years ago | on: First Impressions of Sitting a Web App on CouchDB

From the article: "CouchDB + Another Layer as Web Server is Redundant". This is very true, and the biggest difference to other databases I've used, NoSQL or not. It ends up being a double-edged sword for CouchDB.

On the one hand, it saves having to context-switch between thinking about the database, the web server, the web framework, etc. For a single developer, this is not to be sniffed at. As learning is often done by lone experimentation, it makes CouchDB an excellent system for learning about NoSQL. It's also great for prototyping, where you don't necessarily want to spend a huge amount of time creating an industrial-strength back-end.

On the flip-side, it means that if you do run into a situation where you want to use another layer, you've just given up the biggest advantage of CouchDB. For my current project, I found I would need to be able to integrate authentication with an existing Kerberos system - the easiest way to do this was to use Apache.

I also had slight reservations about CouchDB's approach of limiting what you can do - everything has to be a map-reduce - to ensure scalability. The plan as I understand it is to introduce further safe operations down the road. I'm sure that approach will yield dividends, but for now it seems quite limiting - similar to eschewing C in the 70s because it does not have garbage collection.

Didn't mean to beat up on CouchDB too much - it's a fine system, but I prefer the flexibility of MongoDB at this point.

edparcell | 15 years ago | on: Ask HN: Review my Startup - GitMac: Git, made easy

Looks interesting. I develop on Windows, Mac and Linux platforms. My desktop is Windows+Mac, and I'm currently using TortoiseGit on Windows and GitX on Mac, plus the command-line where necessary or more convenient. I haven't done too much Mac development in the last few months, so this is from memory - the situation may have changed since.

I would guess GitX is your main competition. GitX is perfectly friendly, and quite useful for staging and making commits on a single development branch, but doesn't exactly have great coverage of the other features of git that I use, so I end up resorting to command-line relatively often. Good for me I'm sure, but it hampers my workflow a little, especially if I have to start diving through docs when I know what I want to do, but need to look up the arguments to do it.

With that said, I'd welcome a more complete Mac Git GUI, and I look forward to tracking your progress, and evaluating your product when possible. Best of luck.

edparcell | 15 years ago | on: Excel Is The World’s Most Used “Database”

Excel is totally free-form, and for small-scale "databases", it's robust enough (until it isn't). This means the user can do pretty much what they want, without getting a programmer involved. And that's the killer feature, that more advanced or more specific solutions miss.

Let's take an example. If the user wants to stick some free-form text in between the end of sales records for one year and the start of the next, they are free to do that. In any less free-form application, they need to define a "comment record" or similar, and they probably need to get a programmer involved to do that. And although the SaaS web version of their "database" may have a better interface, in a lot of cases, having to get a tech involved to make that sort of change is not a compromise people want to make.

And they have a point. It's not a slight against programmers, it's just that when they need to make that change, you'll be 2 companies and 5 projects down the line, and it won't be possible. The article mentions a sheet that has been in use for 15 years - if that had been made as a proper program, at that time, it likely would have been done as, say, a VB application, with an MDB back-end, and it probably would have had purple buttons. The source code would now be lost, and if the business process changed at all, the choices would be a full re-code, or working around it. I would be surprised if in 15 years time, we don't look at today's pet technologies in the same way that you just did when you read VB and MDB.

For me, the direction that Excel, and other spreadsheets, need to take is the same route that browsers needed to take when IE6 ruled the world. We need standardisation, and innovation. I've written a couple of blog posts on this, and for me, the way to go is a central repository of extensions (http://edparcell.posterous.com/how-about-an-app-store-for-ex... for more on that). In the case of Excel "databases", it might be sensible to create an extension to standardise the management and creation of such "databases". It could even allow features like sharing data, backing up etc., but for that to still happen where users are comfortable: within Excel.

edparcell | 15 years ago | on: Poll: What database does your startup use?

I have used both. I implemented a prototype back-end in CouchDB, and it worked well for that purpose, allowing me to get up and running quickly. But I found that I needed a couple of things it didn't support - for example, I wanted to integrate with an existing authentication system rather than use the built-in CouchDB one. It became clear that I would have to write my own middleware layer in Python or similar. Once I had made that decision, I had lost the simplicity that is CouchDB's most compelling feature, and I felt I was missing out in other ways - CouchDB's REST API is less flexible than MongoDB's programmatic API, for example. So I switched to MongoDB.

I'd say both gave me exactly what I needed at different times, and so I recommend both projects highly, depending on what you need. CouchDB is an excellent tool to quickly "mock up" a live back end, and a great way to learn hands on with document stores with very little upfront effort required. MongoDB seems to integrate better as part of a larger system, and provides more flexibility.

page 1