(no title)
long | 5 years ago
Using the "+" operator to denote composing parts of visualizations is not the greatest syntax but I think we're basically stuck with it for a bit due to historical baggage. See this note from the creator of ggplot, Hadley Wickham: https://community.rstudio.com/t/why-cant-ggplot2-use/4372/7
melling|5 years ago
lmeyerov|5 years ago
```
import cudf
import graphistry

# Load clicks on the GPU and drop repeated (user_ip, click) pairs
df = cudf.read_csv('1GB.csv').drop_duplicates(['user_ip', 'click'])
g1 = graphistry.edges(df, 'user_ip', 'click')  # graph shape: user_ip -> click
g1.plot()
g2 = g1.encode_point_color('risk', ['blue','yellow','red'])  # fork g1, add colors
g2.plot()
g2.edges(cudf.read_csv('file2.csv')).plot() # reuse g2's color settings
g1.edges(cudf.read_csv('file2.csv')).plot() # ... or just g1's graph shape
```
Being able to 'fork' plots and interactively swap in different data / encodings is super great over the course of a session. You can always go back to an earlier one as you make progress. Likewise, you can rerun notebook cells and read them top-to-bottom without worrying too much.
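To make the 'fork' idea concrete, here is a minimal sketch of how it can work. The `Plotter` class is a hypothetical stand-in, not graphistry's actual implementation. It assumes each setter returns a new immutable object rather than mutating the old one, so earlier handles stay valid:

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Plotter:
    """Hypothetical immutable plot object (illustration only)."""
    edges_data: Optional[Any] = None
    encodings: Dict[str, Any] = field(default_factory=dict)

    def edges(self, data):
        # Return a copy with new edge data; self is left untouched
        return replace(self, edges_data=data)

    def encode_point_color(self, column, palette):
        # Copy the encodings dict so forks never share mutable state
        new_enc = {**self.encodings, "point_color": (column, tuple(palette))}
        return replace(self, encodings=new_enc)


g1 = Plotter().edges("clicks.csv")
g2 = g1.encode_point_color("risk", ["blue", "yellow", "red"])  # fork of g1
g3 = g2.edges("file2.csv")  # new data, g2's color settings
g4 = g1.edges("file2.csv")  # new data, g1's settings (no colors)
```

Because nothing is mutated, rerunning a cell that defines `g2` cannot change what `g1` means, which is what keeps top-to-bottom notebook reading safe.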
So while we're looking at some V2 additions, maybe supporting R, and updating some of the core (more automatic GPU goodness!)... we're definitely keeping the compositional style.
Interesting nit: Libraries copying the original grammar of graphics can likely benefit from friendlier functional DSL presentation styles. As is, I think they make it much harder to read + write, undercutting much of the productivity potential. I love the academic concept of making everything a composable value, but doing naked composition over a massive namespace of diverse types is super confusing to read + write.
Learning from pandas & jQuery, we ended up instead steering users to chaining for the typical case: `g.bind(...).edges(...).nodes(...).encode(...).plot()`. It's functional, so you can always do `g_intermediate = g...` and likewise still do first-class GoG-syntax-style things with them if you really want, e.g. `f(g._bindings)`. However, those are the minor case, and people doing them make code harder to read + write:
-- Reading GoG code is confusing: In `x + f(y)`, often unclear what x, y, and f(y) are, and more so in dynamic languages like Python + R that they're used in. In `g.bind(..).encode(...).plot(...)`, each composition is pretty obvious in the typical case, and you can always read back or do first-class in the atypical case.
-- GoG plot authoring is jarring: When doing `x + ...`, tab complete doesn't get you far. If tab complete does somehow kick in, you are dealing with a big namespace dump. Instead, I see people turn to Google for almost every step! In contrast, tab complete on `g.nodes(df)...` will pull up the most likely next settings to add, and then again for the arguments to fill into whatever command you pick.
GoG defaults to first-class composition for the typical case rather than the atypical one, so even a 2nd-class imperative API may be easier. But with chaining, we get functional composition without losing straight-line reading and tab-complete. Best of both worlds!
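A rough sketch of the contrast, using hypothetical `Layer` and `Chart` classes made up for illustration (neither is ggplot's nor graphistry's API). Both styles build the same settings, but the chained one hangs every step off a single object, which is what tab complete needs:

```python
class Layer:
    """Hypothetical GoG-style value: settings merged with `+`."""

    def __init__(self, **settings):
        self.settings = dict(settings)

    def __add__(self, other):
        # Right-hand settings win on conflict
        return Layer(**{**self.settings, **other.settings})


# Free functions living in the module namespace, as in `x + f(y)`
def data(df):
    return Layer(data=df)


def color(column):
    return Layer(color=column)


def size(column):
    return Layer(size=column)


class Chart:
    """Hypothetical chained equivalent: same settings, exposed as methods."""

    def __init__(self, **settings):
        self.settings = dict(settings)

    def _with(self, **update):
        # Functional update: return a new Chart, never mutate self
        return Chart(**{**self.settings, **update})

    def data(self, df):
        return self._with(data=df)

    def color(self, column):
        return self._with(color=column)

    def size(self, column):
        return self._with(size=column)


gog_style = data("df") + color("risk") + size("degree")
chained = Chart().data("df").color("risk").size("degree")

# What an editor could offer after typing `Chart().`
public_methods = [m for m in dir(Chart) if not m.startswith("_")]
print(public_methods)  # ['color', 'data', 'size']
```

In the `+` version, the candidates after `+` are whatever happens to be in scope; in the chained version, they are exactly the methods on the object you already have.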