It's amazing how we are watching use cases for notebooks and spreadsheets converging. I wonder what the killer feature will be to bring a bigger chunk of the Excel world into a programmatic mindset... Or alternatively, whether we will see notebook UIs embedded in Excel in the future in place of e.g. VBA.
From my experience, I think that this is unlikely. A notebook abstracts a REPL and allows for literate programming. It's mostly used by people who used to write computer programs to carry out analyses (e.g., applied statisticians or numerical mathematicians). A notebook gives them the opportunity to explain the code block, include equations, etc.
A spreadsheet allows for semi-automation of data processing. Each cell can have a (rather simple) function defined and its evaluation result is printed in that cell. You can actually build up pretty complex workflows by just concatenating cell evaluations.
To give a more concrete example, think about a loop. It is arguably the building block of any computer programming language and a necessary cornerstone in learning how to program. Notebooks and spreadsheets implement loops differently. You can code a loop in a notebook, but the cell output will be difficult to interpret (think of having to fit a linear model for 5 different outcomes). You would be better off just splitting up the cell and running the models separately. That will allow for commenting the code and explaining the results, just like you would in writing a paper. In a spreadsheet, you would define a function, then copy/paste it into the cells you want it evaluated for. No programming required, just knowledge of how to link to cells from within a function and how to copy/paste in the spreadsheet. That's why spreadsheets are widely used by non-technical people with little knowledge of computer programming.
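For what it's worth, here is roughly what the "loop in one notebook cell" case looks like. This is only a sketch: the data are noiseless toy numbers, and `fit_line` and `make_outcome` are hypothetical helpers written for this example, not from any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def make_outcome(xs, slope, intercept=1.0):
    """Build a noiseless toy outcome column: intercept + slope * x."""
    return [intercept + slope * xi for xi in xs]


# Toy data, invented for illustration: one predictor and five outcomes.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
outcomes = {}
for k in range(1, 6):
    outcomes[f"y{k}"] = make_outcome(x, slope=float(k))

# Notebook style: one loop in one cell, five results printed back to back.
fits = {}
for name, ys in outcomes.items():
    fits[name] = fit_line(x, ys)
    intercept, slope = fits[name]
    print(f"{name}: intercept={intercept:.2f}, slope={slope:.2f}")
```

All five fits land in a single block of output with no room for commentary between them, which is the interpretability problem. The spreadsheet version of the same thing would be one slope/intercept formula copied across five columns.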
I've used a lot of both. Excel is much better for most tasks where you have < 1 million rows of data. It's easier to look at the data, easier for novices, and fast enough. Just being able to scroll through the data is very valuable for getting a feel for it. The biggest drawback is VBA; if you could write Excel macros in Python, it would be a hit.
If you have more data, notebooks can handle that better. However, I've noticed lots of colleagues skipping notebooks and using IDEs instead. They're much easier to work with and better for SCM. I'm not a huge fan of notebooks anymore.
As seen in the post, finance has caught on to this idea. The Bloomberg Terminal now provides BQuant. It's an almost fully functional IPython notebook with built-in access to their financial datasets.
Analysts who used to work in Excel are moving their models into environments like these. Libraries for most common functionality are provided, and they allow someone with only a bit of VBA knowledge to feel comfortable enough to start working with Python.
And when you browse places like r/financialcareers, it's filled with finance students wondering which programming languages they should learn. And the answer is always to learn Python using Jupyter notebooks.
That’s not a bad idea. Spreadsheets are pure functional languages that use literal spaces instead of namespaces.
Notebooks are cells of logic. You could conceivably change the idea of notebook cells to be an instance of a function that points to raw data and returns raw data.
The ease of the calculation tree in Excel versus having to keep track of what cells in a notebook you have updated was a large part of why we built and open-sourced Loman [1]. It's a computation graph that keeps track of state as you update data or computation functions for nodes. It also ends up being useful for real-time interfaces, where you can just drop what you need at the top of a computation graph and recalculate what needs updating, and also for batch processes, where you can serialize the entire graph for easy debugging of failures (there are always eventually failures). We also put together some examples relevant to finance [2].
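To make the "recalculate only what needs updating" idea concrete, here is a toy dependency graph in plain Python. To be clear, this is not Loman's API: the `Graph` class, its method names, and the numbers below are all made up for illustration, and it assumes the graph has no cycles.

```python
class Graph:
    """Toy dependency graph: a node is either a plain value or a function of other nodes."""

    def __init__(self):
        self.funcs = {}     # node name -> (function, list of input node names)
        self.values = {}    # node name -> last assigned or computed value
        self.stale = set()  # nodes whose cached value is out of date
        self.calls = []     # log of recomputed nodes, for inspection only

    def set_value(self, name, value):
        """Assign raw data to a node and invalidate everything downstream."""
        self.values[name] = value
        self._mark_dependents_stale(name)

    def add_function(self, name, func, inputs):
        """Define a computed node; it starts stale until first requested."""
        self.funcs[name] = (func, list(inputs))
        self.stale.add(name)
        self._mark_dependents_stale(name)

    def _mark_dependents_stale(self, name):
        for other, (_, inputs) in self.funcs.items():
            if name in inputs and other not in self.stale:
                self.stale.add(other)
                self._mark_dependents_stale(other)

    def get(self, name):
        """Return a node's value, recomputing it (and its inputs) only if stale."""
        if name in self.stale:
            func, inputs = self.funcs[name]
            args = [self.get(dep) for dep in inputs]
            self.values[name] = func(*args)
            self.stale.discard(name)
            self.calls.append(name)
        return self.values[name]


g = Graph()
g.set_value("price", 10.0)
g.set_value("quantity", 3)
g.add_function("revenue", lambda p, q: p * q, ["price", "quantity"])
g.add_function("tax", lambda r: 0.2 * r, ["revenue"])  # 0.2 is an arbitrary rate
g.add_function("label", lambda q: f"{q} units", ["quantity"])

first_tax = g.get("tax")       # computes revenue, then tax
first_label = g.get("label")

g.calls.clear()
g.set_value("price", 20.0)     # only revenue and tax depend on price
second_tax = g.get("tax")
second_label = g.get("label")  # served from cache, not recomputed
recomputed = list(g.calls)
print(recomputed)
```

After the price change, only `revenue` and `tax` are recomputed; `label` is untouched because nothing upstream of it changed. That is the spreadsheet-style behaviour a plain notebook doesn't give you, since there you have to remember which cells to re-run.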
Hi, present-day (grad) student here who has been seeing this change happen gradually over my academic career. Honestly, there have been times I wished that technology stayed out of education, because I feel like the clear explanations have disappeared in exchange for cool graphics or videos (maybe a budget reallocation to the design/graphics team on the publisher's side?). A related thing I've noticed is that in-class discussion happens less as slides have replaced the whiteboard/chalkboard and the class speeds through pre-written formulas or texts. Overall, perhaps there's a case that more information is being relayed thanks to tech, but I feel like quality has suffered as a result.
Does anyone else find it strange that there is no real-world data in these notebooks? It's all simulations or abstract problems.
This gives me the sense, personally, that economists aren't interested in making accurate predictions about the world. Other fields would, I think, test their theories against observations.
It's an educational course in quantitative economic methods. Fitting real-world data is messy and would probably distract. There obviously is overlap with metrics, but as an undergrad course I'd separate this too. They do have ample links to scientific papers that do use real-world data. There's a pages-long list of references [1]. Do check them out if you're into economic science.
There are serious critiques of economic theory out there, which tend to say that kind of thing.
But if you compared these notes to the notes for a college level physics course, you would find a similar level of abstraction, idealized models, and absence of real world data. Those things are not in themselves indicators that physicists (or economists) don't care about the real world. In any mature field, there is a body of knowledge and techniques to be learnt. There's a certain formalism to be picked up, rather than just staring at data.
There might be legitimate reasons for dismissing the general approach taken by mainstream economic theory, but what you seem to be saying ("hmmm, my intuition is that this stuff doesn't focus enough on accurately predicting the real world") is not a reasoned critique.
> This gives me the sense, personally, that economists aren't interested in making accurate predictions about the world. Other fields would, I think, test their theories against observations.
You say this as though using mock-up data to teach techniques isn't a universal practice in literally every other discipline.
>Does anyone else find it strange that there is no real-world data in these notebooks? It's all simulations or abstract problems.
Pretty much every course I took in undergrad physics had no real world data. The intro level courses were especially fun, when we'd go into the lab and get such horrible data that we'd never conclude what they're teaching in the theory classes. We wondered what the point of the lab even was.
The biggest offender is the friction model. Heck no - it's not proportional to the normal force. No one could successfully show that in the lab. And a quick Google search shows you a trivial experiment where just changing the orientation and keeping the normal force the same leads to wildly different frictions.
It's an academic course. You learn using models and basic concepts, then eventually apply it to real data.
Ever taken statistics courses? You're not doing multiple regression analysis on real world data on day 1. On day 1 you're learning odds using playing cards and coin flips.
You could test your own theory against the observation that calculations with real-world data are very much a part of economics; they are just not part of this particular course.
Of course, it depends on who they work for. Effectively, the American field of economics is an exercise in decoupling private reality from public theory.
I'm econ undergrad -> DS -> Machine learning. Econ is very useful for data science if you focus on the right subjects: statistics, math, and experimental design. You get all the hard skills you need to interact with data that a statistician or computer scientist gets, with the (significant, unique) benefit of learning how to ask the right question or design the right experiment given what is likely a messy, weird, social scientific question.
On the other hand, if you don't do any quantitative, empirical, or experimental economics -- i.e. you only do theory or political econ -- then you won't pick up these skills (as much).
Probability theory, optimization, statistics and so forth do not differ between economics and computer science, so it makes sense they are the same.
You would see a difference in that these sorts of models are used for causal inference and counterfactual analysis, whereas Machine Learning is mostly predictive.
That being said, Machine Learning is starting to apply methods developed in econometrics and/or stats, like GMM and Time Series methods.
For example, Long-Term Memory models are quite recent additions to Machine Learning. The short-memory process restriction of autoregressive models has been worked on since the early 80's.
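For anyone wondering what "short memory" means here: in a stationary AR(1) process x_t = phi * x_{t-1} + e_t with |phi| < 1, the autocorrelation at lag k is phi^k, so the influence of past shocks dies off geometrically. A rough simulation sketch follows; the value phi = 0.8, the sample size, and the helper names are arbitrary choices for illustration.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal shocks e_t."""
    rng = random.Random(seed)
    x = 0.0
    series = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


def sample_autocorr(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var


phi = 0.8  # hypothetical persistence parameter, chosen only for illustration
series = simulate_ar1(phi, n=50_000)
for lag in (1, 5, 10, 20):
    theory = phi ** lag
    estimate = sample_autocorr(series, lag)
    print(f"lag {lag:2d}: theory {theory:.3f}, sample {estimate:.3f}")
```

By lag 20 the theoretical autocorrelation is already about 0.01. Long-memory processes are the ones where that decay is much slower (hyperbolic rather than geometric), which is the restriction being relaxed.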
[+] [-] evrydayhustling|7 years ago|reply
[+] [-] pacbard|7 years ago|reply
[+] [-] rb808|7 years ago|reply
[+] [-] rbavocadotree|7 years ago|reply
[+] [-] b_tterc_p|7 years ago|reply
Perhaps this is just Alteryx, though.
[+] [-] edparcell|7 years ago|reply
[1] https://loman.readthedocs.io/en/latest/user/quickstart.html
[2] https://github.com/janushendersonassetallocation/loman/tree/...
[+] [-] projectramo|7 years ago|reply
I wish I had these tools when I was a student (lectures laid out as notebooks that you can interact with to see how the graph changes).
Of course just reading through or listening to clear explanations is still key.
[+] [-] albertshin|7 years ago|reply
Of course, YMMV according to prof and institution.
[+] [-] westurner|7 years ago|reply
Python version: https://lectures.quantecon.org/py/
Julia version: https://lectures.quantecon.org/jl/
[+] [-] kaffee|7 years ago|reply
[+] [-] wjnc|7 years ago|reply
[1] https://lectures.quantecon.org/jl/zreferences.html
[+] [-] theoh|7 years ago|reply
[+] [-] westurner|7 years ago|reply
pandaSDMX can pull SDMX data from e.g. ECB, Eurostat, ILO, IMF, OECD, UNSD, UNESCO, World Bank; with requests-cache for caching data requests: https://pandasdmx.readthedocs.io/en/latest/#supported-data-p...
The scikit-learn estimator interface includes a .score() method. "3.3. Model evaluation: quantifying the quality of predictions" https://scikit-learn.org/stable/modules/model_evaluation.htm...
statsmodels also has various functions for statistically testing models: https://www.statsmodels.org/stable/
"latex2sympy parses LaTeX math expressions and converts it into the equivalent SymPy form" and is now merged into SymPy master and callable with sympy.parsing.latex.parse_latex(). It requires antlr-python-runtime to be installed. https://github.com/augustt198/latex2sympy https://github.com/sympy/sympy/pull/13706
IDK what Julia has for economic data retrieval and model scoring / cost functions?
[+] [-] cirgue|7 years ago|reply
[+] [-] BeetleB|7 years ago|reply
[+] [-] Mikeb85|7 years ago|reply
[+] [-] ploika|7 years ago|reply
[+] [-] python_gt_r|7 years ago|reply
[+] [-] yyyymmddhhmmss|7 years ago|reply
[+] [-] bigmit37|7 years ago|reply
Thank you so much for sharing.
[+] [-] abdullahkhalids|7 years ago|reply
[+] [-] rmbeard|7 years ago|reply
[+] [-] thoughtstheseus|7 years ago|reply
[+] [-] logancg|7 years ago|reply
[+] [-] openloop|7 years ago|reply
[deleted]
[+] [-] nwhatt|7 years ago|reply
[+] [-] zwaps|7 years ago|reply