item 21156476

Green AI

104 points | montalbano | 6 years ago | arxiv.org | reply

60 comments

[+] gwern|6 years ago|reply
This is ultimately a silly and misguided proposal, which focuses on the costs and not the benefits. There is nothing special about training an AI; it is simply another thing to spend resources on, not intrinsically worse or better than any other use. A biology experiment has a 'carbon footprint'. ITER has a 'carbon footprint' (probably orders of magnitude larger than all AI research this year). HN burns CO2. Everything uses energy, or it uses things which require energy, like human labor, and they involve other costs, like opportunity cost, which are just as real and important. They stand or fall on their net merits, not on how much electricity they use.

There is no need to go around criticizing people for 'green AI' by myopically focusing solely on an abstract electrical cost of training. And if there is, then that applies to everything which uses electricity, and is better handled by putting a carbon tax on energy sources and letting the market find the most unprofitable uses of energy (which will probably not be AI research, I'll tell you that...), stop them, and substitute in more 'green' power sources for everything else.

More importantly, if you are concerned about the costs of training AI, you should be concerned about the total costs as compared to the total benefits, not slicing out a completely arbitrary subset of costs and ranting about how many 'cars' it is equivalent to (which is not even strictly true in the first place, considering that many data centers are located near cheap and renewable power like hydropower or nuclear plants!) and shrugging away the issue that people consider these performance gains important and well worth paying for. There are costs to a model which is worse than it could be. There are costs to models which run slower at deployment time even if they are faster to train. There are costs to models which cannot be used for transfer learning (as the criticized language models excel at, incidentally). And so on. What matters are the total costs, and corporations and researchers already pay considerable attention to that. (Not a single one of their metrics - 'carbon emission', 'electricity usage', 'elapsed real time', 'number of parameters', 'FPO' - is an actual total cost!)

[+] dxbydt|6 years ago|reply
> is ultimately a silly and misguided proposal

No, it's not.

It's analogous to quantifying an F test with watts.

Nothing wrong with that. I can give a billion-row dataset to a dozen people. John assumes normality, samples 100 rows, runs a linear regression, gets 70% R^2, in under 1 second on a 1980s-era computer. Mary doesn't assume normality, runs a GLM via IRWLS, gets better explanatory power than John, still on a 1980s computer, though she takes 5 seconds instead of 1. Then James comes along & runs a decision tree & is twice as good as Mary, but he now needs a second on a 1990s PC. Tony uses a random forest & Baker wants a 128-layer neural net. And so on & so forth, until we end up burning enough energy to power a village, to overfit some dataset and excel at some completely artificial leaderboard metric.
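The progression described above can be sketched concretely. This is a toy illustration, not the commenter's actual experiment: scikit-learn models of increasing heft on a synthetic dataset, reporting accuracy next to wall-clock training time so the diminishing return per unit of compute is visible. The dataset size and model choices are made-up assumptions.

```python
# Accuracy vs. training time for increasingly heavy models on the same
# synthetic data (illustrative numbers only).
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("decision tree", DecisionTreeClassifier(random_state=0)),
    ("random forest", RandomForestClassifier(n_estimators=200, random_state=0)),
]:
    t0 = time.perf_counter()
    model.fit(X_tr, y_tr)
    elapsed = time.perf_counter() - t0
    print(f"{name}: acc={model.score(X_te, y_te):.3f}, train_time={elapsed:.2f}s")
```

Typically the heavier models buy a few points of accuracy for a large multiple of the training time, which is exactly the trade-off the comment is complaining about.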

At some point a grown-up comes in & says you don't need to light up your bedroom with industrial stadium lighting to do your homework; a 40W table lamp will suffice. Hell, half the third world uses a candle & they're performing quite well, if not better. That's really all this is.

The marginal gains in performance don't justify the amount of energy you end up expending. Half these models are brittle & have zero shelf life; ppl regularly throw away stuff they wrote just a year ago. So what's the raison d'être of this pursuit?

[+] 6gvONxR4sf7o|6 years ago|reply
>This position paper advocates a practical solution by making efficiency an evaluation criterion for research alongside accuracy and related measures. In addition, we propose reporting the financial cost or "price tag" of developing, training, and running models to provide baselines for the investigation of increasingly efficient methods.

[emphasis mine]

They suggest reporting cost metrics next to benefit metrics so readers can judge the tradeoff for themselves. There's no proposal to ignore the benefits.

[+] tlb|6 years ago|reply
It is useful, though, to compare results of projects with different budgets. There are papers tackling the same problem with 4 orders of magnitude difference in computing. Including an objective metric in the results would be worthwhile.

In this case, the metric I want is either TFLOPs (for a commercially available architecture) or kWh (for some in-house machine) rather than tons of CO2, which depends on their power source.
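The point about kWh being portable while CO2 is not can be shown with back-of-envelope arithmetic. All numbers here are illustrative assumptions (a ~300 W GPU, made-up grid intensities), not figures from the paper:

```python
# The same kWh figure maps to very different CO2 totals depending on the
# grid powering the data center, which is why kWh is the more comparable metric.

def training_kwh(gpu_hours: float, watts_per_gpu: float = 300.0) -> float:
    """Energy drawn by the GPUs alone, ignoring cooling/PUE overhead."""
    return gpu_hours * watts_per_gpu / 1000.0

def co2_kg(kwh: float, grid_kg_per_kwh: float) -> float:
    """CO2e for a given grid carbon intensity (kg CO2e per kWh)."""
    return kwh * grid_kg_per_kwh

kwh = training_kwh(gpu_hours=10_000)  # 10k GPU-hours at ~300 W -> 3000 kWh
print(co2_kg(kwh, 0.02))  # hydro-heavy grid
print(co2_kg(kwh, 0.7))   # coal-heavy grid
```

The two prints differ by ~35x for identical training runs, so reporting tons of CO2 mostly measures where the data center sits, not how efficient the method is.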

[+] m0zg|6 years ago|reply
I disagree with some of this. The main thrust of the paper is correct, IMO: the cost of training (and inference) needs to be factored in when comparing models. You can't optimize what you don't measure, and currently many SOTA results have training regimes that take a ton of compute and aren't really doable in a research context if you're not Google and can't just use racks full of idle TPUs to do your stuff for free in the trough of the diurnal cycle while customers aren't using them.

Cost of training is ignored so thoroughly that some authors don't even mention it in their papers, yet it has a direct relationship with things like researcher productivity and reproducibility of results by mere mortals.

I wouldn't call this "myopically focusing on the cost of training". I'd call it "acknowledging the cost of training as a valid metric worthy of publishing and optimizing". I hope everyone can agree with that. Once you start optimizing this in addition to your mAP or BLEU or top-1 accuracy, the gains could be multiplicative.

I suspect this will ultimately come down to math: both having architectures which are amenable to efficient gradient propagation and having better optimizers.

[+] Barrin92|6 years ago|reply
> There is nothing special about training an AI; it is simply another thing to spend resources on

If the goal is actual intelligence, as the 'I' in the name appears to suggest, then making resource consumption in the form of energy/compute a central part of the equation makes a fair deal of sense. It's arguably long overdue, and it might push research towards methods that actually increase the capacity of agents to learn, rather than just praying for faster chips and more data.

The benefit here can be twofold: more inclusivity increases the pace at which the field advances by including more researchers, and it also gives researchers an incentive not to treat compute/data as unlimited, moving closer towards figuring out how the brain actually learns.

[+] mark_l_watson|6 years ago|reply
I like this idea! In last week's Lex Fridman AI interview, Gary Marcus touches on the human brain's roughly twenty-watt energy requirement compared to deep learning's energy requirements.

Adding energy requirements to the loss function of automated neural architecture search also seems like a good idea. (I am thinking of frameworks like AdaNet, etc.)
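The idea above reduces to a penalized objective. A minimal sketch, assuming a hypothetical candidate list with made-up accuracy and energy figures (the lambda value and all numbers are illustrative, not from AdaNet or any real framework):

```python
# Fold an energy estimate into the objective an architecture search maximizes.

def penalized_score(accuracy: float, energy_kwh: float, lam: float = 0.0001) -> float:
    """Higher is better: accuracy minus a penalty per kWh of training energy."""
    return accuracy - lam * energy_kwh

# Hypothetical candidate architectures found during a search.
candidates = [
    {"name": "small",  "accuracy": 0.89, "energy_kwh": 5.0},
    {"name": "medium", "accuracy": 0.92, "energy_kwh": 80.0},
    {"name": "huge",   "accuracy": 0.93, "energy_kwh": 900.0},
]

best = max(candidates, key=lambda c: penalized_score(c["accuracy"], c["energy_kwh"]))
print(best["name"])  # with this lambda, "medium" beats "huge": +0.01 accuracy
                     # doesn't pay for +820 kWh
```

The interesting design question is choosing lambda: it is effectively a price on energy, and different values encode different opinions about how much a point of accuracy is worth.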

I retired this year but I still spend a lot of time reviewing deep learning and also conventional AI literature (and I do tiny consulting projects to help people get started or build prototypes).

Since I now mostly pay for my own computer resources, I try to limit myself to what I can do on my System76 laptop with a single 1070 GPU. The availability of pretrained models makes this not so bad at all. I really appreciate the efforts by huggingface (and other organizations) to offer reduced-size models that still provide good results.

[+] dharma1|6 years ago|reply
Don't forget biological brain "neural architecture search" has also had a significant time and energy cost - billions of years of evolution, powered by a pretty large fusion reactor for those billions of years.

That's the reason brains are so energy efficient today. Though you're right, energy constraints are built into biological evolutionary search.

[+] AndrewKemendo|6 years ago|reply
The math is just wrong though.

To train a brain to be competent at a classification task takes 20W RMS for years on end for the individual, plus all of the wattage from the parents, grandparents, teachers, etc. who are training the individual over those years. It's very hard to determine the power allocation used for a specific human being trained on a narrow task, e.g. object classification, but that doesn't mean it's not the comparable measure.

Comparing the training of a single model to the "instant power" draw of the brain is not just overly simplistic; the scale and time periods are wrong.
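The arithmetic behind the point above is easy to make explicit. Integrating the brain's ~20 W draw over years of learning gives a nontrivial energy budget; the 20-year horizon here is an illustrative stand-in for "training a human", not a claim from the thread:

```python
# Back-of-envelope: the brain's "instant" 20 W, accumulated over two decades.

def brain_kwh(watts: float = 20.0, years: float = 20.0) -> float:
    hours = years * 365 * 24
    return watts * hours / 1000.0

print(round(brain_kwh()))  # ~3504 kWh over twenty years
```

And that still excludes the energy spent by the parents and teachers doing the "supervision", which is the commenter's point about the full cost of training a brain.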

[+] m0zg|6 years ago|reply
If you do consulting, you should consider using pre-tax money to buy whatever hardware you need for work. Trump made it much easier to amortize business expenses up to a certain limit, and this could save you a lot of money if you're in the higher tax brackets. I spent ~16K on deep learning hardware for my business this year so far (consumer GPUs, since I'm not running them in the "datacenter") because I'm not made of money and think current cloud GPU pricing is a rip-off.
[+] CoffeePython|6 years ago|reply
I find it strange that the part of the conversation where AI has the potential to save thousands of hours of human labor doesn’t show up more often in these types of threads.

That being said, we definitely need to be thinking about lowering the cost (environmental and monetary) of training these models. I'm glad research is being done within this domain.

I’d love to see a study on what the human labor cost potential vs. training environmental costs would be for certain large models.

[+] rahkiin|6 years ago|reply
I find it interesting that you see 'save thousands of hours of human labor' as something that's surely positive. What should those people start doing instead? They also need an income to put bread on the table.
[+] sharperguy|6 years ago|reply
Won't lowering the cost/complexity of deep learning just allow those with more resources to increase the complexity of their models while keeping the costs the same?
[+] cscurmudgeon|6 years ago|reply
A lot of these AI research projects don't have immediate economic impact.
[+] ausbah|6 years ago|reply
Like the abstract mentions, I think this is also a good criterion for helping "level the playing field" for groups with lower budgets, so that simply throwing more computational resources at a problem becomes less of an advantage.
[+] remon|6 years ago|reply
Hm, I struggle to see an upside with levelling the playing field that way. Groups that have the budget to throw huge amounts of resources at problems are still providing important insights. That can happen in parallel to optimising wattage/computational unit. In fact, that can be an almost completely parallel track in AI research.
[+] css|6 years ago|reply
The citation for the "surprisingly large carbon footprint" [0] is crazy. The paper alleges that a car, including fuel, has a lifetime footprint of 126,000 lbs of CO₂e, while training a big NN transformer consumes 626,155 lbs of CO₂e, almost 5 times as much.

[0] https://arxiv.org/pdf/1906.02243.pdf

[+] tsbinz|6 years ago|reply
Training it _with neural architecture search_. Just training it once with a given architecture, they cite as 192 lbs...
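Putting the figures cited in this subthread side by side makes the distinction clear (all numbers are the ones quoted above, in lbs of CO₂e):

```python
# Figures quoted in this thread, from the Strubell et al. paper linked above.
car_lifetime = 126_000      # one car, lifetime, including fuel
transformer_nas = 626_155   # big transformer *with* neural architecture search
transformer_plain = 192     # a single training run of the same architecture

print(transformer_nas / car_lifetime)    # ~5 car-lifetimes
print(transformer_plain / car_lifetime)  # a tiny fraction of one car
```

So the headline "5 cars" number describes a full architecture search, thousands of training runs, not the cost of training the model once.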
[+] sriku|6 years ago|reply
Research that demonstrates, say, a 1000x reduction in power for training a known problem (e.g. MNIST) won't be considered irrelevant by the community. So is there a specific need to bias against large works, apart from the carbon footprint argument? There remain questions to be answered in that space too, such as: are current "neural" architectures adequate to cover the capabilities of the brain when upping only the scale? It was certainly worth knowing that scaling up was all that was required to compete with humans in DOTA. But will we hit a wall as we near human-level complexity? After all, the money spent making a movie which is "just" for entertainment trumps multi-million-dollar deep learning experiments in cost, which I guess has some correlation with the carbon footprint... or the gases emitted in rocket launches. Do we really know whether this well-intentioned call for green AI (which I sure as hell want) will do too little for the greenness, while biasing people against possible discoveries that could lead to a greener future, a tad too early?

Edit: pardon typos due to mobile device.

[+] taneq|6 years ago|reply
> The computations required for deep learning research have been doubling every few months

I feel this is misleading. Computation is cheap, so the computation thrown at deep learning research has been doubling every few months, but that doesn't mean it's required to do research (unless your research is "throw huge datasets at a neural net architecture and see if it sticks".)

[+] atoav|6 years ago|reply
I understood it as “computation required to replicate stuff outlined in a paper has doubled” not as “we do more research so we do more computation”.
[+] mattkrause|6 years ago|reply
That...does seem to be a lot of the current research though.
[+] malux85|6 years ago|reply
This reminds me of that Simpsons quote "We can't fix your heart, but we can tell you exactly how damaged it is"
[+] swalsh|6 years ago|reply
I'm not sure about that. I think you get what you measure, and if you start measuring efficiency, you're going to start seeing a major incentive to make it more efficient.
[+] esotericn|6 years ago|reply
This quote amused me:

> deep learning was inspired by the human brain, which is remarkably energy efficient

Yeah, sure, if you purely look at its energy consumption in isolation and ignore all of the flights of fancy we engage in (such as, for example, this NN model) in order to maintain its coherence.

[+] keithyjohnson|6 years ago|reply
Efficiency metrics would be really useful in evaluating DNNs for embedded solutions as well.