TensorFlow, Keras and deep learning, without a PhD

[+] gas9S9zw3P9c|5 years ago|reply

As a researcher in the field I am not quite sure how I feel about these kind of resources. I am all for making research accessible to a wider audience and I believe that you don't need a PhD, or any degree, to do meaningful work.

At the same time, the low barrier of entry and hype has resulted in a huge amount of people downloading Keras, copying a bunch of code, tuning a few parameters, and then putting their result on arXiv so they can put AI research on their resume. This has resulted in so much noise and low quality work that it really hurts the field.

You don't need a degree, but I think you do need to spend some time to get a deep enough understanding of what's going on under the hood, which often includes some math and takes time. This can be made accessible and there are plenty of good resources for that. But all these "become an AI pro by looking at some visualizations and copying this code" is maybe hurting more than it helps because it gives the illusion of understanding when it's actually not there. I wouldn't want people learning (solely) from this touching my production systems, writing blogs, or putting papers on arXiv.

[+] Datenstrom|5 years ago|reply

I have self taught this material and have been working professionally in the field for some years now. It was primarily driven by the need to solve problems for autonomous systems I was creating. When I am asked how to do it I give the progression I followed. First have preferably a CS background but at least Calc 1&2, Linear Algebra, and University statistics, then:

1. Read "Artificial Intelligence A Modern Approach" and complete all the exercises [1]

2. Complete a ML course I recommend the Andrew Ng Coursera one [2]

3. Complete a DL course I recommend the Andrew Ng Coursera one [3]

4. Complete specialization courses in whatever you are interested in such as computer vision, reinforcement learning, natural language processing, etc. These will cover older traditional methods in the introduction also which will be very useful for determining when DL is not the correct solution.

Additionally I suggest a data science course which usually covers other important things like data visualization and how to handle bad/missing data. Also, I learned a lot by simply being surrounded by brilliant people who know all this and being able to ask them questions and see how they approach problems. So not really self taught as much as untraditionally taught.

Unfortunately not a single person has actually followed the advice. Everyone has only watched random youtube bloggers and read blogs. Some have gotten into trouble after landing a job by talking buzzwords and asked for help but my advice does not change.

It does make it rather hard to find a job without a degree though, I would not recommend it. All of mine only come from strong references I have from luckily getting my foot in the door initially.

[1]: http://aima.cs.berkeley.edu/index.html

[2]: https://www.coursera.org/learn/machine-learning

[3]: https://www.coursera.org/specializations/deep-learning

Edit: formatting, typo

[+] amelius|5 years ago|reply

Well, maybe I got the wrong impression but after reading the (very accessible) Yolo V3 paper [1], it seems to me that even the experts do little real math and lots of guesswork, kicking a model until it starts giving results.

[1] https://pjreddie.com/media/files/papers/YOLOv3.pdf

[+] citilife|5 years ago|reply

I am one of those without a PhD, but have taken the time to the learn the math & contribute quite a bit to this area.

That being said, I also don't think deep learning itself is not really a "science". The issue I have is you can't predict if a network will learn.

We're effectively testing deep learning networks the same way the Romans used to test bridges. Send a bunch of elephants over them, if it holds it's good enough.

There's obviously some indicators of success, but on a whole the overall interaction between components is very difficult to calculate and near-impossible to predict. While I think it's important to understand how layers interact and how a given function will impact your optimization, etc. it's not fully required to have a deep understanding of the mathematics, at least for most cases.

I also personally don't view anything on arXiv worth anything. I typically will read articles/papers myself if reviewing a candidate and / or would like to see their publications at conferences or journals. Otherwise, it's essentially a blog post (which IMO is fine, but will require me to review it).

[+] s1t5|5 years ago|reply

Do you ever feel like all the noise influences the way you think about your own career? I work as a data scientist and sometimes find the hype so off-putting that I think that I should look for a role that's related to solving some optimization problems outside of ML, or a software engineering role in some completely different domain and work as a backend developer or something similar.

[+] mumbisChungo|5 years ago|reply

>At the same time, the low barrier of entry and hype has resulted in a huge amount of people downloading Keras, copying a bunch of code, tuning a few parameters, and then putting their result on arXiv so they can put AI research on their resume. This has resulted in so much noise and low quality work that it really hurts the field.

This seems extremely short-sighted to me.

If the barrier to entry (on an already relatively young technology) has come down so far that there are a bunch of noobs running rampant right now, does that not bode extremely well for future advancements in the field (assuming some non-zero conversion rate from noobs to productive members of the field over time)?

Also, let's not pretend that people with relevant degrees don't make shitty contributions to arXiv all the time.

[+] wolco|5 years ago|reply

Are you sure you want to start excluding and limiting this field? Part of the higher salaries is hype not necessarily value. That brings in the crowd who download change a parameter or two. They are not making world-wide breakthroughs but through that simple act they understand how things work better and can make more informed decisions around this topic.

Any company who hires based on a simple hello world might be the company that doesn't even know why they want ml but follow the trend. This person might be the best candidate because they can think about ml on a slighly deeper level than the company and can help bridge that first step. They probably know the buzzwords, what's hot that's important to marketing.

[+] sabalaba|5 years ago|reply

I would disagree that having a lot of interest hurts the field. Where are you seeing “noise and low quality work”? For your sake, I hope you’re not reading random papers from unknown authors on arXiv in your spare time!

I think we’ve seen impressive contribution from people without PhDs. Chris Olah and Alec Radford come to mind first. (Note: I’m not implying that you disagree with that statement, just wanted to point some role models out to those without PhDs who want to contribute to the literature.)

High quality work comes from a tiny incisive fraction of the research community. Most published research, even from PhDs, isn’t worth reading. Easily accessible tutorials promoting Colab are not the problem!

[+] devalgo|5 years ago|reply

>At the same time, the low barrier of entry and hype has resulted in a huge amount of people downloading Keras, copying a bunch of code, tuning a few parameters, and then putting their result on arXiv so they can put AI research on their resume. This has resulted in so much noise and low quality work that it really hurts the field.

I'm sorry but given that many papers in NeurIPS, ICML, etc are exactly what you described I find your criticism a bit lacking.

[+] sdinsn|5 years ago|reply

> At the same time, the low barrier of entry and hype has resulted in a huge amount of people downloading Keras, copying a bunch of code, tuning a few parameters, and then putting their result on arXiv so they can put AI research on their resume. This has resulted in so much noise and low quality work that it really hurts the field.

Note that this is not a problem exclusive to AI or Computer Science.

I have a hobbyist interest in Entomology, and I was so disappointed to see that people are still pumping out papers that are minor tweaks off old population modeling papers from the 1980s. The field is shockingly stagnant. I've read random PhD theses from the 70s that are written on damn typewriters that are higher quality than modern papers from so-called "top tier" research universities.

[+] chongli|5 years ago|reply

put AI research on their resume. This has resulted in so much noise and low quality work that it really hurts the field.

That’s one way of looking at it. Perhaps another way is ask why all of these employers are hiring people who have done low quality research without understanding who they’re really hiring.

It reminds me of the adverse market effects on gamers when Bitcoin miners were buying up all the GPUs a few years ago. It’s another emergent collective phenomenon that’s distorting the market.

I think there may be quite a few employers out there with non-technical management who have become convinced that they need to hire a machine learning expert, without any particular reason why. They might hear a competitor has hired someone and so they need to as well. Really weird.

[+] gumby|5 years ago|reply

I consider this equivalent to the democratization of web site development and then app development. It’s certainly lead to an explosion of crummy and security-nightmare apps but in exchange has been an on-ramp for some good developers and exciting products as well.

The massive hype surrounding anything “AI” has caused the literature to become a dumpster fire, yes, but a handful of good papers still appear. Just use a low pass filter as with most things these days.

[+] melenaboija|5 years ago|reply

I agree in some sense but I also think that everybody has the right to learn at any level, mainly because not everything in neural networks field is research to create the next optimizer or a new architecture.

This is perfect for the people I work with and the role we have (and I am talking about some of them are PhDs in math and physics but without much CS knowledge). Some of this people just need to see this is a stack of non linear functions that have to be minimized and they grasp the idea of nn really fast.

To me this would be like saying that if you don't know computational complexity theory or how compilers work you are not a good software developer, but I think is just a different level and as long as you don't try to inflate your resume is fine.

Sometimes I feel some fear on AI field when the knowledge is spread, and there is a bunch of backgrounds (maths, physics, quantitative finance, chemistry, ...) that need exactly this to at list demistify nn.

[+] AlexanderNull|5 years ago|reply

Yeah, if you're not programming your models in binary then gtfo imposters!!

On a more serious note, yes understanding this and anything really takes time and investment. The problem for me at least originally not only with ML but originally with engineering back when I was trying to learn (and couldn't afford school), was finding quality sources for getting started in that process of learning. By providing simplified resources like this google one, the hope is that many beginners can get that one "aha!" moment where they start the basic understanding that allows them to start tinkering and learning.

People without a decent understanding shouldn't be submitting research papers, full on agree there. It's basically a waste of everyone's time and harmful for the field of research as a whole as it dilutes the overall signal to noise ratio. However there's so much space in the ML world that doesn't involve research, not only for fun hobby projects, but also even professionally. Resources like this are critical to reducing the knowledge gap out there between researchers and the programmers in the field that work on little ML projects for doing things like sentiment analysis for their company.

These sub-research projects are mission critical at a lot of companies yet are held up at the majority of non-FAANG companies because there's only one data scientist while the teams of engineers are clueless as to how to assist.

[+] Razengan|5 years ago|reply

> the low barrier of entry and hype has resulted in a huge amount of people downloading Keras, copying a bunch of code, tuning a few parameters, and then putting their result on arXiv so they can put AI research on their resume. This has resulted in so much noise and low quality work that it really hurts the field.

Then the problem isn't the low barrier and ease of use, it's with arXiv's filtering and how resumes are valued and verified.

[+] echelon|5 years ago|reply

I'm doing deep learning without a data science background.

Some of my current results are:

* https://vo.codes

* https://trumped.com

The voices need better data curation and longer training, but some speakers such as David Attenborough are quite good.

I've also built a real time streaming voice conversion system. I want to generalize it better so that it can be an actual product. I think it could be a killer app for Discord. Imagine talking to your friends as Ben Stein or Ninja.

I've been watching TTS and VC evolve over the last few years, and it's incredible pace at which things are coming along. There are now singing neural networks that sound better than Vocaloid. If you follow researchers on Github (seriously, their social features are a killer app!), you'll see model after model get uploaded - complete with citations, results. It's super exciting, and it's the future I hoped research would become.

If you're diving into this, I would recommend using PyTorch, not TensorFlow. PyTorch is much easier to use and has better library/language support. TorchScript / JIT is really fantastic, too. I even mean this if just you're poking around with someone else's model - find a PyTorch alternative if you can. It's much easier to wrap your head around. TensorFlow is just too obtuse for no good reason.

[+] forgingahead|5 years ago|reply

Hi Brandon, nice work! Some questions to learn more if you don't mind - are you using Tacoton2 for the voice generation? If it's Tacotron2, are you using a base model before you train up new speakers, or is each speaker model trained from scratch? How long do you run the training for normally (for both cases), and what hardware are you running?

You mentioned elsewhere you're renting the V100s, what services have you used, and would you recommend them?

By the way, your Trumped.com is throwing some errors in the console so the site isn't working for me.

Keep up the good work!

[+] raverbashing|5 years ago|reply

Great work! (Though it seems to work only with words it knows)

And don't worry, PhD requirements are overrated

[+] amrrs|5 years ago|reply

This is an amazing side project. Would you share some details about how you've hosted them? Also, if possible about the training details like how long did it take for you train them? Did you do it on your local GPU or some cloud provider or free service like Colab?

[+] gotzmann|5 years ago|reply

Could you provide some examples of good work in the field of singing nets?

[+] fuzzythinker|5 years ago|reply

Think HN broke both, getting "server is busy"

[+] foobaw|5 years ago|reply

very nice work!

[+] donquichotte|5 years ago|reply

Tangentially related (and also using the ubiquitous MNIST dataset), Sebastian Lague started a brilliant, but unfortunately unfinished video series on building neural networks from scratch.

This video was an absolute eye-opener me [1] on what classification is, how it works and why a non-linear activation function is required. I probably learned more in the 5 minutes watching this than doing multiple Coursera courses on the subject.

[1] https://www.youtube.com/watch?v=bVQUSndDllU

[+] pests|5 years ago|reply

One "ah ha" about that required non-linear function I had was the fact that if you were just passing numbers through a series of linear functions they could by definition just be combined into one equation.

[+] wirrbel|5 years ago|reply

The whole AI/ML stuff has become so hyped up that its probably time to find another topic of interest in software engineering for me. Its a weird melange nowadays where frameworks and "academic credentials" are fused together by major tech companies and leaves me - who has deployed a dozen of classical ML models into production that are still running after couple of years - wondering what this is all about.

Overall, working with people with different backgrounds, a ML-related PhD is usually not correlated nor anti-correlated to these people having a good understanding of the relationship between models and and their applications.

I wish we could leave the framework and name-dropping behind and talk more about what it takes to evaluate predictions, how to cope with biases, etc.

[+] axegon_|5 years ago|reply

Really familiar territory. I think the hype has poisoned the minds of many and at this state "AI/ML" has turned into a simple buzzword. Much like "blockchain" 2 years ago. And while I'm still as fascinated about ml as I was 5 years ago, like many others, I've decided to stay in the shadows and do my own thing just for the fun of it. Especially since marketing and ego started playing a big role around those communities. It genuinely makes me sad but I think I always knew in the back of my mind that this would likely turn out to be a nail in the coffin of AI/ML, not robots taking over the world.

The way I see it, ML/AI is nothing more than a marketing campaign for much of the industry and few people realize that it's often a small component and rarely a major selling point for anything. Like "ml-powered kitchen blender" or whatever. As you said, few people discuss evaluating predictions, tackling biases. I suspect because most people are a lot more interested in snatching a piece of the cake.

[+] mattigames|5 years ago|reply

Instead of saying anti-correlated is better to say "inversely correlated" (or if you mean lack of correlation then "uncorrelated")

[+] the_cramer|5 years ago|reply

"I wish we could leave the framework and name-dropping behind and talk more about what it takes to evaluate predictions, how to cope with biases, etc."

We can, can't we? "We" as in professional software developers. I always thought of this word-hyping of a management thing and a buzzword pool for non-techies. I know this influences our work, but that doesn't keep us concentrate on what's really up... or am i wrong?

By the way, I tried getting into ML but i'm really poor at maths and at that time was not willing to put time into maths. And nearly every tutorial back then threw formula after formula at my face... So a bit of mathematical education could not hurt. Doesn't have to be a PhD though.

[+] op03|5 years ago|reply

Well the hype is required to get your grandmom, older execs, or a strategy/biz dev team at a brick and mortar firm who can afford only 1 dev to gain confidence that they too can use ML.

[+] amelius|5 years ago|reply

"AI/ML engineer" is the new "web developer".

[+] unknown|5 years ago|reply

[deleted]

[+] xvzwx|5 years ago|reply

Its true that with todays Frameworks and easy API calls almost everyone with a little technological background can deploy a ML/AI model and get sufficiant results. But Bootcamps cannot replace an academic education. As soon as you are not able to understand and review new released papers and insights and have to wait for an high level blog entry or video course on that topic you are worth nothing. Without a deeper understanding you can just guess what is going on inside that blackbox NN or ML model and have to rely on blindly change parameters and even worse you are not able to understand your results or compare someoneleses results with yours using statistical tests and so on. So in the end people (maybe not everyone) without academic background are just API callers that will struggle on the long term.

[+] secondcoming|5 years ago|reply

It's a pity that most Tensorflow tutorials out there seem to deal with images. We tried to use it for real-time data classification (data -> [yes | no]). Every tuturial out there seems to assume you're using Python (which is probably not an invalid assumption). Here's my 2c when trying to use Tensorflow with C++:

a) Loading SavedModels is a pain. I has to trawl the Tensorflow repo and Python wrappers to see how it worked.

b) It's incredibly slow. It added ~250ms to our latency. We had to drop it.

c) It has a C++ framework that doesn't work out-of-the-box, you have to use the C lib that wraps an old version of the C++ framework (confused? me too).

d) It's locked to C++03.

Tensorflow-Lite looked to fit the bill for us, but our model weren't convertible to it. We no longer use Tensorflow.

[+] osipov|5 years ago|reply

I don't understand why you are getting downvoted. TensorFlow 1.x barely worked and people stuck with it because the alternatives were worse. I moved to PyTorch as soon as I could because it is better than TensorFlow 1.x, TensorFlow 2.x, or Keras w/TensorFlow backend.

TensorFlow is designed by committee and is more of a brand now than a machine learning framework. Did you know there is TensorFlow Quantum?

[+] lostmsu|5 years ago|reply

I am using Python's TensorFlow API from C# through my own binding, and I don't understand how you got 250ms latency on C API without screwing up on your side. I could effortlessly run a super real-time network playing a video game with soft actor-critic with my setup on 1.15.

[+] matlo|5 years ago|reply

This is a nice introduction, even though as most of the tutorials on ML it goes from 0 to 100 in 2 lessons.

A couple years ago I started studying ML, and I have a design background, so I needed to digest all the math and concepts slowly in order to understand them properly.

Now I think I understand most of the fundamental concepts, and I've been using it quite a lot for creative applications and teaching, and I have to say the best resource I've found for beginners, by far, is "Make Your Own Neural Network" by Tariq Rashid.

It starts really from the beginning and it takes you through all the steps of building a NN from zero, with no previous knowledge. really good.

[+] lumberjack|5 years ago|reply

Since everyone is talking about hype in ML, I wish there was some hype for good ole' conversional scientific computing. Yes, it's not so sexy, you have to build your own model yourself, and then the hard work is in finding and verifying a suitable numerical method and finally devising a solid implementation. It requires a vast number of different skills, anything from pure math to low level programming and it is definitely not trivial work, but it does not seem like it pays that well.

[+] sibmike|5 years ago|reply

I am constantly puzzled by people saying that AI is overhyped and fresh grads won't have enough jobs for them. Almost every real life industry: retail, logistics, construction, farming, heavy industries, mining, medicine have just recently started to try AI. The amount of manual and suboptimal tasks that have to be automated and optimized is enormous. I am pretty sure there is more the enough work for applied DSs with domain knowledge in mentioned industries.

[+] mark_l_watson|5 years ago|reply

This is very well done, hitting on some pain points and explaining how to work around them.

I have devoted close to 100% of my paid working time on deep learning for the last six years (most recently managing a deep learning team) and not only has the technology advanced rapidly but the online learning resources have kept pace.

A personal issue: after seven years of not updating my Java AI book, I am taking advantage of free time at home to do a major update. New material on deep learning was the most difficult change because there are so many great resources, and there is only so much you can do in one long chapter. I ended up deciding to do just two DL4J examples and then spending most of the chapter just on advice.

The field of deep learning is getting saturated. Recently I did a free mentoring session with someone with a very good educational background (PhD from MIT) and we were talking about needing specializations and specific skills, just using things like DL, cloud dev ops, etc. as necessary tools, but not always enough to base a career on.

Definitely it can help peoples’ careers working through great online DL material, but great careers are usually made by having good expertise in two or three areas. Learn DL but combine that with other skills and domain knowledge.

[+] robpal|5 years ago|reply

I attended a conference talk of a FB AI-engineer talking about her paper with backprop equations so obviously wrong my eyes hurt, and incorrect definitions of objects. It did not stop her from participating (btw. this is always unclear -- who did what) in state-of-the art research in object detection.

PhD is overrated in the deep learning context. It is more about forging the intellectual resilience and ability to pursue ideas for months/years than learning useful things/tricks/theorems.

[+] dekhn|5 years ago|reply

Twenty five years ago, this would have been "LINUX, UNIX, and serving, without a PhD" and Matt Welsh's Linux Installation And Getting Started was the intro (https://www.mdw.la/papers/linux-getting-started.pdf). I was one of many who adopted Linux early, using this book (later I read the BSD Unix Design and Implementation, which I would describe as senior undergrad/junior grad student material).

Having those sorts of resources to introduce junior folks to advanced concepts are really great to me- my experience is that I learn a lot more by reading a good tutorial than a theory book, up until I need to do advanced work (this is particular to my style of learning; I can read code that implements math, but struggle to parse math symbology).

[+] m0zg|5 years ago|reply

Mandatory plug: do consider using PyTorch instead. It's far easier to pick up and work with. Easy things are easy, hard things are possible.

[+] halflings|5 years ago|reply

The video version [1] is also pretty awesome, though its code itself is a bit outdated now. Explains a lot of very practical issues that you might not find in most academic textbooks, but you encounter every day in practice.

[1] https://www.youtube.com/watch?v=vq2nnJ4g6N0

[+] amrx101|5 years ago|reply

Have a simple rant here. All these BIG $ companies every now and then come out with statements and what not, that doing AI ML is very easy and every one including their cats should do AI, ML courses and training(preferably on their platform). Once that is done the job market is yours. Reality is far from this.

- Today AI|ML does not have the capability marketed by these big companies. Incidentally marketing is targeted at governments, big non tech companies and gullible undergraduates.

- Undergrads many a times take these training courses wherein they acquire the skill set to call these APIs and flood the job market wherein a data-entry or data-analyst job is tagged as AI|ML job.

- High paying jobs in AI|ML still require a Masters or PhD or a mathematical background.

In conclusion, the current hype around AI|ML is misguiding gullible undergrads and governments(I dont mind the government being cheated THOUGH).

[+] seek3r00|5 years ago|reply

The title is obviously clickbait-y, but it’s fine: they’re trying to sell a product (Google Colab).

IMO if you’re interested in AI research or ML engineering, you already know that — in order to avoid getting people killed - you have to understand how it works under-the-hood. You’re doing yourself, your employer and your fellow humans a favour.

Just keep up the good work, and ignore the bullshit. If an AI winter comes, you’ll be well prepared to migrate to another engineering role.

[+] polskibus|5 years ago|reply

I wonder if Google is using this resource to train their own staff without PhD and after that allow them to work as ML engineers? That would provide credibility to such program - instead it is more aimed to sell more ML computing power to the masses (who won't really understand how to use it to get meaningful results).

[+] Guest42|5 years ago|reply

To me it seems the real goal of hyping these things so heavily is to increase their cloud revenues.

[+] DrNuke|5 years ago|reply

As usual with tools, it’s the use instead of the instrument. Domain knowledge still has advantages against generalism, in applied and critical fields. I mean, you can’t do medicine or materials yet with AI/ML without understanding the domain?

[+] arunoda|5 years ago|reply

This is something I did with fast.ai - https://deeplearningmantra.com/

[+] daiyanze|5 years ago|reply

Is this tutorial aiming at entry level? Those graphs are quite difficult. I guess it will take a lot of time to do homework on some fundamental curriculums.

144 comments