item 42697863

Ask HN: How do you guard against ChatGPT use in technical interviews?

5 points | calabin | 1 year ago

Yesterday, we did two basic-screen technical interviews where both candidates appeared to use LLMs to generate nearly their entire answers.

We do this quick screen after a 30-min behavioral interview to make sure that candidates can generally operate at the skill level they claim on their resumes.

In the past, we've been shocked by the number of people who will talk a big game, but have really rudimentary programming skills when the rubber meets the road.

The questions are:

1. FizzBuzz

2. Generate the first 20 rows of Pascal's Triangle

3. Drop all non-prime integers from a predefined set of integers from 2 to N
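For reference, here is a minimal sketch of what passing answers to the three screen questions might look like. I've used Python for illustration (the thread's shop is PHP/Laravel, and candidates could answer in any language); all names and structure are my own, not taken from any candidate:

```python
def fizzbuzz(n=100):
    """Return the FizzBuzz sequence for 1..n as a list of strings."""
    out = []
    for i in range(1, n + 1):
        if i % 15 == 0:
            out.append("FizzBuzz")
        elif i % 3 == 0:
            out.append("Fizz")
        elif i % 5 == 0:
            out.append("Buzz")
        else:
            out.append(str(i))
    return out


def pascals_triangle(rows=20):
    """Return the first `rows` rows of Pascal's Triangle."""
    triangle = [[1]]
    for _ in range(rows - 1):
        prev = triangle[-1]
        # Each interior entry is the sum of the two entries above it.
        row = [1] + [prev[i] + prev[i + 1] for i in range(len(prev) - 1)] + [1]
        triangle.append(row)
    return triangle


def drop_non_primes(nums):
    """Keep only the primes from an iterable of integers (simple trial division)."""
    def is_prime(n):
        if n < 2:
            return False
        for d in range(2, int(n ** 0.5) + 1):
            if n % d == 0:
                return False
        return True
    return [n for n in nums if is_prime(n)]
```

Plain trial division is enough for question 3 at screen scale; recognizing the sieve-based speedup is a bonus, as discussed later in the thread.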

We didn't seriously suspect the first candidate until the second candidate provided nearly letter-for-letter the same answers (same variable names, function names, etc.).

After the interviews, we popped our deck into ChatGPT and Claude, and both output exactly what these two candidates had provided.

Last week, a third candidate sent us clearly ChatGPT'd code as an example of some of his work.

I'm unsure what to do here, so I come to you, HN, to ask: what have you done to guard against the use of LLMs in remote technical interviews? Thanks!

Bonus: The nail in the coffin was when the second candidate immediately clocked the last question as leveraging the Sieve of Eratosthenes. Previously, he'd shown us a pretty impressive portfolio. When asked how he knew the Sieve of Eratosthenes off the top of his head, he claimed he had used it in one of his commercial portfolio projects but couldn't explain how.

30 comments


blackbear_|1 year ago

Ask them to share their screen, walk you through their code and explain their solution live. Every once in a while ask why they did it in that way and what alternative approaches there could be. Then change the problem a little and ask them to modify their code, again live. Generally, try to go deep and ask follow-up "why" or "how" questions. Those who "don't remember" or only offer vague and shallow answers are likely to have cheated with LLMs (or are just poor candidates).

calabin|1 year ago

This is generally what we do and what raised our suspicions in the first place - they both could "walk us through" their code, but had trouble explaining why they did certain things, how they could improve things, etc.

We thought that the approach you've outlined would generally be good enough, and it has led us to catch instances of people leaning heavily on LLMs, but our issue now is that everyone appears to be using these tools. Admittedly, our sample size here is low (n=3), but it's frustrating nonetheless.

uberman|1 year ago

While I'm personally not keen on LLMs, I admit I do have the Copilot extension installed in Visual Studio and have been pleasantly surprised at how well tab-completion works. It seems effective for small blocks of code.

So, remembering that I'm not really a fan when I ask this... why do you care if a candidate uses an LLM or Google as part of your interview? Do you care if they use an IDE or a code completion plugin? In the end, don't you really want to evaluate whether the candidate can produce good, clean code?

If you feel like an LLM is too big a crutch, is that because what you wanted to test was memorization of a framework, or thought processes and workflow strategies?

To quote a resource I'm also not keen on but understand why it exists: does your concern about ChatGPT during interviews actually point to an XY problem?

calabin|1 year ago

The screen we are performing here is a basic "can you program" type of evaluation.

We've run into a number of people with seemingly-decent resumes (several positions as engineers at reputable albeit non-FAANG companies like insurers or e-commerce firms) who have struggled to complete basic tasks like the Pascal's Triangle question mentioned above.

The intent here is to toss them a couple of softballs that they should be able to knock out of the park, almost as if they were helping a younger sibling with CS 1XX or 2XX level work.

We're not against the use of Copilot, etc. once onboarded. We just want to make sure that these candidates possess basic skills that their resumes would suggest they mastered years ago.

charleslmunger|1 year ago

Sieve of Eratosthenes is often introduced alongside prime numbers in elementary school math class, and it has a funny, memorable name - it's not that weird for someone to know it. It is weird to lie about having a practical use case for it, since running it up to a cryptographically useful prime length is infeasible: it requires O(n) memory.
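To make the memory point concrete, here is a minimal Sieve of Eratosthenes sketch in Python (my own illustration, not anything from the thread); the boolean array of size n+1 is the O(n) memory the comment refers to, which is why sieving up to cryptographic sizes is a non-starter:

```python
def sieve_of_eratosthenes(n):
    """Return all primes <= n.

    Allocates an O(n) boolean array, so sieving up to numbers
    hundreds of bits long (cryptographic prime sizes) is infeasible.
    """
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # Cross off multiples of p, starting at p*p
            # (smaller multiples were crossed off by smaller primes).
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [i for i, prime in enumerate(is_prime) if prime]
```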

It seems vanishingly unlikely that this type of question can provide any signal anymore outside an in-person interview. The incentives for candidates are just too strong and the tools are too good.

calabin|1 year ago

Great point on the Sieve - knowing it off the top of his head wasn't itself a red-handed indicator of cheating, but claiming that he'd used it in a B2B SaaS app was. Especially given that he wasn't able to explain how he'd allegedly used it.

The candidates we're bringing into this screen typically have 1-3 prior positions on their resume, so the point here is to throw them some softballs that they should be able to crank through with some ease to demonstrate that their basic programming skills are there.

We've had experiences where people who have held legitimate programming jobs at F1000 companies struggle greatly with some of the basic questions that I've listed above. I'm not sure why, but it's the case.

We try as best we can to adjust for anxiousness. I know that programming in front of others can suck, but all the same we're just trying to establish: "before we go forward, can you do some elementary tasks that anyone with your claimed experience should be able to do?"

Do you have any suggestions on better questions?

sandropuppo|1 year ago

It's a great question.

What we do is ask the candidate to keep their hands visible to the camera during the interview. But some setups are voice-only, and this won't work in those cases.

Probably the best way would be to generate the ChatGPT answer beforehand and compare it with what the candidate submits?

calabin|1 year ago

Having the GPT answer beforehand would allow us to confront them about it sooner, but at that point we're definitely not hiring them since they thought it a good idea to use an LLM to cheat on questions they should be able to do in their sleep.

Interesting that you're asking the candidate to have their hands visible. We haven't wanted to have to go that way, but we might.

maxwell|1 year ago

Ask questions that involve trade-offs, be they design or performance related, and explicitly tell candidates that you're benchmarking against ChatGPT and expect something beyond what an LLM would give, i.e. you're looking more for creative/critical thinking than mere correctness.

calabin|1 year ago

This is an interesting direction - especially benchmarking against GPT and telling the candidate we are.

Do you have any suggestions about the type of questions we could be asking here?

thecrumb|1 year ago

This would be like asking a carpenter to build you something without a hammer. At this point we need to realize an LLM is a tool like everything else. Maybe give them an LLM challenge: how would you do X? What is your prompt? Why?

calabin|1 year ago

I disagree with the carpenter-hammer analogy here pretty strongly.

We're basically trying to figure out if they can code generally, or if somehow they've skated by in their last positions without the fundamental skills of programming.

I'm not sure how, but we've come across a number of programmers from F1000 companies who can't seem to hit some of the basics in their chosen language.

LLMs have their place as a tool, but before we empower them with the latest and greatest programming assistance, we want to make sure that they have the skills to do things like critically interpret the output of Copilot, etc.

We want to make sure that the people we hire possess the skills they claim, and that they won't serve as a very slow wrapper for the LLM tools we already pay for.

TheMongoose|1 year ago

LLMs are absolutely not a hammer. LLMs are an Ikea shelf in a box. You're not a carpenter because you can assemble one.

We'll leave aside the argument over whether a carpenter would sometimes rather buy a cheap Ikea shelf than build one, though that's also applicable to this analogy.

rvz|1 year ago

Just ask the candidate whether they have contributed to relevant large open-source projects in the language you are hiring for. If not, give them a hard Leetcode question.

Then ask the candidate to complete the hard Leetcode puzzle in Rust.

ChatGPT will struggle to help the candidate as it generates garbage.

After they have completed it, in the second technical interview, question the candidate about how they came up with the solution, step by step, to show whether they really understand both the language and the algorithm used to solve the puzzle.

This rigorously filters out 95% of frauds and impostors whilst targeting the best and brightest (really).

Job done.

calabin|1 year ago

Contributions to relevant open-source projects would allow us to bypass this screen altogether, so that's a great point.

Haha we might have to switch to Rust just to ease our interviewing woes here - we're primarily a PHP/Laravel shop, and have tried to be as charitable as possible to candidates by allowing them to program in the language they're strongest in. Perhaps we need to change that.

The screen we're doing is meant to filter out the frauds/impostors, and so far it has a 100% success rate - unfortunately, that's because we've caught three of the last three people who made it to that point. It's become a huge waste of time.

Maybe we just need to source candidates from open-source contributors to prevent this, or something of that nature.

grajaganDev|1 year ago

The problem is Leetcode style interviews, not ChatGPT.

calabin|1 year ago

We're definitely not doing any "Leetcode style" interviews over here - these are basic "can you do the programming tasks you've claimed you can" type questions.

jrgilman|1 year ago

TIL Fizz Buzz is "leetcode"