The part about bad Keras<->Tensorflow.js interop is classic Tensorflow. Using TF always felt like using a bunch of vaguely related tools put under the same umbrella rather than an integrated, streamlined product.
Actually, I'll extend that to saying every open source Google library/tool feels like that.
> "Why did you decide to merge Keras into TensorFlow in 2019": I didn't! The decision was made in 2018 by the TF leads -- I was a L5 IC at the time and that was an L8 decision.
Semi-related, but I needed a CAPTCHA on my site[0], mainly to block comment form spam, and settled on repurposing a fun method I’d seen before. It's definitely not foolproof (or hard at all), but I really liked making it.
There is a reason why people moved away from distorted text-based captchas. We are basically at the point where computers are better at them than humans.
However, a surprising number of text-based captchas can be solved with a few-line shell script: use ImageMagick to convert to greyscale, dilate and then erode ("undilate"), then pass the result to Tesseract.
However there are also sites like https://2captcha.net , so really captchas are more like putting a small min amount of effort.
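For anyone wondering what "dilate and undilate" buys you, here is a minimal pure-Python sketch. It assumes dark text on a white background stored as a grid of 0 (ink) and 1 (paper); the grid, the 3x3 window and the helper names are made up for illustration, and a real script would call ImageMagick and Tesseract instead. Dilating the white background wipes out strokes thinner than the window, and eroding afterwards grows the surviving thick strokes back to their original size.

```python
def _neighbourhood_filter(img, pick):
    """Apply `pick` (max or min) over each pixel's in-bounds 3x3 neighbourhood."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            neighbours = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(pick(neighbours))
        out.append(row)
    return out


def dilate(img):
    # Grow the white (1) background: thin dark noise disappears.
    return _neighbourhood_filter(img, max)


def erode(img):
    # Shrink the white background again: surviving dark strokes regain their size.
    return _neighbourhood_filter(img, min)


def denoise(img):
    return erode(dilate(img))


def make_demo():
    """Return (clean, noisy): a 3x3 'letter' blob, plus a speck and a thin line."""
    clean = [[1] * 7 for _ in range(7)]
    for y in range(2, 5):
        for x in range(2, 5):
            clean[y][x] = 0
    noisy = [row[:] for row in clean]
    noisy[0][6] = 0              # single-pixel speck
    for x in range(7):
        noisy[6][x] = 0          # one-pixel-thick line through the image
    return clean, noisy


if __name__ == "__main__":
    clean, noisy = make_demo()
    for row in denoise(noisy):
        print("".join("#" if px == 0 else "." for px in row))
```

On this toy grid the speck and the line vanish while the blob survives, which is the whole trick: anything thinner than the letters gets erased before OCR ever sees it.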
Just because you can technically crack them doesn't mean they're useless.
There's a significant amount of time, skill and effort that went into the solution from this post, and the end result doesn't generalize well (you'd have to start all over for a different kind of captcha).
The vast majority of spammers would not be able to replicate this; those who do would either make money legitimately, or focus their skills on juicier targets (if you have AI/ML skills and want to do nefarious things there are other options that pay much better than spamming).
Such captchas still work well at raising the cost of successful spamming above the expected payoff from said spam.
Makes me wonder what comes next. Could we create a forum where every member must do a 15 minute video interview with a moderator? I know this "doesn't scale" but I think it could make for a funny gimmick.
I think captchas are just another line of defense to make it harder for actors abusing the system. It's not a solution, just a little (increasingly outdated) fortification.
Small? From your own link, recaptcha v3 takes 10-15s and costs $1.3 for 1000 captchas. This is actually huge, and prohibitively expensive for many things where you would want to use it (like scraping a large website).
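To put rough numbers on that, here is a back-of-the-envelope sketch using the price and solve time quoted above. The one-million-page scrape is a made-up example, and it assumes one captcha per page solved one after another.

```python
PRICE_PER_1000_USD = 1.3        # price quoted above for 1000 recaptcha v3 solves
SECONDS_PER_SOLVE = (10, 15)    # solve time range quoted above


def solving_cost_usd(captchas: int) -> float:
    """Total solving fee for `captchas` solves at the quoted price."""
    return captchas / 1000 * PRICE_PER_1000_USD


def serial_solve_hours(captchas: int, seconds_each: float) -> float:
    """Hours spent waiting if every solve happens one after another."""
    return captchas * seconds_each / 3600


if __name__ == "__main__":
    pages = 1_000_000  # hypothetical scrape size, one captcha per page
    print(f"cost: ${solving_cost_usd(pages):,.0f}")
    low, high = (serial_solve_hours(pages, s) for s in SECONDS_PER_SOLVE)
    print(f"serial solve time: {low:,.0f} to {high:,.0f} hours")
```

That works out to roughly $1,300 in fees, plus thousands of hours of solve latency unless the solves are heavily parallelized.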
> so really captchas are more like putting a small min amount of effort.
At that point a proof of work captcha (mCaptcha.org is one, but there are others), is probably the best option. Especially with how any reasonably effective traditional captcha is an accessibility nightmare.
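For readers who haven't seen a proof-of-work captcha, the idea can be sketched in a few lines. This is a generic hashcash-style scheme and an assumption on my part, not mCaptcha's actual algorithm: the challenge format, function names and difficulty are all hypothetical. The client burns CPU searching for a nonce; the server checks it with a single hash.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash tells us whether the client did the work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force the smallest nonce that passes verify()."""
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce


if __name__ == "__main__":
    challenge = "post-form-token-123"   # hypothetical per-request token
    nonce = solve(challenge, 12)        # about 2**12 hashes on average
    print(nonce, verify(challenge, nonce, 12))
```

Each extra difficulty bit doubles the expected client work while verification stays at one hash, so the cost lands on whoever submits in bulk, and there is nothing visual for a human to solve.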
Appropriate response by 4Chan to this: simplify the human work given that anyway it's simple to solve via NNs. We are at a point where designing very hard captchas has high probabilities to increase the human annoyance without decreasing the machine solvability.
> simplify the human work given that anyway it's simple to solve via NNs. We are at a point where designing very hard captchas has high probabilities to increase the human annoyance without decreasing the machine solvability
Or disallow free users from posting at all, and require everyone to buy the 4chan Pass for $20 USD per year if they want to post.
The Pass already lets you skip the CAPTCHA. So if the CAPTCHA is totally ineffective, it follows that they should do away with both the CAPTCHA and free posting, and everyone should buy the 4chan Pass if they want to post.
4chan doesn't care about human annoyance. They just started doing a 15 minute post delay, which is infuriating. I had to whitelist 4chan in Cookie AutoDelete.
I wonder if it would be better to pretend to have a captcha but really you are analysing the user timing and actions. Honestly I half suspect this is already going on.
If you wanted to go full meta ("never go full meta"), you would train an AI to figure out if the agent on the other side was human or not. That is, invent the reverse Turing test: it's a human if the AI is unable to differentiate its responses from normal human responses, as opposed to marketing human responses.
Well now I have to go have a lay down, I feel a little ill from even thinking on the subject.
That's kinda what every major captcha distributor does already!
Even before a captcha is served, your TLS is fingerprinted first, then your IP, then your HTTP/2, then your request, then your JavaScript environment (including font and image rendering capabilities) and the browser itself. These are used to calculate a trust score which determines whether a captcha will be served at all. Only then does it make sense to analyze the captcha's input, but by that time you've caught 90% of bots anyway.
The amount your browser can tell any server about you without your awareness is insane, to the point where every single one of us probably has a more unique digital fingerprint than our very own physical fingerprint!
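A toy version of that trust-score gate, to make the flow concrete. Every signal name, weight and threshold here is invented for illustration, and the outright "block" outcome is my own assumption; real providers use far more signals and don't publish their scoring.

```python
from dataclasses import dataclass


@dataclass
class RequestSignals:
    # Hypothetical boolean signals, one per fingerprinting layer.
    tls_looks_like_browser: bool
    ip_is_residential: bool
    http2_looks_like_browser: bool
    js_environment_consistent: bool


# Made-up weights that sum to 100.
WEIGHTS = {
    "tls_looks_like_browser": 30,
    "ip_is_residential": 30,
    "http2_looks_like_browser": 20,
    "js_environment_consistent": 20,
}


def trust_score(signals: RequestSignals) -> int:
    """Add up the weights of every signal that looks legitimate."""
    return sum(w for name, w in WEIGHTS.items() if getattr(signals, name))


def decide(signals: RequestSignals, allow_at: int = 80, block_below: int = 30) -> str:
    """High trust skips the captcha, low trust is rejected, the rest get one."""
    score = trust_score(signals)
    if score >= allow_at:
        return "allow"
    if score < block_below:
        return "block"
    return "captcha"


if __name__ == "__main__":
    print(decide(RequestSignals(True, True, False, False)))
```

The point is that the captcha itself only ever sees the middle band of traffic; the obvious bots and the obvious humans are sorted out before it is shown.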
In my opinion the granddaddy of all 4chan CAPTCHA busts is still Yannic Kilcher’s GPT-J fine-tune on the “Raiders of the Lost Kek” dataset, and it might be the coolest thing an LLM has ever done on video: https://youtu.be/efPrtcLdcdM?si=errY0PrEhnX9ylDw
> The official TensorFlow-to-TFJS model converter doesn't work on Python 3.12. This doesn't seem to really be documented.
> TensorFlow.js doesn't support Keras 3.
I tried getting into some casual machine learning stuff a few years ago and more or less gave up because of stuff like this. It was staggering how many recent tutorials were already outdated, how many random pitfalls there were, and how many "getting started" guides assumed you were already an expert.
As someone who has been working in ML for years, I can only recommend staying away from anything recent. Grab an old Bayesian statistics textbook and learn the fundamentals, then progress to learning the major frameworks like PyTorch. Try to write every part of a CNN, RNN and Transformer architecture and training pipeline yourself the first time (including data loaders, but maybe leave out CUDA matrix kernels). Stay the hell away from wrappers for other people's wrappers like LangChain. Their documentation is often not just outdated, but flat out wrong regarding the fundamentals. Hugging Face is great if you know the basics and thus how to fix things if their standard wrappers break.
Following the links to the captcha solving service, you can read profiles of the humans doing the work, where it's pitched as more ethical than them working in hazardous factories!
I can only imagine how much worse they'll make the captcha after stuff like this picks up speed with the users all the while being ineffective against the bots.
Captchas are broken, forever. There is no way to prevent bots without also preventing a bottom tier of human users (visually impaired people, old people, or just impatient people). As this xkcd comic [1] suggests, we need to just focus on rewarding and punishing specific behavior, regardless of whether the agent is human or not.
I really hope my post didn't come off as if I was trying to make it sound like this was a new idea. Regardless, this is good information, because it counters the posts of the form "great, now that you made this, you're going to make it harder."
Yeah, I had been under the impression that the point of captchas like this (and those "slide a puzzle piece" ones) wasn't the solution to the problem so much as checking for human-like mouse movements.
I've built 3 iterations of captcha solvers for that crappy website based on https://github.com/drunohazarb/4chan-captcha-solver/issues/1 . The only thing I've learned along the way is that it's mostly pointless outside of a "learning" exercise, since they'll change the captcha (in terms of letter count or the entropy background). Initially, it was 4 characters with a pretty obvious background, then it turned to 5, then it was either 4 or 5, and the current iteration is also either 4 or 5 but with a lot of entropy surrounding the characters.
This project was really my first decent introduction to computer vision and machine learning (along with that of those who helped me in various ways; none of them desired to be credited here other than the guy who collected some of the data for me.)
It was definitely a successful learning exercise, and it's made me more confident tackling some other problems I've had in mind for awhile.
Hey dude. Any idea if 1000 labelled images are good enough for training, and how much time it would take to train on an NVIDIA A40 like on https://www.runpod.io/pricing ?
It might be worth noting that this, including the harder version the op encountered, are not the hardest captchas that 4chan can serve. There is a still harder version which is sent to less trustworthy IPs. I imagine it would still be tractably solved with computer vision. This in part misses the point though, since 4chan has been continuously altering their captcha since it released, making it difficult to create a permanent solution that won't be broken down the road.
Datacenter IPs can’t even post at all, nevermind needing to solve a CAPTCHA. That’s why the accusations of “VPN shill” are usually wrong, as is the assumption of anonymity – 4chan is in fact one of the least anonymous sites on the internet. The optional username feature gives it a veneer of anonymity, but the strict IP requirements ensure almost every post is attributable to a residential internet connection, and reliably associable with other posts from that same connection.
It’s nice to see this posted and interesting that it’s in TensorFlow. I wonder for how many years the captcha had already been broken but just not posted about publicly.
> The official TensorFlow-to-TFJS model converter doesn't work on Python 3.12. This doesn't seem to really be documented, and the error messages thrown when you try to use it on Python 3.12 are non-obvious. I tried an older version of Python (3.10) on a hunch, using PyEnv, and it worked like a charm.
Amazing. And then people wonder why "just use python 2" is still a thing.
More specifically, I mean when they insidiously give you infinite tests even though it's impossible to pass because the IP has been blacklisted... There's a special place in hell for the anti-humans that made that decision, and yes it involves captcha.
I would also be inclined to believe that my project to solve the proprietary 4Chan text CAPTCHA cannot solve an unrelated image CAPTCHA. I'd bet a lot of money on it, in fact!
I wasn't a very active 4chan poster to begin with, but when they introduced this awful CAPTCHA, and later the 300s countdown before making the first post, I completely lost interest in using the website.
Anonymous boards were supposed to be low-friction, but now 4chan is one of the most user-hostile social media platforms around. It takes a special kind of dedication to post there, which I seriously doubt helps the quality of the site.
one of the biggest problems that 4chan has to combat is spam. unfortunately, at 4chan's scale, hcaptcha and recaptcha are not free. 4chan is not exactly a font of money, either. the only reason they turned to this awful homebrew captcha was because recaptcha stopped being free. is there any better way to do it with a single developer for a website that serves millions of people a day?
Do a Web search for "4Chan CAPTCHA" sometime. All the top results will likely be people complaining about how terrible it is. You're certainly not alone.
The worst part about the countdown: if you wait too long to make a post after waiting the 10 minutes (eg: you get distracted,) it will expire, and you have to wait another 10 minutes.
The addition of the post countdown has had a pretty noticeable effect on posts/day across multiple boards: https://4stats.io/
When an earlier version was trialled on /biz/ (mandatory email verification - https://warosu.org/biz/thread/58388587), it nuked the board and it hasn't recovered.
recaptcha is terrible if you are cursed with an ISP that Google deems icky for some indiscernible reason. at the time, I was getting slowly fading bullshit that invariably gaslit me with "try again" several times. when they've switched to custom captcha, I actually started posting again instead of just lurking.
yeah, the recent 5-15 minute countdown before your first post is a bizarre thing, but I assume the volume of spam and ban-evading schizos they're dealing with is ungodly. a single dedicated shithead can shit up a general or a slow board indefinitely by just resetting their router or switching airplane mode on/off for a few minutes when they get banned.
>but now 4chan is one of the most user-hostile social media platforms around.
virtually every single big platform requires your phone number.
Same here. The captcha is the tip of the iceberg. VPNs, proxies... all blocked. Tons of ghosting and censoring of posts too. Also crawling with feds and people trying to get you to incriminate yourself. I love the option to bypass it with crypto. Yeah, like I am going to give them BTC, which will be traced by every agency and coin analysis firm and also get my wallet/exchange account restricted by being linked to 4chan. The owners are more than happy to comply with every 3-letter agency request for info.
I don't get why they added that nasty "feature" to the post form. It really discourages you from posting (maybe it's because they want to sell you their 4chan Pass). I don't understand why 4chan is still active.
It's not like bots aren't already bypassing these CAPTCHAs. One author writing a blog post about how they accomplished what spammers and bots have been doing for ages isn't going to change anything.
I just opened 4chan and after the initial Cloudflare bot detection I was told to register an email or wait 15 minutes before I was allowed to even obtain a CAPTCHA. Looks like they're already taking a layered approach to combat bots.
It only took about three days until the very first captcha solver was made back in 2021, and the dev's only response was to blanket ban the author's name sitewide until he became popular again for other reasons so they had to remove the filter. They know it's only a matter of time for someone to train a new model no matter how much they update the captcha so they don't really care much about it these days.
If there's one place on the web I would apply anonymity with great diligence, it would be posting any article that might put me at odds with the good people of 4Chan.
I suspect really strongly that the available characters in the 4chan captcha were chosen to be able to spell out the most racist/nazi/extreme slurs and slogans imaginable. For instance, not all numerals are ever used, but 1, 4, and 8 are. K is often there, and whatever the algo is, pseudorandom or not, it often doubles/triples characters. I've personally seen "kkk" twice over the years. Mind you, it does seem random. But even randomly, these must happen often enough to set that crowd off; they make a game of posting a screenshot of the "good ones".
All the worst slurs I can think of in my limited vocabulary can't even be spelled with the characters available. I suspect the opposite - they might have been chosen to avoid spelling things like that.
4chan was gaming the previous captchas for awhile to label some of the data with racial slurs, as they had discovered the threshold that you’re allowed to be wrong by, and were aggressively abusing it.
cherryteastain|1 year ago
alecco|1 year ago
https://news.ycombinator.com/item?id=42130881 on Francois Chollet is leaving Google
Retr0id|1 year ago
Dachande663|1 year ago
[0] https://www.hybridlogic.co.uk/contact
vunderba|1 year ago
https://vivirenremoto.github.io/doomcaptcha/
winrid|1 year ago
chamomeal|1 year ago
bawolff|1 year ago
https://www.usenix.org/system/files/conference/woot14/woot14... is a paper on the subject i think is really interesting
noprocrasted|1 year ago
brian-armstrong|1 year ago
3abiton|1 year ago
poincaredisk|1 year ago
RobotToaster|1 year ago
nyclounge|1 year ago
mieko|1 year ago
mbs159|1 year ago
antirez|1 year ago
codetrotter|1 year ago
https://4chan.org/pass
YeahThisIsMe|1 year ago
hackernewds|1 year ago
gosub100|1 year ago
encom|1 year ago
hsbauauvhabzb|1 year ago
brodo|1 year ago
somat|1 year ago
wraptile|1 year ago
kccqzy|1 year ago
benreesman|1 year ago
chiph|1 year ago
Pikamander2|1 year ago
sigmoid10|1 year ago
ChrisMarshallNY|1 year ago
blackjackfoe|1 year ago
gherkinnn|1 year ago
salawat|1 year ago
morkalork|1 year ago
tumsfestival|1 year ago
rany_|1 year ago
OmarShehata|1 year ago
[1] https://xkcd.com/810/
cchance|1 year ago
makifoxgirl|1 year ago
Alifatisk|1 year ago
ranger_danger|1 year ago
https://addons.mozilla.org/en-US/firefox/addon/jkcs/
https://chromewebstore.google.com/detail/joshi-koukousei-cap...
Userscript version: https://github.com/drunohazarb/4chan-captcha-solver
blackjackfoe|1 year ago
Yeul|1 year ago
hobom|1 year ago
blackjackfoe|1 year ago
ipnon|1 year ago
kalleboo|1 year ago
chad1n|1 year ago
blackjackfoe|1 year ago
bryan0|1 year ago
kattagarian|1 year ago
morkalork|1 year ago
smithcoin|1 year ago
BrandonY|1 year ago
trallnag|1 year ago
m3kw9|1 year ago
asynchronous|1 year ago
nullpt_rs|1 year ago
2Gkashmiri|1 year ago
unit149|1 year ago
If the JSON file is corrupt, it shows the following if tt1 and cd do not align.
> "error": "You have to wait a while before doing this again"
lofenfew|1 year ago
chatmasta|1 year ago
blackjackfoe|1 year ago
cchance|1 year ago
anigbrowl|1 year ago
paulpauper|1 year ago
axpy906|1 year ago
b8|1 year ago
thrance|1 year ago
mgaunard|1 year ago
chistev|1 year ago
crazy
cubefox|1 year ago
saagarjha|1 year ago
matrix87|1 year ago
bhasi|1 year ago
nfRfqX5n|1 year ago
dmitrygr|1 year ago
orhmeh09|1 year ago
tomxor|1 year ago
blackjackfoe|1 year ago
fresh_broccoli|1 year ago
alekratz|1 year ago
blackjackfoe|1 year ago
scrlk|1 year ago
shortrounddev2|1 year ago
123yawaworht456|1 year ago
paulpauper|1 year ago
jimbob45|1 year ago
Stay off /v/, /tv/, /pol/, and /a/ and you’ll have a pretty good time.
prettywoman|1 year ago
arrakark|1 year ago
https://news.ycombinator.com/newsguidelines.html
anigbrowl|1 year ago
jeroenhd|1 year ago
credus|1 year ago
sunaookami|1 year ago
tomcam|1 year ago
mostly kidding! mostly
blackjackfoe|1 year ago
NoMoreNicksLeft|1 year ago
blackjackfoe|1 year ago
Der_Einzige|1 year ago
BriggyDwiggs42|1 year ago