JieJie's comments

JieJie | 1 year ago | on: Adobe is buying videos for $3 per minute to build AI model

Alternate non-paywalled link from MSN^0:

"The software company is offering its network of photographers and artists $120 to submit videos of people engaged in everyday actions such as walking or expressing emotions including joy and anger, according to documents seen by Bloomberg. The goal is to source assets for artificial intelligence training, the company wrote."

0: https://www.msn.com/en-us/money/other/adobe-is-buying-videos...

JieJie | 1 year ago | on: Scaling will never get us to AGI

Marcus has a good point when he says that scaling will only yield logarithmic returns, not linear ones. I watched the Sora video as well, and in the Q&A session they admit that they will need a lot of data. Marcus's contention is that there isn't enough data on the internet, and that synthetic data will just result in errors.
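The logarithmic-returns point can be sketched numerically. Under the power-law curves reported in the scaling-law literature, loss falls roughly as compute raised to a small negative exponent, so each fixed *multiple* of compute buys the same *ratio* of improvement, not the same absolute amount. A minimal sketch (my own illustration; the exponent is an assumed value, not a measured one):

```python
# Sketch of diminishing returns under power-law scaling: loss ~ C**-alpha.
# alpha is an assumed illustrative exponent, not a measured value.
alpha = 0.05

def loss(compute):
    """Toy scaling curve: loss shrinks as a small power of compute."""
    return compute ** -alpha

# Every 1000x of compute multiplies loss by the same constant factor,
# i.e. equal improvements cost exponentially more compute.
for c in (1e3, 1e6, 1e9, 1e12):
    print(f"compute {c:.0e} -> loss {loss(c):.3f}")
```

Read off the loop: going from 1e3 to 1e6 compute helps about as much (in ratio) as going from 1e9 to 1e12, which is why "just add data and compute" flattens out on a linear axis.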

Where Marcus gets it wrong is that he defines "right" as producing an algorithm that an AI will follow to get a deterministic result. So, every time Sora 2 (Gary's Version) produces a video of a glass shattering, it would have to produce the same shatter pattern, and that pattern would have to be a precise duplicate of a glass shattering in nature. That's Marcus's win condition, which Sora is unlikely ever to reach.

Maybe transformer-based AI will never be capable of perfectly simulating reality in order to unlock its secrets. It seems to me that, according to the transformer-denialists, in order to create an AI that understands reality, we must fully understand reality first and then program the AI with that understanding.

In my mind, I imagine neural networks as drift-car drivers (I think of them as Ken Block; Paul Walker would also be acceptable). Sure, your average drift-car driver has no idea how to solve a three-body problem algorithmically, but a great drift-car driver can maneuver four spinning tires in a state of critical oversteer around a racetrack curve without the use of a calculator and get it right almost every time.

And yes, race-car drivers have short lifespans, it's true. That's what terrifies Marcus so much about neural networks as well, and why he is so adamant that we listen to him when he says that there are dangerous curves ahead.

I personally would rather live in a world where there are Ken Blocks and Paul Walkers, and that's how I live my life (not in auto-racing, though) but I understand why that frightens people.

JieJie | 1 year ago | on: Schedule-Free Learning – A New Way to Train

From the Related Work section (best guess):

Stochastic Weight Averaging (Izmailov et al., 2018) https://arxiv.org/abs/1803.05407

Latest Weight Averaging (Kaddour, 2022) https://arxiv.org/abs/2209.14981

Latest Weight Averaging? (Sanyal et al., 2023) https://arxiv.org/abs/2311.16294

Cyclic Learning Rates (Portes et al., 2022) https://arxiv.org/abs/2206.00832

Exponential Moving Average? (Zhanghan? et al., 2019) https://arxiv.org/abs/1909.01804

JieJie | 1 year ago | on: Understanding and managing the impact of machine learning models on the web

You are correct to say that the distinction between fine and commercial art is beyond an artwork's practical utility, but I think we could both agree that the market price of art does not necessarily equate to its overall social value. That is what I was getting at. It's more about economic markets not capturing social value, and copyright's role in protecting economic value to the detriment of social value.

Outside the scope of my comment is whether copyright is even capable of protecting social value (I tend to think it isn't), but if it is, W3C should be the organization that steps up to make the attempt.

JieJie | 1 year ago | on: AI and the Problem of Knowledge Collapse

The discussion section is quite illuminating.

"While much recent attention has been on the problem of LLMs misleadingly presenting fiction as fact (hallucination), this may be less of an issue than the problem of representativeness across a distribution of possible responses. Hallucination of verifiable, concrete facts is often easy to correct for. Yet many real world questions do not have well-defined, verifiably true and false answers. If a user asks, for example, “What causes inflation?” and a LLM answers “monetary policy”, the problem isn’t one of hallucination, but of the failure to reflect the full-distribution of possible answers to the question, or at least provide an overview of the main schools of economic thought."

JieJie | 1 year ago | on: Understanding and managing the impact of machine learning models on the web

I hope your comment doesn't get downvoted too heavily, because I think you raise good points.

What seems to be happening, and is happening in this document by W3C as well, is that the social value of information and the economic value of information are being conflated. Social media has created markets for creative works where these two values become entangled.

Another way to say this is that commercial art and fine art are different things, but they are treated the same by the web, and perhaps they shouldn't be.

When someone creates fine art, they are not creating art for the sake of its economic value. They are creating a work of art for its social value, and want it distributed as widely as possible.

When someone creates commercial art, they are creating art specifically for its economic value. That value may be enhanced by wider distribution, but it may also be diluted by wider distribution.

Because these two types of art need to be treated differently by the web, we can't have one solution that benefits both kinds of art.

We need copyright to protect commercial artworks, but we also need a system that encourages wide distribution of the collective information of humanity, giving equal weight to everyone's ideas regardless of their economic value.

i.e. Kafka's ideas are more valuable to humanity than Beavis and Butthead.

This W3C draft doesn't take that into account. It needs to. We need to think beyond the needs of artists who rely on social media to ply their wares, while also taking those needs into account. We should not, however, codify the social media influencer art market, because that is not a market worth protecting. It's an aberration that encourages people to share personal information and works that benefit the platforms and harm society. We want to build something that benefits society and the individuals who contribute artworks that benefit society, today and into the future. And if you can find a way to make a buck in the middle there somewhere, we should encourage that, too.

JieJie | 2 years ago | on: How to be a good listener

It's not so much the folks who suck it up and carry on, though they might be doing themselves a long-term disservice; it's the ones who project it outwards that are doing most of the harm in the world. Boys and girls, of course.

JieJie | 2 years ago | on: We Can't Have Serious Discussions About Section 230 If People Misrepresent It

Precedent in the US is very much on the side of the person, not the method.

Whether it's the Xerox photocopier, the Sony Betamax, or (so far) generative AI, judges are ruling that the fault in using technology to infringe copyright doesn't lie in the technology but in the way the technology is used by a person.

Generative AI isn't (yet) a person, so it is neither capable of creating copyrightable works nor of infringing copyright in itself.

I wrote a research paper on it, if you're interested.

https://www.zipbangwow.com/intellectual-impropriety/
