Prompt: Fairy Kei fashion, police, body armor, serious expression, full body shot, patrolling in a street, photography --ar 3:4
I was playing around with the Fairy Kei fashion style. I love all the pink and how well the style blends with the rest of the prompt.
Prompt: Fairy Kei fashion, astronaut, space suit, serious expression, full body shot, in front of a rocket, photography --ar 3:4
Prompt: Fairy Kei fashion, soldier, strong, body armor, serious expression, full body shot, patrolling in a street, photography --ar 3:4
Prompt: Fairy Kei fashion, firefighter, gas mask, helmet, soot, in front of a burning building, full body shot, photography --ar 3:4
Even the smoke is pink! :)
I’d also add that I don’t think it’s just a matter of throwing more training data of good rifles at the model so that it learns rifles never have two stocks facing in opposite directions in real life. I recall a very beautiful AI-generated image of a green hillside merging into an ocean wave. It was aesthetically pleasing, but it’s not something that would ever happen in real life or could make sense – exactly the same problem as the reversed stocks on a rifle. Yet we like the hill-wave and dislike the reversed firearm stocks. It’s not clear to me that there’s a great body of existing information out there that would let a generative AI distinguish between those two classes of image.
This is one area where human artists do well: they can use their own aesthetic sense to get a feel for what looks attractive and use that as a baseline. That’s not perfect – what the artist likes, a particular viewer might not – but it’s a pretty good starting place. A generative AI has to be able to create new images without any easy sense of which combinations might be unattractive.
I think one of the interesting things with generative AIs is going to be not just finding what they do well – and they do some things astoundingly (to me) well, like imitating an artist’s style or combining wildly disparate images in interesting ways – but also figuring out which things we think are easy that are actually really hard.
I’m not sure whether rendering a rifle is going to be one of those – maybe there’s a great way to do it. But some things are going to be genuinely hard for these generative models.
At that point, I think we’re either going to have to figure out new ways of solving some of those problems – for instance, people hardcoded “fixes” for faces into Stable Diffusion pipelines back in the pre-XL era, since faces, and especially eyes, often looked a bit off – or change the approach. Maybe we need to move to systems that carry a 3D representation of the scene. Or maybe we introduce software that allows for human interaction, so a person can make the decisions in the areas that are hard.
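To make the face-fix example concrete: those “fixes” were typically a separate restoration model run over the finished image, not a change to the diffusion model itself. Here’s a minimal sketch of that pattern using the open-source GFPGAN face restorer (a real library; the file paths below are placeholders you’d swap for your own):

```python
# Post-hoc face restoration, the kind of "hardcoded fix" that UIs
# like AUTOMATIC1111 bolted onto Stable Diffusion outputs.
# Assumes: pip install gfpgan opencv-python, plus a downloaded
# GFPGAN checkpoint (the paths below are placeholders).
import cv2
from gfpgan import GFPGANer

restorer = GFPGANer(
    model_path="GFPGANv1.4.pth",  # placeholder path to pretrained weights
    upscale=1,                    # keep the image at its original size
    arch="clean",
    channel_multiplier=2,
)

img = cv2.imread("sd_output.png")  # BGR array straight from the generator
# enhance() detects each face, restores it, and pastes it back in place
_, _, restored = restorer.enhance(
    img,
    has_aligned=False,
    only_center_face=False,
    paste_back=True,
)
cv2.imwrite("sd_output_fixed_faces.png", restored)
```

Note that this is exactly the “hardcoded” flavor of fix: the base model never learns what it got wrong; a second, specialized model papers over one known failure mode after the fact.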