Prompt: Fairy Kei fashion, police, body armor, serious expression, full body shot, patrolling in a street, photography --ar 3:4
I was playing around with the Fairy Kei fashion style. I love all the pink and how well the style blends with the rest of the prompt.
Prompt: Fairy Kei fashion, astronaut, space suit, serious expression, full body shot, in front of a rocket, photography --ar 3:4
Prompt: Fairy Kei fashion, soldier, strong, body armor, serious expression, full body shot, patrolling in a street, photography --ar 3:4
Prompt: Fairy Kei fashion, firefighter, gas mask, helmet, soot, in front of a burning building, full body shot, photography --ar 3:4
Even the smoke is pink! :)
I’d also add that I don’t think it’s just a matter of throwing more training data of good rifles at the model so that it learns rifles never have two stocks facing in opposite directions in real life. I recall a very beautiful AI-generated image of a green hillside merging into an ocean wave. It was aesthetically pleasing, but it’s not something that would ever happen in real life or could make sense – exactly the same problem as the reversed stocks on a rifle. Yet we like the hill-wave and dislike the reversed firearm stocks. It’s not clear to me that there’s a great body of existing information out there that would let a generative AI distinguish between those two classes of image.
This is one area where human artists do well: they can use their own aesthetic sense to get a feel for what looks attractive and use that as a baseline. That’s not perfect – what the artist likes, a particular viewer might not – but it’s a pretty good starting place. A generative AI has to be able to create new images without any easy sense of which combinations might be unattractive.
I think one of the interesting things with generative AIs is going to be not just finding what they do well – and they do some things astoundingly (to me) well, like imitating an artist’s style or combining wildly disparate images in interesting ways – but also figuring out which things we think are easy that are actually really hard.
I’m not sure whether rendering a rifle is going to be one of those – maybe there’s a great way to do it. But some things are going to be genuinely hard for these generative models.
At that point, I think we’re either going to have to figure out new ways of solving some of those problems – for instance, people hardcoded “fixes” for faces into Stable Diffusion pipelines back in the pre-XL era, since faces, and especially eyes, often looked a bit off – or change the approach. Maybe we need to move to systems that carry a 3D representation of the scene. Or maybe we introduce software that allows for human interaction, so a person can make the decisions in the areas that are hard.
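To make the face-fix example concrete: those “fixes” were typically a separate restoration model run over the finished image, not a change to the diffusion model itself. Here’s a minimal sketch of that pattern using the open-source GFPGAN face restorer (a real library; the file paths below are placeholders you’d swap for your own):

```python
# Post-hoc face restoration, the kind of "hardcoded fix" that UIs
# like AUTOMATIC1111 bolted onto Stable Diffusion outputs.
# Assumes: pip install gfpgan opencv-python, plus a downloaded
# GFPGAN checkpoint (the paths below are placeholders).
import cv2
from gfpgan import GFPGANer

restorer = GFPGANer(
    model_path="GFPGANv1.4.pth",  # placeholder path to pretrained weights
    upscale=1,                    # keep the image at its original size
    arch="clean",
    channel_multiplier=2,
)

img = cv2.imread("sd_output.png")  # BGR array straight from the generator
# enhance() detects each face, restores it, and pastes it back in place
_, _, restored = restorer.enhance(
    img,
    has_aligned=False,
    only_center_face=False,
    paste_back=True,
)
cv2.imwrite("sd_output_fixed_faces.png", restored)
```

Note that this is exactly the “hardcoded” flavor of fix: the base model never learns what it got wrong; a second, specialized model papers over one known failure mode after the fact.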