Prompt: Fairy Kei fashion, police, body armor, serious expression, full body shot, patrolling in a street, photography --ar 3:4
I was playing around with the Fairy Kei fashion style. I love all the pink and how well the style blends with the rest of the prompt.
Prompt: Fairy Kei fashion, astronaut, space suit, serious expression, full body shot, in front of a rocket, photography --ar 3:4
Prompt: Fairy Kei fashion, soldier, strong, body armor, serious expression, full body shot, patrolling in a street, photography --ar 3:4
Prompt: Fairy Kei fashion, firefighter, gas mask, helmet, soot, in front of a burning building, full body shot, photography --ar 3:4
Even the smoke is pink! :)
That first firearm looks better than I normally get in Stable Diffusion, but it’s still a little odd – like, there’s no trigger, for example, and it’s being held as if there’s a pistol grip, but no pistol grip is coming out the bottom of her hand.
One thing that the current SD models I’ve played with don’t do well is firearms. I’ve tried dealing with that by specifying particular models of firearm, but I get the same thing: images that combine parts from unrelated firearms in ways that don’t make sense together. Sometimes pieces are backwards, sometimes they’re in bizarre places.
Lemme do a few montages to demonstrate:
With just “rifle”, I get something that looks a lot like an M-14:
But it’s a mess of multiple magazines, scopes facing backwards, multiple triggers, bipods with missing legs, stocks on both ends, rifles mounted as scopes on other rifles, etc.
I thought that trying to specify a precise firearm model might avoid the problem. A Remington 870 is an extremely common firearm; there should be a lot of images of it, so hopefully there’s enough training data to do something reasonable with it alone. But it’s still pretty much a mess with “remington 870 shotgun”:
I don’t think the issue is an inadequate training-set size, because there’s plenty of variety in the images. I think the problem is that the generic algorithms these image models currently use handle certain things poorly – specifically, things that humans are particularly sensitive to seeing wrong. Things that look very similar to the model can look very different to us. Fingers and toes are a famous example. In many images, there’s nothing wrong with adding a few more of something: in a cornfield, whether there are five or six rows of similar corn doesn’t matter much. But with a human hand, we care a lot about whether there are five or six fingers.
Same thing with firearms. Lots of kind of similar-looking portions of objects, but some of them go together in ways that we just don’t like.
Maybe these models could incorporate some kind of training on “bad” images – things that are undesirable – and we could flag images with too many fingers as undesirable.
The problem is that right now they can generally assume that images out there are good; nobody wants to manually create a “bad” training corpus, and it’d be a huge amount of work.
Early on, search engines tried to figure out whether their search results were good by asking users. Users generally didn’t care to spend time ranking search results, but IIRC Google realized that one could probably infer whether a result was good from whether a user stopped searching for the thing after they found it. Maybe there’s some way to infer similar information from public image-generation services like Midjourney or DALL-E. If so, that could maybe be used to cheaply build a “bad” corpus.
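If someone wanted to experiment with that idea, the heuristic could start out as simple as the sketch below. This is purely hypothetical – the event names (“reroll”, “upscale”, “download”) are my own invention, not anything a real Midjourney or DALL-E API exposes:

```python
# Hypothetical sketch: infer an image-quality label from user behavior,
# analogous to search engines inferring result quality from whether the
# user kept searching. The event names are assumptions for illustration.

def label_from_session(events):
    """Guess whether a generated image satisfied the user, given the
    chronological list of actions taken after the image was shown."""
    # Upscaling or downloading suggests the user liked the result.
    if any(e in ("upscale", "download") for e in events):
        return "good"
    # Rerolling the same prompt without keeping anything suggests they didn't.
    if "reroll" in events:
        return "bad"
    # No signal either way.
    return "unknown"
```

Labels collected this way would be noisy, of course – people abandon prompts for lots of reasons – but noisy labels at scale are roughly what made the search-engine version of this trick work.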
Oh these were surprisingly accurate yes. But usually I get the same kind of weird results, droopy gun syndrome being one of my favorites :)
I know Midjourney has a rating system on their website and an incentive for you to vote. But I’m not sure what they actually do with that information.
It’s the same with many different types of weapons. It’s just how it looks when not trained enough on a subject.
I’d also add that I don’t think it’s just a matter of teaching it – by throwing more training data of good rifles at it – that rifles never have two stocks facing in opposite directions in real life. I recall a very beautiful AI-generated image of a green hillside merging into an ocean wave. It was very aesthetically pleasing. But it’s not something that would ever happen in real life, or could make sense – just like the reversed stocks on a rifle. Yet we like the hill-wave and dislike the reversed firearm stocks. It’s not clear to me that there’s a great set of existing information out there that would let a generative AI distinguish between the two classes of image.
It is one area where human artists do well – they can use their own aesthetic sense to get a feel for what looks attractive, and use that as a baseline. That’s not perfect – what the artist likes, a particular viewer might not. But it’s a pretty good starting place. A generative AI has to be able to create new images without any easy sense of which combinations might be unattractive.
I think that one of the interesting things with generative AIs is going to be not just finding what they do well – and they do some things astoundingly (to me) well, like imitating an artist’s style or combining wildly-disparate images in interesting ways. It’s going to be figuring out which of the things we think are easy are actually really hard.
I’m not sure whether making a rifle is going to be one of those – maybe there’s a great way to do it. But there are going to be some things that are hard for these models.
At that point, I think we’re either going to have to figure out new ways of solving some of those problems – people hardcoded “fixes” for faces into Stable Diffusion back in the pre-XL era, since faces and especially eyes often looked a bit off. Maybe we need to move to systems that keep a 3D representation of the scene. Or maybe we introduce software that allows for human interaction, providing human-assisted decisions in the areas that are hard.
Thought this was a real cosplay at first and saw the gun without anything to make it look fake and thought, “That’s really not safe, someone might think she’s holding a real gun.”
To be honest, I’m disappointed the guns are always black. I would have loved a pink or blue gun to complete the look and make it less serious.
But unless you specifically say so otherwise, guns will always be black. It’s a very strong style element in the model I guess.
The weird fleshtube finger, the finger-thumb, and the proportionally child-sized face didn’t raise any alarms first?
At least she has good trigger discipline
What an excellent combination!
Thank you!
The face is oddly disturbing…
I know what you mean, it’s a little too small and too fake looking.
I picked the image for the ridiculous amount of detail in the uniform. The face was something I just had to accept.
SD does the same, but even more extreme – the face looks like a doll. I’m pretty sure its training set for “fairy kei” is too limited. When I do a batch of 20 renders of just the term “fairy kei”, they all converge on extremely similar, 3d-render-looking images, which probably means it’s short on training data; whatever it was trained on for “fairy kei” appears to have been only rendered images, rather than humans cosplaying in the Fairy Kei style.
plays around for a bit
Try putting in the names of real people, for whom photographic facial data will exist in the training set. For me, that drags in the facial data.
“Fairy Kei”
versus “Fairy Kei, brad pitt”
Reducing the weighting on ol’ Brad there, so he doesn’t overwhelm the image: “Fairy Kei, (brad pitt:.1)”
Maybe use a couple of different names, each at a weak weight, if you don’t want the output to look like any one real person. I don’t know whether Midjourney filters images of celebrities out of the training set – I remember reading that this was one thing people were pursuing to try to discourage deepfakes – so I dunno if celebs are the ideal people to use there.
For SD, prompt weighting is done with parens, a colon, and a number, like (foobar:.5). It looks like with Midjourney, it’s double colons, like foobar::.5.
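For illustration, the SD-style (text:weight) syntax can be pulled apart with a few lines of Python. This is a simplified stand-in for what SD front ends actually do with weighted prompts, not anyone’s real parser:

```python
import re

# Matches the SD-style "(text:weight)" span, e.g. "(brad pitt:.1)".
_WEIGHT_RE = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")

def parse_weights(prompt):
    """Return a list of (text, weight) pairs; unweighted text gets 1.0."""
    parts = []
    pos = 0
    for m in _WEIGHT_RE.finditer(prompt):
        # Any plain text before this weighted span keeps the default weight.
        before = prompt[pos:m.start()].strip(" ,")
        if before:
            parts.append((before, 1.0))
        parts.append((m.group(1), float(m.group(2))))
        pos = m.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        parts.append((tail, 1.0))
    return parts
```

So “Fairy Kei, (brad pitt:.1)” comes out as the base prompt at weight 1.0 plus “brad pitt” at 0.1 – which matches the intent of the earlier example of dialing ol’ Brad down so he doesn’t overwhelm the image.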
Interesting to see it’s the same on other models, thank you for sharing!
I prefer to avoid double colons in Midjourney, it always seems to make the scene look disjointed. Like two different people working on the same image. I’ve never managed to get really good results from using them, but maybe I’m doing it wrong?
I think I’ve found a solution to reduce the fakeness level: what this needs is a more realistic medium! I’ve added a camera style and switched to the “raw” style (without Midjourney’s special sauce). The uniforms become less outrageous, but the realism is turned up:
Prompt: Fairy Kei fashion, police officer, serious expression, cinematic shot, patrolling in a street, photograph, Fujifilm Superia --ar 3:4 --style raw --s 150
Of course you can increase the style, but then the faces become less real:
Prompt: Fairy Kei fashion, police officer, serious expression, cinematic shot, patrolling in a street, photograph, Fujifilm Superia --ar 3:4 --style raw --s 600
I guess you’ll have to find a balance between the two. Though the “raw” style and the camera type do make a difference even if you do still get that Instagram filter effect :)
I prefer to avoid double colons in Midjourney, it always seems to make the scene look disjointed. Like two different people working on the same image. I’ve never managed to get really good results from using them, but maybe I’m doing it wrong?
looks at Midjourney docs more-carefully
Hmm.
Yeah, it looks like the double colons alter the Midjourney prompt differently than the weighting syntax does in SD; the weighting feature doesn’t work the same way. On Midjourney, weights also cause the prompt to be split into separate portions.
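As a rough sketch of that difference: splitting on :: produces separate weighted concepts, where a number right after the :: weights the concept before it. This is my own simplified reading of Midjourney’s multi-prompt behavior, not their actual parser:

```python
import re

def split_multiprompt(prompt):
    """Split a Midjourney-style multi-prompt into (concept, weight) pairs.

    A number immediately after '::' weights the concept *before* it;
    concepts with no number get the default weight 1.0.
    """
    chunks = prompt.split("::")
    parts = []
    for i in range(len(chunks) - 1):
        # Look for a weight at the start of the chunk after this '::'.
        m = re.match(r"\s*([0-9]*\.?[0-9]+)", chunks[i + 1])
        weight = 1.0
        if m:
            weight = float(m.group(1))
            # Remove the weight so it isn't treated as prompt text.
            chunks[i + 1] = chunks[i + 1][m.end():]
        parts.append((chunks[i].strip(), weight))
    last = chunks[-1].strip()
    if last:
        parts.append((last, 1.0))
    return parts
```

The key point the sketch shows: unlike SD’s in-place (text:weight), the :: syntax carves the prompt into independent concepts – which might be why images prompted that way can feel like “two different people working on the same image.”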
Aight, sorry; I was just skimming the Midjourney syntax docs for the first time.
That’s ok, I appreciate the effort you put into this. And we both learned something from it.
Reminds me of the movie Dredd.
If only! I’d love to see a fairy kei remake :)
A world where people in cities are accustomed to soldiers is a sad world.
arms dealers and makers are the scum of this world
considers
I would bet that there have probably been soldiers in cities for pretty much as long as there have been cities.
googles
https://en.wikipedia.org/wiki/Civilization
A civilization (British English: civilisation) is any complex society characterized by the development of the state, social stratification, urbanization, and symbolic systems of communication beyond natural spoken language (namely, a writing system).
This “urban revolution” – a term introduced by Childe in the 1930s – from the 4th millennium BCE, marked the beginning of the accumulation of transferable economic surpluses, which helped economies and cities develop. Urban revolutions were associated with the state monopoly of violence, the appearance of a warrior, or soldier, class and endemic warfare (a state of continual or frequent warfare), the rapid development of hierarchies, and the use of human sacrifice.
Yeah, that sounds pretty closely-linked.
I don’t know the history of the legal separation of the soldier and policeman, but I’d bet that it’s pretty recent, and it’s certainly not universal even today.
In the US, there is a partial legal separation – the Posse Comitatus Act. That’s relatively recent – 1878 – and it’s not a constitutional-level restriction. It was put into place after the American Civil War, at the end of Reconstruction, by Southern states that disliked the (antislavery) federal government intervening in their society. And it had more to do with disagreement over federal-versus-state roles than with any fundamental disagreement over military versus police.
In France, in contrast, an important chunk of law enforcement is a responsibility of the military today.
In theory you’re right.
In practice, and in France, there was a clear decision to introduce a military presence in cities through patrols after the Bataclan attack. Armed men and women in camouflage weren’t something you used to see in cities.
Though the gendarmerie is part of the military, they don’t have the warrior outlook. They look more like policemen.
It was a cunning decision to reinforce the military presence in cities to push the “we’re at war” propaganda, just for political gain :/
I’m sorry this light-hearted post triggered you in such a manner. It’s not my intention to normalize soldiers in the street or anything like that. Body armor just added extra elements for the AI to play with; it also seemed to automatically come with a gun. Also, that soldier just looks gorgeous in her skirt and camo leggings :)
Anyway, here are two more without any guns, just for you.
Police officer
Paramedic
No hard feelings? :)
no hard feelings of course