@31337

31337@sh.itjust.works · edited 9 months ago

The Stable Diffusion algorithm is strange. I’m surprised someone thought of it, and surprised that it works.

IIRC it works like this: Stable Diffusion starts with an image of completely random noise. The idea is that the text prompt describes a hypothetical image to which the noise was added. So, given the text, the model tries to “predict” what the image would look like if it were denoised a little bit. It does this repeatedly until the image is fully denoised.

So, it’s very easy for the algorithm to make a “mistake” in one iteration by coloring the wrong pixels black. It’s unable to correct its mistake in later denoising iterations, and just fills in the pixels around it with what it thinks looks plausible. It also can’t really “plan” ahead of time; it can only do one denoising operation at a time.
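
The loop described above can be sketched in a few lines of Python. This is only a toy under loud assumptions: `toy_denoise_step` is a hypothetical stand-in for the trained network, the `target` list plays the role of "the image the prompt describes", and the schedule `1 / (steps - step)` is made up for illustration. The real Stable Diffusion works on a compressed latent representation and uses a neural network plus a sampler, none of which is shown here.

```python
import random


def toy_denoise_step(image, target, fraction):
    """Hypothetical stand-in for the trained network.

    A real model predicts a less noisy image from the noisy image and the
    text prompt. Here `target` plays the role of what the prompt describes,
    and we simply move each pixel part of the way toward it.
    """
    return [pixel + fraction * (goal - pixel) for pixel, goal in zip(image, target)]


def generate(target, steps=20, seed=0):
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one value per "pixel".
    image = [rng.gauss(0.0, 1.0) for _ in target]
    history = [image]
    for step in range(steps):
        # Each pass removes only a share of the remaining noise;
        # the final pass (fraction == 1.0) removes whatever is left.
        fraction = 1.0 / (steps - step)
        image = toy_denoise_step(image, target, fraction)
        history.append(image)
    return image, history


target = [0.0, 1.0, 1.0, 0.0]  # a made-up 4-pixel "image"
final, history = generate(target, steps=20, seed=0)
```

Each pass only sees the current image, which matches the "one denoising operation at a time" point: nothing in the loop looks ahead or goes back to revisit an earlier step.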

31337@sh.itjust.works · 10 months ago

Nah, Java is alright. The old, complicated “enterprise” community and frameworks gave it a bad reputation. It was designed to be an easier, less bloated C++ (in terms of features and programming paradigms). It also runs fairly efficiently. Last time I checked, a program written in C would typically take about 2x as long to complete when written in Java, whereas it would take about 200x as long in Python. Here are some recent benchmarks: https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/python3-java.html

I haven’t had a chance to try Rust yet, but want to. Interestingly, Rust scores poorly on source-code complexity: https://benchmarksgame-team.pages.debian.net/benchmarksgame/how-programs-are-measured.html#source-code

31337@sh.itjust.works · 11 months ago

I haven’t tried using Macs. I’ve heard their GPUs are kinda slow (compared to high-end discrete GPUs), but they have unified memory, so you can run very large models.

I bought 3090s because I needed to train a classifier. It took months of training 24/7, so it was cheaper to buy 3090s than pay for cloud compute time. A 3090 is probably overkill for just running SDXL (unless they release an even larger model in the future).

31337@sh.itjust.works · 11 months ago

I use ChatGPT premium almost every day, mostly for coding, rarely for image generation. $20/month. It can write/refactor decent (not great) code faster than I can, as long as describing what I want takes less time than writing the code myself. Dalle-3 through ChatGPT produces pretty good images and seems to understand the prompts better than SD (ChatGPT actually writes the prompt for you, so that might have something to do with it). It’s much better than Dalle-2, but they’ve put guardrails on it so you can’t ask it to do things like create images in the style of a modern artist.

I’ve messed around with Automatic1111 and SD a little bit. ControlNet is very nice when you need control over the output. I would draw shitty outlines in Inkscape, then use SD+ControlNet to kind of fill everything else in. Free and open-source model and software. I ran it on an RTX 3090, which cost me $800 a year ago.

Messed around with DeepFloyd IF on replicate.ai for a while, which was very nice. It seemed to understand the prompts much better than SD. I think it was $2/hr, with each image generation using something like 30s of GPU time. Cold starts can take minutes though, which is annoying.
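
For a rough sense of scale, here is the per-image arithmetic using only the two figures recalled above ($2/hr and about 30s per image). It assumes billing is strictly per second of GPU time and ignores cold starts, since whether those are billed isn't stated here.

```python
RATE_PER_HOUR = 2.00     # USD, the "$2/hr" figure recalled above
SECONDS_PER_IMAGE = 30   # the "something like 30s" estimate above

# Convert the hourly rate to a per-second rate, then scale by the time per image.
cost_per_image = RATE_PER_HOUR / 3600 * SECONDS_PER_IMAGE
images_per_dollar = 1 / cost_per_image

print(f"${cost_per_image:.4f} per image, about {images_per_dollar:.0f} images per dollar")
```

So under those assumptions it comes out to a bit under 2 cents per image.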

I use OpenAI’s API in a prototype application; both GPT-4 and Dalle-3. GPT-4 is by far the most well-behaved and “knowledgeable” LLM, but all the guardrails put on it can be annoying. Dalle-3 is pretty good, but I’m not sure it’s the best. The cost isn’t significant yet while prototyping.

I get ads, news, and video recommendations served to me, which probably use some kind of multi-armed bandit AI algorithm. Costs me my privacy. I don’t like it; I rate it 0/10.

31337@sh.itjust.works · 1 year ago

Hmm. Looks like SD associates 31337 with people holding fruits or vegetables. Perhaps there were some images in its training set with 31337 in their filename.

Bing image creator associates it with cyberpunk/vaporwave aesthetics, which is more correct. Bing appears to have a better and larger language model behind it that allows for better associations, and probably a larger and cleaner training dataset.

Just skimmed through the Dalle-3 paper, and yeah, it’s probably the result of the better training data that was generated by GPT-V recaptioning images.

31337@sh.itjust.works · 1 year ago

Interesting to see the biases of different models.

31337
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 713069833, Size: 1024x1024, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Refiner: sd_xl_refiner_1.0 [7440042bbd], Refiner switch at: 0.8, Version: v1.6.0
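
If you want to reuse those settings in a script, the `Key: value` pairs above are easy to pull apart. A minimal sketch: `parse_infotext` is a hypothetical helper written for this exact line, not part of any tool, and it assumes no value contains `", "`, which holds here but wouldn't for every possible settings line.

```python
def parse_infotext(line):
    """Split 'Key: value, Key: value' pairs into a dict (values stay strings)."""
    params = {}
    for field in line.split(", "):
        # Split on the first ": " only, so values keep any later colons.
        key, _, value = field.partition(": ")
        params[key] = value
    return params


settings = parse_infotext(
    "Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 713069833, "
    "Size: 1024x1024, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, "
    "VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, "
    "Refiner: sd_xl_refiner_1.0 [7440042bbd], Refiner switch at: 0.8, "
    "Version: v1.6.0"
)

# Numeric fields need explicit conversion, since every value is a string.
width, height = (int(n) for n in settings["Size"].split("x"))
steps = int(settings["Steps"])
```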

Bing Image Creator