Stable Diffusion 2.0 Announcement

  1. GAN projection, read my comment on why the tweet is disinformation.

  2. Read it ahah, the tweet is bullshit but for a particular reason

  3. I don't think you can just replace the newly trained CLIP with the original CLIP and have things work fine; don't the latent spaces differ? If that's the case, someone could simply freeze all the layers, add a trainable layer at the end, and learn to translate the new outputs into the original CLIP latent space. I'm sure someone will figure it out soon.
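
A minimal sketch of that translation-layer idea, assuming precomputed paired text embeddings from both encoders; the tensor names and sizes below are illustrative (OpenCLIP ViT-H pooled embeddings at 1024 dims, OpenAI ViT-L at 768). Note that SD actually conditions on per-token hidden states, so a real mapping would have to be applied token-wise.

```python
import torch
import torch.nn as nn

# Placeholder paired embeddings for the same prompts (hypothetical data).
openclip_emb = torch.randn(1024, 1024)  # OpenCLIP ViT-H text embeddings
openai_emb = torch.randn(1024, 768)     # OpenAI CLIP ViT-L text embeddings

proj = nn.Linear(1024, 768)             # the only trainable component; encoders stay frozen
opt = torch.optim.Adam(proj.parameters(), lr=1e-3)

for step in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(proj(openclip_emb), openai_emb)
    loss.backward()
    opt.step()
```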

  4. He's not saying to replace the OpenCLIP model; he's saying to guide the generation with the OpenAI model, while the model is still conditioned on OpenCLIP embeddings (guide != condition).
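
A rough sketch of that distinction, not actual SD2 code: the UNet is conditioned on OpenCLIP embeddings via cross-attention, while guidance would only add a gradient term from a separate CLIP score at each sampling step. All names below are illustrative.

```python
import torch

def guided_step(unet, latents, t, openclip_cond, clip_score_fn=None, guidance_scale=0.0):
    # Conditioning: the OpenCLIP text embedding is what the UNet was trained on.
    noise_pred = unet(latents, t, encoder_hidden_states=openclip_cond)
    if clip_score_fn is not None:
        # Guidance: nudge the step with the gradient of an external (e.g. OpenAI CLIP)
        # similarity score, without changing what the UNet is conditioned on.
        latents = latents.detach().requires_grad_(True)
        score = clip_score_fn(latents, t)
        grad = torch.autograd.grad(score, latents)[0]
        noise_pred = noise_pred - guidance_scale * grad
    return noise_pred
```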

  5. Appreciate all the work y'all have done and for sharing it with us!

  6. Where did you read that they removed celebs and artists? Because I don't think they really did (it's probably just the text encoder: since VQGAN+CLIP we've learned to prompt in a certain way, using particular artists because they were very present in the OpenAI dataset, but with SD2 they used the LAION dataset to train the CLIP model).

  7. It is indeed because they switched the datasets, which makes sense for them from a legal and investor point of view.

  8. "they have switched the datasets", there is only one dataset thay can use really, LAION dataset because the other one is proprietary, also the CLIP-H model by OpenAI by OpenAI was not released so LAION have trained one, using the only dataset they can use, so it doesn't really makes sense to say "they switch dataset", there is no alternative.

  9. I don't think she is the mind behind SD; she is a researcher, but an independent one.

  10. She and Rombach are the key (non-executives) from what I understood.

  11. "Robin is now leading the effort with Katherine Crowson at Stability AI to create the next generation of media models with our broader team." Oh nevermind in the SD2 blog post, good for her tbh

  12. Yeah, that's how that works. You can search: there are not a lot of Greg artworks in the LAION dataset, but CLIP knew who he was, and by extracting the semantics from the text, Greg was biasing the model toward a particular style.

  13. You got any source for that crazy claim? I've literally never, ever seen anyone complain that pictures from SD always look like they're by him.

  14. I think you didn't read my comment properly lol

  15. You probably have: Google Imagen, Google Parti, and the Chinese one by Baidu that came out a couple of weeks ago, ERNIE I think it's called. Outputs 1024x1024.

  16. Imagen does not use an autoregressive model.

  17. With SD and a finetuned model I can create NSFW; this is not possible with MJ. I see this as an absolute win.

  18. To everyone shitting on this infographic: you're wrong. This really isn't remotely far off from observations about representation learning that have been repeatedly demonstrated empirically and that influenced the design of, e.g., the StyleGAN latent. Concretely, see for example

  19. People on this subreddit really seem to be affected by the Dunning-Kruger effect. distill.pub came to my mind too when I saw people in the comment section shitting on OP.

  20. Funny part about this is that the tech for generating waifus has been available for more than 2 years now

  21. Nothing to the level of coherency that the NovelAI model, Waifu Diffusion, and Anything V3 can offer. Without the diffusion model architecture or a powerful VAE for a transformer model, today's SOTA results could not be replicated.

  22. I will look into it, I didn't know about it before. Is there a reason not to use high denoising settings?

  23. You lose a lot of information about the original image.

  24. You should try to use the loopback script instead of using such a strong denoising strength

  25. No, there's a section in settings that lets you opt out all your deviations with a "NOAI" tag. A simple one-click button.

  26. They added this feature after a while, thanks to all the backlash lol; it should have been there from the beginning.

  27. If you're new to this field, this is too complex to be explained in one comment; just use the already trained models you can find in this subreddit.

  28. You're looking for trends in China using Google Trends, but Google is blocked in all mainland territories.

  29. Yeah, I was wondering that myself. Although why are they still on top of the chart? And why does Google not just grey out China? It seems like they extrapolate some sort of data from somewhere. Idk, would love to hear from someone more knowledgeable than me, I'm curious.

  30. Hong Kong and Macau have access to Google; for the rest I don't know, maybe there are some institutions that can surf the web from a Chinese IP without censorship.

  31. AI art can be beautiful and is super cool, but it should be clearly separated from real art. I hope the generation tools start implementing some sort of invisible watermark, like making 200 random pixels a slightly different shade, to allow them to be labeled as AI art.

  32. Stable Diffusion is using an invisible watermark in a lot of implementations.
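
For context, the reference Stable Diffusion scripts stamp outputs with the invisible-watermark Python package, roughly like the sketch below (double-check the exact calls against that package's docs; the file names are placeholders).

```python
import cv2
from imwatermark import WatermarkEncoder  # pip install invisible-watermark

bgr = cv2.imread("output.png")
encoder = WatermarkEncoder()
encoder.set_watermark("bytes", "StableDiffusionV1".encode("utf-8"))
bgr_marked = encoder.encode(bgr, "dwtDct")  # frequency-domain mark, invisible to the eye
cv2.imwrite("output_marked.png", bgr_marked)
```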

  33. I don't understand, wasn't the VAE already used? In the comment you linked there is just a guide for using the new finetuned VAE model instead of the one already embedded in the .ckpt; I guess you were already using a VAE, you just didn't notice it. (The variational autoencoder is essential to go from the 64x64x4 latent to the 512x512x3 final image and vice versa.)
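
To make that decode step concrete, a minimal sketch with the diffusers AutoencoderKL; the 0.18215 latent scaling factor is the one used by SD v1, and repo/version details may vary.

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")
latents = torch.randn(1, 4, 64, 64)                 # what the UNet denoises
with torch.no_grad():
    image = vae.decode(latents / 0.18215).sample    # -> (1, 3, 512, 512) image tensor
```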

  34. You're right! Thanks for educating me, I'm still getting my head around a lot of the terminology and this is the kind of feedback that really helps.

  35. They finetuned the model by giving more weight to the MSE loss, so that the reconstruction is more faithful than with a high discriminator loss.
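
(Illustratively, an objective of the form L = w_rec * L_MSE + w_adv * L_adv with w_rec increased relative to w_adv; the exact weights aren't given in the comment, this is just the shape of the trade-off being described.)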

  36. Stability recently released a new VAE. The VAE is part of the SD image generation process. A new VAE is exciting since people were hoping it would improve the generated images.
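
A quick sketch of trying the new VAE with an existing pipeline via diffusers, assuming the separately published sd-vae-ft-mse weights; adjust repo IDs and dtypes to your setup.

```python
import torch
from diffusers import StableDiffusionPipeline, AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
).to("cuda")
image = pipe("a watercolor painting of a lighthouse").images[0]
```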

  37. This is a new VAE decoder, not an encoder.

  38. This looks cool, what was your workflow for this? When I tried to use outpainting mk2 on automatic's repo, I could only outpaint a very limited number of pixels per run.

  39. I don't think outpainting mk2 uses the inpainting capacity of the Runway model; it uses it like a standard model. The only thing you can do, I think, is outpaint manually using the inpaint UI (you need to resize the image and leave some blank space using GIMP/Photoshop etc.).
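
The manual route described above boils down to padding the canvas and masking the new area before running inpainting; a small Pillow sketch, with file names and sizes as placeholders.

```python
from PIL import Image

img = Image.open("input.png").convert("RGB")       # e.g. 512x512
canvas = Image.new("RGB", (768, 512), "white")      # extend 256 px to the right
canvas.paste(img, (0, 0))

mask = Image.new("L", (768, 512), 255)              # white = area to repaint
mask.paste(Image.new("L", img.size, 0), (0, 0))     # black = keep original pixels

canvas.save("padded.png")
mask.save("mask.png")
```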

  40. DALL-E mini is not the original DALL-E model from OpenAI.

  41. Craiyon is DALL-E mini, not DALL-E 1 from OpenAI.

  42. I already tried it online, but I want to try it locally as well, yes. The question is whether it's necessary to finetune the original model if the results are not better.

  43. The results are better and fine-tuning is necessary, because otherwise all the steps would be out of distribution for the UNet and, like the original model, it would ignore most of the original image (DALL-E 2 was also finetuned, like GLIDE).

  44. I have put several friends in police uniforms ahah; you can try it on a random person and let me see the results. My results with this model were really good.

  45. This is a fine-tuned model on top of the standard diffusers inpainting pipeline (i.e. just masked img2img).

  46. The UNet is different: it takes 5 more input channels (mask + masked original image), so it's quite different from the usual method.
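
Concretely, the inpainting UNet sees a 9-channel input: 4 noisy latent channels + 1 mask channel + 4 masked-image latent channels. A hedged sketch with illustrative tensor names:

```python
import torch

noisy_latents = torch.randn(1, 4, 64, 64)
mask = torch.zeros(1, 1, 64, 64)                 # 1 where the model should repaint
masked_image_latents = torch.randn(1, 4, 64, 64)  # VAE-encoded image with the hole blanked

unet_input = torch.cat([noisy_latents, mask, masked_image_latents], dim=1)
print(unet_input.shape)  # torch.Size([1, 9, 64, 64])
```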

  47. I don't even understand how it's working. Are people tagging stuff with emojis, or is it looking up the Unicode name for the characters, or treating it like an image prompt, or what the hell?

  48. The model has learned the meaning of the emojis the same way it has learned all the other words in our language (the NovelAI model is just a finetuned version of Stable Diffusion).

  49. I wasn't trying to single out China as the main reason. Interest grew across the world; it just so happens that it grew especially in China, in spite of the Google ban you mentioned. But I think that's a ban on Google, not NovelAI. Also, it's an official ban; it's not really enforced if you're just trying to generate art through AI. But of course, I can only speculate from the data I see here and from what other people say. Maybe they will get banned.

  50. Craiyon was banned, and you really can't do much with that model; the NovelAI model can generate high-quality NSFW images, and NSFW media are banned in mainland China, so yeah, they would probably ban the site.

  51. That's an interesting topic. That will probably create an underground community/black market, essentially increasing demand/value. Most NSFW art comes from those regions already, I believe.

  52. Nah, most NSFW art doesn't come from mainland China; there are only a few good NSFW Chinese artists. Also, the NSFW ban in China only increases the probability that Chinese artists have fetishes, like feet, loli, etc.

  53. What's the reason for making 35 steps the mandatory minimum? The majority of my generations were under 30 steps and I loved the results

  54. The CLIP guidance + the classifier-free guidance are going to create more artifacts, so I guess this is the reason.
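
For reference, classifier-free guidance combines a conditional and an unconditional prediction; stacking it with CLIP guidance at high scales is the kind of thing that can push samples off-distribution and add artifacts. A minimal form of the CFG step (illustrative, not any particular repo's code):

```python
import torch

def classifier_free_guidance(noise_uncond: torch.Tensor,
                             noise_cond: torch.Tensor,
                             scale: float = 7.5) -> torch.Tensor:
    # scale = 1.0 recovers the plain conditional prediction; larger values
    # exaggerate the direction pointed to by the prompt.
    return noise_uncond + scale * (noise_cond - noise_uncond)
```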

  55. Is this the real DALL-E? How are these results so damn good?

  56. It's Stable Diffusion; the cool thing is that the model is completely open source and you can run it even on your PC with a consumer GPU.

  57. Do you need a powerful machine to do it?

  58. At least 4 GB of VRAM, or you can run it on Colab, Kaggle, etc.
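
For anyone wanting to try it locally, a minimal diffusers sketch; fp16 plus attention slicing is what usually makes it fit on roughly 4 GB cards, though exact limits vary by card and resolution.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.enable_attention_slicing()  # trades a bit of speed for lower VRAM use

image = pipe("a castle on a cliff at sunset").images[0]
image.save("castle.png")
```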

  59. 256x faster than 1000-step DDPM.
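
(For scale: 1000 / 256 ≈ 4, so a speedup like that corresponds to getting comparable samples in roughly four denoising steps, if the gain comes purely from fewer steps.)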

  60. Having multiple linear operations without a nonlinear activation function is the same as doing a single matrix multiplication + bias, so the example is kinda weird.
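
A quick check of that point (PyTorch, illustrative shapes): composing two linear layers with no nonlinearity between them is exactly one affine map.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
a, b = nn.Linear(8, 16), nn.Linear(16, 4)
x = torch.randn(5, 8)

# Weight and bias of the single equivalent layer.
W = b.weight @ a.weight
c = b.weight @ a.bias + b.bias
print(torch.allclose(b(a(x)), x @ W.T + c, atol=1e-5))  # True
```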

  61. By NovelAI's finetuned Stable Diffusion model*
