This repository contains Stable Diffusion models trained from scratch and will be continuously updated with new checkpoints. The following list provides an overview of all currently available models.

New stable diffusion finetune ( Stable unCLIP 2.1, Hugging Face) at 768x768 resolution, based on SD2.1-768. This model allows for image variations and mixing operations as described in Hierarchical Text-Conditional Image Generation with CLIP Latents, and, thanks to its modularity, can be combined with other models such as KARLO. It comes in two variants, Stable unCLIP-L and Stable unCLIP-H, which are conditioned on CLIP ViT-L and ViT-H image embeddings, respectively. Instructions are available here. A public demo of SD-unCLIP is already available at /stable-diffusion-reimagine.

New stable diffusion model ( Stable Diffusion 2.1-v, Hugging Face) at 768x768 resolution and ( Stable Diffusion 2.1-base, Hugging Face) at 512x512 resolution, both based on the same number of parameters and architecture as 2.0 and fine-tuned from 2.0 on a less restrictive NSFW filtering of the LAION-5B dataset.

New stable diffusion model ( Stable Diffusion 2.0-v) at 768x768 resolution. It has the same number of parameters in the U-Net as 1.5, but uses OpenCLIP-ViT/H as the text encoder and is trained from scratch. SD 2.0-v is a so-called v-prediction model. The above model is finetuned from SD 2.0-base, which was trained as a standard noise-prediction model on 512x512 images and is also made available.

Added a x4 upscaling latent text-guided diffusion model.

New depth-guided stable diffusion model, finetuned from SD 2.0-base. The model is conditioned on monocular depth estimates inferred via MiDaS and can be used for structure-preserving img2img and shape-conditional synthesis.

A text-guided inpainting model, finetuned from SD 2.0-base.

Stable Diffusion is a latent text-to-image diffusion model. The original Stable Diffusion model was created in a collaboration with CompVis and RunwayML and builds upon the work High-Resolution Image Synthesis with Latent Diffusion Models. We follow the original repository and provide basic inference scripts to sample from the models.

You can update an existing latent diffusion environment by running the provided installation commands. Upon successful installation, the code will automatically default to memory-efficient attention for the self- and cross-attention layers in the U-Net and autoencoder. By default, the attention operation of the model is evaluated at full precision when xformers is not installed. To enable fp16 (which can cause numerical instabilities with the vanilla attention module on the v2.1 model), run your script with ATTN_PRECISION=fp16 python.

Stable Diffusion models are general text-to-image diffusion models and therefore mirror biases and (mis-)conceptions that are present in their training data.
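The environment update and xformers setup described above might look like the following. The package names and the decision to install xformers via pip are illustrative assumptions, not commands taken from this page; consult the repository's environment file for the authoritative list.

```shell
# Hypothetical update of an existing latent-diffusion conda environment.
# Exact packages/versions are assumptions; check the repo's environment.yaml.
conda install pytorch torchvision -c pytorch
pip install transformers diffusers invisible-watermark
pip install -e .

# Optional: with xformers installed, the code can default to
# memory-efficient attention in the U-Net and autoencoder.
pip install xformers
```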
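The ATTN_PRECISION=fp16 switch mentioned above works by setting an environment variable that the code reads to choose the attention precision, with full precision as the safe default. A minimal sketch of that pattern (the variable name and values come from the text; the function itself is hypothetical, not the repository's actual code):

```python
import os

def resolve_attn_precision() -> str:
    """Pick the attention precision from the environment.

    Mirrors the pattern behind `ATTN_PRECISION=fp16 python script.py`:
    full precision (fp32) is the default, fp16 is opt-in and may be
    numerically unstable with vanilla attention on the v2.1 model.
    """
    precision = os.environ.get("ATTN_PRECISION", "fp32")
    if precision not in ("fp32", "fp16"):
        raise ValueError(f"unsupported attention precision: {precision}")
    return precision

# Opt in to half-precision attention, as the README suggests:
os.environ["ATTN_PRECISION"] = "fp16"
print(resolve_attn_precision())  # fp16
```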
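The v-prediction objective used by SD 2.0-v differs from standard noise prediction: the network predicts v = alpha_t * eps - sigma_t * x0 rather than the noise eps itself. The scalar sketch below is illustrative (real models operate on image tensors) and just checks the algebra that makes the clean sample and the noise recoverable from a v prediction:

```python
import math

# Toy scalar "image" and noise; real models operate on tensors.
x0 = 0.8    # clean sample
eps = -0.3  # Gaussian noise sample
t = 0.4     # noise level in [0, 1]

# Variance-preserving schedule: alpha^2 + sigma^2 = 1.
alpha = math.cos(t * math.pi / 2)
sigma = math.sin(t * math.pi / 2)

x_t = alpha * x0 + sigma * eps  # the noised sample the model sees
v = alpha * eps - sigma * x0    # the v-prediction target

# Given a perfect v prediction, both x0 and eps are recoverable:
x0_rec = alpha * x_t - sigma * v
eps_rec = sigma * x_t + alpha * v
assert abs(x0_rec - x0) < 1e-12
assert abs(eps_rec - eps) < 1e-12
```

Predicting v rather than eps keeps the target well-conditioned across all noise levels, which is part of why the 768x768 model was trained this way.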