Simple Steps to Create Videos with Multiple Characters in ComfyUI on Low VRAM

Apr 30, 2025

4 min read

A Midjourney-generated image created using the Midjourney Automation Suite

Want to make videos that feature more than just one person? Using ComfyUI and special models, you can create video clips that include several characters based on reference images. This guide walks you through the process, even if you don't have a top-tier graphics card.

Keep in mind that when using photos of people, especially for anything public or commercial, you must have permission. This guide uses examples of well-known individuals for educational purposes only to show how the technology works.

Getting Started with the Right Tools

To create videos with multiple characters, you need specific tools and models for ComfyUI.

Key Models

  • Phantom-Wan: This is the core model. Look for the FP32 safetensors version; it's about 5.68 GB.

  • UMT5-XXL Encoder: You'll use this as the text encoder. Make sure you get the BF16 version; this workflow does not support the FP8 version, even though it's smaller.

  • Standard VAE: You'll also need a regular VAE model for Wan.

Installing the Necessary Custom Nodes

The custom nodes this workflow needs may not yet be available through ComfyUI Manager's installer, so you'll need to install them manually.

Here’s how:

  • Find the ComfyUI-WanVideoWrapper repository link (often provided with the workflow download).

  • Copy the repository URL.

  • Go to your ComfyUI folder. Inside it, find the `custom_nodes` folder.

  • Open a command prompt or terminal within the `custom_nodes` folder.

  • Use the `git clone` command followed by the URL you copied.

  • Restart ComfyUI and the new nodes should appear.
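The manual steps above can also be scripted. Here's a minimal sketch; the repository URL in the comment is the commonly used kijai wrapper, but verify it against the link bundled with your workflow:

```python
import subprocess
from pathlib import Path

def clone_custom_node(comfyui_dir: str, repo_url: str, run=subprocess.run):
    """Run `git clone <repo_url>` inside ComfyUI's custom_nodes folder."""
    target = Path(comfyui_dir) / "custom_nodes"
    cmd = ["git", "clone", repo_url]
    run(cmd, cwd=str(target), check=True)  # raises if git fails
    return cmd

# Example (restart ComfyUI afterwards so the new nodes load):
# clone_custom_node("C:/ComfyUI", "https://github.com/kijai/ComfyUI-WanVideoWrapper")
```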

Setting Up Your Workflow

The workflow involves connecting several nodes, but the setup for subjects is key.

Handling Multiple Subjects

The great thing about this setup is you can include input images for multiple characters. While the system can technically handle up to four, you get the best and most stable results with one or two subjects.

For each subject, the process involves:

  • Loading an image of the person.

  • Sending the image through a "remove background" node. The goal isn't a transparent image; it helps the model focus on the subject, so set transparency to FALSE.

  • Using a background color like white after removing the original background.

  • Resizing the image to 512 pixels. Use "fill and crop" instead of stretching. Stretching can distort facial features. Cropping is fine even if it cuts off parts of the body, as the model can generate the rest.

  • Connecting the processed subject images to the Phantom embeds as "phantom latents."
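The per-subject preprocessing above can be sketched with Pillow. Background removal itself is handled by the workflow's node, so this only shows the white-background fill and the 512-pixel "fill and crop" resize (`ImageOps.fit` scales to cover and center-crops, avoiding the stretching that distorts faces):

```python
from PIL import Image, ImageOps

def prep_subject(img: Image.Image, size: int = 512) -> Image.Image:
    """Flatten transparency onto white, then fill-and-crop to size x size."""
    if img.mode in ("RGBA", "LA"):
        # Paste the subject onto a white background (transparency = FALSE).
        bg = Image.new("RGB", img.size, "white")
        bg.paste(img, mask=img.getchannel("A"))
        img = bg
    else:
        img = img.convert("RGB")
    # "Fill and crop": scale to cover the target, then center-crop.
    return ImageOps.fit(img, (size, size), Image.LANCZOS)
```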

Managing detailed workflows with multiple character inputs can be complex. For a simpler way to handle and automate advanced setups like this in Midjourney, consider exploring the Midjourney Automation Suite from TitanXT. It can streamline complex image and video generation tasks.

Workflow Settings and Prompts

You'll link the necessary models (VAE, text encoder, main model) into the workflow. Some settings are important for performance and results.

General Settings

Set the attention mode to "sdpa". This works well for most systems. If you encounter errors with other settings, sdpa is a good default.

There's a TeaCache node which can help speed up generation slightly. You might adjust its threshold based on your VRAM and how fast generations are.

Prompts - Telling the AI What to Do

You'll write a positive prompt describing the scene and actions, and a negative prompt listing things you don't want.

  • Positive Prompt: Describe the characters, their actions, the setting, and the mood. For example, "a young woman and a young man walking together side by side... moving directly towards the camera along an outdoor park path. They appear comfortable and connected, sharing subtle smiles or glances." The model has quite good prompt understanding for up to two subjects.

  • Negative Prompt: List undesirable elements such as "garish colors," "overexposed," "blurry details," "static," "worst quality," and "low quality." It's usually best *not* to mention crowds, especially with multiple subjects, as it can confuse the model and hurt character consistency and background quality.
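As plain strings, the example prompt pair above looks like this (wording taken from the walkthrough; adjust to your own scene):

```python
# Example prompt pair for a two-subject clip.
positive = (
    "a young woman and a young man walking together side by side, "
    "moving directly towards the camera along an outdoor park path. "
    "They appear comfortable and connected, sharing subtle smiles or glances."
)
# Note: no mention of crowds -- that tends to hurt character consistency.
negative = (
    "garish colors, overexposed, blurry details, static, "
    "worst quality, low quality"
)
```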

Sampling and Decoding

Here you control how the video is generated.

  • Set steps (e.g., 30), CFG (e.g., 5), and shift (e.g., 5).

  • A fixed seed can be helpful, but you can use any seed.

  • For the scheduler, "unipc" seems to work best for keeping multiple characters consistent; others like "euler" can cause one character's appearance (even their gender) to drift into the other's.

  • Keep denoise strength at 1.
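Collected in one place, the sampling values above look like this. The key names are illustrative, not the sampler node's exact input names:

```python
# Illustrative sampler settings mirroring the values discussed above.
sampler_settings = {
    "steps": 30,
    "cfg": 5.0,
    "shift": 5.0,
    "seed": 42,            # a fixed seed helps reproducibility; any value works
    "scheduler": "unipc",  # most consistent with multiple characters
    "denoise_strength": 1.0,
}
```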

The video is then decoded. You can set the resolution. Higher resolution like 1280x768 gives better quality but takes longer, especially on lower VRAM. Lower resolutions like 832x480 are faster.

The standard frame rate is 24 frames per second.

Results and What to Expect

Even with lower resolutions, the resemblance to the reference images can be quite good. The model can sometimes capture details from the input photos like clothing type, colors, or accessories like earrings, even if parts of the person weren't fully in the reference image.

The model also does a decent job with movement coherent to the prompt (e.g., hair movement, gliding motion).

You might see some small imperfections like slight flickering around eyes or faces, but overall, for a model of this size and capability, the results with multiple consistent characters are impressive.

Generating multiple video clips or managing many reference images manually can become tedious. Automate your creation process and handle these tasks more efficiently with the Midjourney Automation Suite from TitanXT.

Conclusion

Creating videos with multiple characters in ComfyUI from reference images is possible using the Phantom-Wan model and the described workflow. While more than two subjects can pose challenges, the setup handles two effectively, maintaining likeness and coherence within the generated video. This is a powerful technique for bringing your creative visions with specific characters to life.

Ready to take your AI art workflow to the next level and handle tasks like this with greater ease? Explore the Midjourney Automation Suite from TitanXT.

