Wan Hosting Service | Self-Host Wan-AI T2V, I2V & VACE Models on GPU – B2BHostingClub


The Best GPU Plans for Wan-AI Hosting Service

Choose the appropriate GPU plan according to the Wan-AI model size.

Enterprise GPU Dedicated Server - RTX A6000


  • 256GB RAM
  • GPU: Nvidia Quadro RTX A6000
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Linux / Windows 10/11
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS

Enterprise GPU Dedicated Server - A100


  • 256GB RAM
  • GPU: Nvidia A100
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 6912
  • Tensor Cores: 432
  • GPU Memory: 40GB HBM2
  • FP32 Performance: 19.5 TFLOPS

Enterprise GPU Dedicated Server - RTX 4090


  • 256GB RAM
  • GPU: GeForce RTX 4090
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Linux / Windows 10/11
  • Single GPU Specifications:
  • Microarchitecture: Ada Lovelace
  • CUDA Cores: 16,384
  • Tensor Cores: 512
  • GPU Memory: 24 GB GDDR6X
  • FP32 Performance: 82.6 TFLOPS

Enterprise GPU Dedicated Server - RTX 5090


  • 256GB RAM
  • GPU: GeForce RTX 5090
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Blackwell 2.0
  • CUDA Cores: 21,760
  • Tensor Cores: 680
  • GPU Memory: 32 GB GDDR7
  • FP32 Performance: 109.7 TFLOPS
  • This is a pre-sale product. Delivery will be completed within 2–10 days after payment.

Multi-GPU Dedicated Server - 2xRTX 5090


  • 256GB RAM
  • GPU: 2 x GeForce RTX 5090
  • Dual E5-2699v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Blackwell 2.0
  • CUDA Cores: 21,760
  • Tensor Cores: 680
  • GPU Memory: 32 GB GDDR7
  • FP32 Performance: 109.7 TFLOPS
  • This is a pre-sale product. Delivery will be completed within 2–10 days after payment.

Enterprise GPU Dedicated Server - H100


  • 256GB RAM
  • GPU: Nvidia H100
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Hopper
  • CUDA Cores: 14,592
  • Tensor Cores: 456
  • GPU Memory: 80GB HBM2e
  • FP32 Performance: 183 TFLOPS

Multi-GPU Dedicated Server - 3xRTX A6000


  • 256GB RAM
  • GPU: 3 x Quadro RTX A6000
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS

Multi-GPU Dedicated Server - 4xRTX A6000


  • 512GB RAM
  • GPU: 4 x Quadro RTX A6000
  • Dual 22-Core E5-2699v4
  • 240GB SSD + 4TB NVMe + 16TB SATA
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS

The Best GPU for Wan-AI Models from Hugging Face

GPU requirements for self-hosting the Wan-AI Wan2.1 T2V 1.3B or 14B models from Hugging Face vary significantly depending on which model version you choose and your latency expectations. Below are our GPU recommendations (each row is ordered from adequate to better):

Model Name | Model Size | Recommended GPUs
Wan-AI/Wan2.1-T2V-1.3B | 17.5 GB | RTX 4090 < A100-40GB < RTX 5090
Wan-AI/Wan2.1-VACE-1.3B | 19.05 GB | RTX 4090 < A100-40GB < RTX 5090
Wan-AI/Wan2.1-T2V-1.3B-Diffusers | 19.05 GB | RTX 4090 < A100-40GB < RTX 5090
Wan-AI/Wan2.1-T2V-14B | 69.06 GB | 2x A6000 < A100-80GB < H100
Wan-AI/Wan2.1-VACE-14B | 75.16 GB | 2x A6000 < A100-80GB < H100
Wan-AI/Wan2.1-I2V-14B-720P | 82.25 GB | 2x A6000 < 2x A100-80GB < 2x H100
Wan-AI/Wan2.1-I2V-14B-480P | 82.25 GB | 2x A6000 < 2x A100-80GB < 2x H100
Wan-AI/Wan2.1-VACE-14B-diffusers | 82.25 GB | 2x A6000 < 2x A100-80GB < 2x H100
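
To give a feel for what self-hosting involves, here is a minimal text-to-video sketch using Hugging Face Diffusers. It assumes a recent diffusers release with Wan 2.1 support (the WanPipeline and AutoencoderKLWan classes) and a CUDA GPU with enough VRAM for the 1.3B model; the prompt, resolution, and frame count are illustrative defaults, not fixed settings.

```python
# Minimal Wan2.1 text-to-video sketch with Hugging Face Diffusers.
# Assumes a recent diffusers release with Wan support (WanPipeline,
# AutoencoderKLWan) and a CUDA GPU sized for the 1.3B model.
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"

# Keep the VAE in float32 for stability; run the transformer in bfloat16.
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# 81 frames at 15 fps is roughly a 5-second clip at 480p.
frames = pipe(
    prompt="A cat walks on the grass, realistic style",
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "output.mp4", fps=15)
```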

Features of Wan-AI Hosting Service

Multimodal AI Support

Host advanced Text-to-Video (T2V), Image-to-Video (I2V), and all-in-one Video Creation and Editing (VACE) models, with support for 1.3B and 14B parameter sizes.

High-Resolution Video Generation

Generate videos at 480p or 720p, with room to scale to higher resolutions as your GPU power allows.

Flexible Deployment Options

Supports PyTorch checkpoints and Hugging Face Diffusers format, giving you freedom to integrate with tools like ComfyUI, AUTOMATIC1111, or custom inference pipelines.

GPU Acceleration Ready

Optimized for A100, H100, RTX 4090, and similar GPUs—ideal for real-time or batch generation workloads.

Offline & Private Deployment

Self-hosted Wan-AI models give you full control of prompts, outputs, and API integrations, ensuring data privacy and independence from third-party servers.

Fine-Tuning & Extension Ready

Advanced users can fine-tune with LoRA, extend with tools like ControlNet, or chain outputs into video-editing frameworks.

Frequently Asked Questions

What is Wan-AI Hosting Service?

Wan-AI Hosting Service refers to the self-hosted deployment of Wan-AI's generative models, including text-to-video (T2V), image-to-video (I2V), and all-in-one video creation and editing (VACE), on dedicated GPU servers or VPS with compatible frameworks such as Hugging Face Diffusers or ComfyUI.
What are the minimum GPU requirements?

Minimum GPU requirements vary by model size:

  • 1.3B models: 12–16 GB VRAM (e.g., RTX 3080, A4000)
  • 14B models: 24–48 GB VRAM (e.g., RTX 4090, A5000, A6000, A100)
  • High-speed inference: use NVLink-enabled dual GPUs or GPUs with high-bandwidth memory
Can I use Wan2.1 models with ComfyUI?

Yes. Both Wan2.1-T2V-1.3B-Diffusers and Wan2.1-T2V-14B-Diffusers can be used with ComfyUI by loading the proper nodes and handling video output (MP4/WebM). This offers a visual, node-based way to build workflows.
Which frameworks can I use to deploy Wan-AI models?

Common deployment options include (see the FastAPI sketch below):

  • Hugging Face Transformers + Diffusers (Python script)
  • ComfyUI (drag-and-drop workflows)
  • Dockerized environments (for production scaling)
  • FastAPI + Gradio for a web API/UI
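
As a rough sketch of the FastAPI option, a single-endpoint wrapper around the Diffusers pipeline might look like the following. The route name, request fields, and defaults are illustrative assumptions, not a fixed API.

```python
# Illustrative single-endpoint FastAPI wrapper around a Wan2.1 Diffusers
# pipeline. Route name, request fields, and defaults are assumptions for
# demonstration; adapt them to your own service.
import torch
from fastapi import FastAPI
from fastapi.responses import FileResponse
from pydantic import BaseModel
from diffusers import WanPipeline
from diffusers.utils import export_to_video

app = FastAPI()

# Load the pipeline once at startup; all requests share it.
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

class GenerateRequest(BaseModel):
    prompt: str
    num_frames: int = 81  # about 5 seconds at 16 fps

@app.post("/generate")
def generate(req: GenerateRequest):
    # Single-worker sketch: one request at a time writes one output file.
    frames = pipe(prompt=req.prompt, num_frames=req.num_frames).frames[0]
    export_to_video(frames, "output.mp4", fps=16)
    return FileResponse("output.mp4", media_type="video/mp4")
```

Run it with uvicorn (e.g., uvicorn app:app) behind your usual reverse proxy, and add Gradio on top if you want a browser UI.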
Are the Wan-AI models free to use?

As of now, the Wan-AI models are free for research and non-commercial use, but always check the specific license on Hugging Face for each model version.
Which Wan-AI models can I self-host?

You can self-host the following Wan-AI models:

  • Text-to-Video: Wan2.1-T2V-1.3B, Wan2.1-T2V-14B
  • Image-to-Video: Wan2.1-I2V-14B-480P, Wan2.1-I2V-14B-720P
  • All-in-one Video Creation and Editing (VACE): Wan2.1-VACE-1.3B, Wan2.1-VACE-14B
  • Diffusers-compatible variants for easier integration (model names ending in -Diffusers)
Can I serve these models with vLLM, TGI, or Triton?

No. These are not LLMs. Wan2.1 models are diffusion-based multimodal generation models and are best run via Hugging Face Diffusers, ComfyUI, or a custom FastAPI backend. vLLM, TGI, and Triton are generally not required unless you are adapting them for advanced inference pipelines.
Do I need FFmpeg?

Yes, FFmpeg is typically used:

  • To encode image sequences into MP4/WebM
  • To combine video and audio tracks when your workflow adds sound

Ensure FFmpeg is installed and callable in your server environment.
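
For example, a numbered frame sequence can be encoded to MP4 by shelling out to FFmpeg from Python; the paths and frame rate below are placeholders for your own pipeline's output.

```python
import subprocess

# Encode a numbered PNG sequence (frames/frame_0001.png, ...) into an
# H.264 MP4. Paths and frame rate are placeholders; adjust as needed.
subprocess.run([
    "ffmpeg", "-y",                 # overwrite the output if it exists
    "-framerate", "16",             # input frame rate
    "-i", "frames/frame_%04d.png",  # numbered input pattern
    "-c:v", "libx264",              # H.264 encoder
    "-pix_fmt", "yuv420p",          # broad player compatibility
    "output.mp4",
], check=True)
```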
What is the difference between the Diffusers and non-Diffusers model versions?

  • Diffusers version: works out of the box with the Hugging Face Diffusers pipeline or ComfyUI.
  • Non-Diffusers version: may require custom integration and may not load directly through the Diffusers from_pretrained() pipeline.
Can I build a commercial service with these models?

Yes. With sufficient GPU resources, you can integrate these models into a platform or service offering text-to-video, image-to-video, or video-editing generation, subject to each model's license.

Need help choosing a plan? We're always here for you.