Large language models (LLMs) have proven effective for layout generation due to their ability to produce structured descriptions such as HTML. In this paper, we argue that their limited visual understanding leads to insufficient performance on tasks that require visual content, e.g., content-aware layout generation. We therefore explore whether large vision-language models (LVLMs) can be applied to content-aware layout generation and propose the training-free Visual-Aware Self-Correction Layout Generation (VASCAR), inspired by the iterative revision process of human designers. VASCAR enables LVLMs (e.g., GPT-4o and Gemini) to iteratively refine their outputs with reference to rendered layout images. Extensive experiments and a user study demonstrate VASCAR's effectiveness and versatility, achieving state-of-the-art (SOTA) layout generation quality.
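To make the iterative visual self-correction idea concrete, the following is a minimal sketch (not the paper's implementation) of such a loop in Python. It assumes a generic LVLM call `lvlm_generate` and a layout renderer `render_layout`; both names, the HTML layout representation, and the round count are illustrative placeholders rather than details taken from VASCAR.

```python
def vascar_style_refine(canvas_image, constraints, lvlm_generate, render_layout, num_rounds=5):
    """Sketch of visual-aware self-correction: generate a layout, render it,
    and let the LVLM revise its output while looking at the rendering."""
    prompt = f"Generate a layout (as HTML) for this canvas satisfying: {constraints}"
    layout_html = lvlm_generate(images=[canvas_image], text=prompt)

    for _ in range(num_rounds):
        # Render the current layout so the LVLM can inspect its own output visually.
        rendered = render_layout(layout_html, canvas_image)
        critique_prompt = (
            "Here is the canvas and a rendering of your current layout. "
            "Point out misalignments, overlaps, or occluded content, "
            "then output a revised HTML layout.\n"
            f"Current layout:\n{layout_html}"
        )
        layout_html = lvlm_generate(images=[canvas_image, rendered], text=critique_prompt)
    return layout_html
```

Passing the model and renderer in as callables keeps the sketch independent of any particular LVLM API (e.g., GPT-4o or Gemini endpoints), which vary in how images and text are supplied.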