
polygon.uploadbackup.com

Pushing the Boundaries of Open-Source Music Generation


Junmin Gong, Yulin Song, Sean Zhao, Sen Wang, Shengyuan Xu, Joe Guo, Xuerui Yang

ACE Studio · StepFun

📝 Abstract

🚀 We present ACE-Step v1.5, a highly efficient open-source music foundation model that brings
commercial-grade generation to consumer hardware. On commonly used evaluation metrics, ACE-Step v1.5 achieves
quality beyond most commercial music models while remaining extremely fast: under 2 seconds
per full song on an A100 and under 10 seconds on an RTX 3090. The model runs locally with less than 4 GB of VRAM
and supports lightweight personalization: users can train a LoRA from just a few songs to capture their own style.
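LoRA personalization works by freezing the pretrained weights and training only a small low-rank update, which is why a few songs can be enough. The sketch below shows the generic LoRA technique (a frozen matrix W plus a scaled product B·A of rank r). It is not ACE-Step's training code: the class name, rank, scaling value, and initialization are illustrative assumptions.

```python
import random
from dataclasses import dataclass, field

Matrix = list[list[float]]


def matmul_vec(m: Matrix, v: list[float]) -> list[float]:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass
class LoRALinear:
    """Hypothetical generic LoRA layer: frozen W (d_out x d_in) plus trainable B @ A."""

    weight: Matrix          # frozen pretrained weight, shape (d_out, d_in)
    rank: int = 4           # assumed rank r of the update
    alpha: float = 8.0      # assumed scaling numerator; update is scaled by alpha / r
    a: Matrix = field(init=False)  # trainable, shape (rank, d_in)
    b: Matrix = field(init=False)  # trainable, shape (d_out, rank)

    def __post_init__(self) -> None:
        d_out, d_in = len(self.weight), len(self.weight[0])
        rng = random.Random(0)
        # Common LoRA init: A small random, B zero, so the adapter starts as a no-op.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(self.rank)]
        self.b = [[0.0] * self.rank for _ in range(d_out)]

    def forward(self, x: list[float]) -> list[float]:
        base = matmul_vec(self.weight, x)                  # W x (frozen path)
        delta = matmul_vec(self.b, matmul_vec(self.a, x))  # B (A x) (trainable path)
        scale = self.alpha / self.rank
        return [y + scale * d for y, d in zip(base, delta)]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one layer: rank * (d_in + d_out)."""
    return rank * (d_in + d_out)
```

For a 1024×1024 layer, full fine-tuning touches 1,048,576 weights, while a rank-8 adapter trains `lora_param_count(1024, 1024, 8)` = 16,384, about 1.6% of the layer.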

🌉 At its core lies a novel hybrid architecture where the Language Model (LM) functions as an omni-capable
planner: it transforms simple user queries into comprehensive song blueprints—scaling from short loops to
10-minute compositions—while synthesizing metadata, lyrics, and captions via Chain-of-Thought to guide the
Diffusion Transformer (DiT). ⚡ Uniquely, this alignment is achieved through intrinsic reinforcement learning
relying solely on the model’s internal mechanisms, thereby eliminating the biases inherent in external reward
models or human preferences. 🎚️
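The division of labor described above (the LM plans, the DiT renders) can be pictured as a data hand-off. The sketch below is a hypothetical illustration only: `SongBlueprint`, `plan`, `to_dit_conditioning`, the metadata keys, and the text serialization are assumptions, not ACE-Step's interface. The only detail taken from the text is the 10-minute upper bound on composition length; the 1-second lower bound is an arbitrary placeholder.

```python
from dataclasses import dataclass, field

MAX_DURATION_S = 600.0  # "short loops to 10-minute compositions"
MIN_DURATION_S = 1.0    # assumed lower bound for this sketch


@dataclass
class SongBlueprint:
    """Hypothetical container for what the LM planner hands to the DiT."""

    caption: str
    lyrics: str
    duration_s: float
    metadata: dict[str, str] = field(default_factory=dict)  # keys are assumptions


def plan(query: str, duration_s: float = 180.0) -> SongBlueprint:
    """Stub planner; a real system would run the LM (with Chain-of-Thought) here."""
    duration = min(max(duration_s, MIN_DURATION_S), MAX_DURATION_S)
    return SongBlueprint(
        caption=f"A song about {query}",
        lyrics="[instrumental]",
        duration_s=duration,
        metadata={"language": "en"},
    )


def to_dit_conditioning(bp: SongBlueprint) -> str:
    """Flatten a blueprint into one conditioning string (format is an assumption)."""
    meta = ", ".join(f"{k}={v}" for k, v in sorted(bp.metadata.items()))
    return (
        f"caption: {bp.caption}\n"
        f"meta: {meta}\n"
        f"duration: {bp.duration_s:g}s\n"
        f"lyrics:\n{bp.lyrics}"
    )
```

Keeping the blueprint as an explicit, inspectable object is what lets a user edit the plan (for example, the lyrics) before any audio is rendered.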

🔮 Beyond standard synthesis, ACE-Step v1.5 unifies precise stylistic control with versatile editing
capabilities—such as cover generation, repainting, and vocal-to-BGM conversion—while maintaining strict adherence
to prompts across 50+ languages. This paves the way for powerful tools that seamlessly integrate into the creative
workflows of music artists, producers, and content creators. 🎸
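Repainting regenerates a chosen time range while leaving the rest of the song untouched. One generic way to express this, assumed here rather than taken from ACE-Step's internals, is a per-frame mask over a latent sequence: frames inside the range take regenerated values, and frames outside keep the original ones. The `frames_per_s` rate and both function names are hypothetical.

```python
import math


def repaint_mask(total_s: float, start_s: float, end_s: float, frames_per_s: float) -> list[float]:
    """1.0 for frames to regenerate, 0.0 for frames to keep (hypothetical helper)."""
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    n = math.ceil(total_s * frames_per_s)
    first = max(0, math.floor(start_s * frames_per_s))
    last = min(n, math.ceil(end_s * frames_per_s))
    return [1.0 if first <= i < last else 0.0 for i in range(n)]


def blend(original: list[float], regenerated: list[float], mask: list[float]) -> list[float]:
    """Per-frame mix: masked frames come from `regenerated`, others from `original`."""
    return [m * r + (1.0 - m) * o for o, r, m in zip(original, regenerated, mask)]
```

A hard 0/1 boundary like this is the simplest choice; the seams it leaves are one plausible source of the transition artifacts listed under Limitations.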

Commercial models
Udio-v1.5      7.45  7.65  6.15  8.03  4.15  3.96  4.09  3.93  3.86  34.9  24.8
Suno-v4.5      7.63  7.85  6.22  8.25  4.64  4.51  4.63  4.53  4.49  40.5  32.7
Suno-v5        7.69  7.87  6.51  8.29  4.72  4.62  4.71  4.63  4.56  46.8  34.2
Mureka-V7.6    7.44  7.71  6.35  8.13  4.43  4.29  4.35  4.29  4.21  36.2  22.4
MiniMax-2.0    7.71  7.95  6.42  8.38  4.61  4.51  4.59  4.50  4.41  43.1  29.5

Open-source models
YuE            6.58  7.29  4.95  7.39  3.01  2.80  2.85  2.79  2.82  26.8  −4.6
ACE-Step 1.0   7.22  7.52  6.50  7.76  3.99  3.73  3.85  3.78  3.68  28.5   0.9
LeVo           7.61  7.78  5.92  8.31  3.55  3.35  3.32  3.31  3.20  29.4  −1.2
DiffRhythm 2   7.25  7.61  6.33  7.99  3.99  3.79  3.97  3.82  3.66  32.1   3.8
HeartMuLa      7.66  7.89  6.15  8.25  4.68  4.55  4.69  4.55  4.45  31.7  28.6
ACE-Step 1.5   7.42  8.09  6.47  8.35  4.72  4.67  4.72  4.66  4.59  39.1  26.3

Table 1: Comparison with commercial (top) and open-source (bottom) music generation models.
Bold = best, underline = second best. ↑ higher is better.
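The best and second-best markings can be recomputed directly from the numbers in Table 1. The sketch below assumes every column is higher-is-better (as the caption states) and that ranking may be done either over all models or within the open-source group; which of the two the original table used is not stated. Column indices are zero-based positions of the numeric columns. Ties (for example Suno-v5 and ACE-Step 1.5 at 4.72 in column 4) keep table order.

```python
from typing import Optional

# Values transcribed from Table 1 (minus signs normalized to ASCII).
TABLE = {
    "Udio-v1.5":    [7.45, 7.65, 6.15, 8.03, 4.15, 3.96, 4.09, 3.93, 3.86, 34.9, 24.8],
    "Suno-v4.5":    [7.63, 7.85, 6.22, 8.25, 4.64, 4.51, 4.63, 4.53, 4.49, 40.5, 32.7],
    "Suno-v5":      [7.69, 7.87, 6.51, 8.29, 4.72, 4.62, 4.71, 4.63, 4.56, 46.8, 34.2],
    "Mureka-V7.6":  [7.44, 7.71, 6.35, 8.13, 4.43, 4.29, 4.35, 4.29, 4.21, 36.2, 22.4],
    "MiniMax-2.0":  [7.71, 7.95, 6.42, 8.38, 4.61, 4.51, 4.59, 4.50, 4.41, 43.1, 29.5],
    "YuE":          [6.58, 7.29, 4.95, 7.39, 3.01, 2.80, 2.85, 2.79, 2.82, 26.8, -4.6],
    "ACE-Step 1.0": [7.22, 7.52, 6.50, 7.76, 3.99, 3.73, 3.85, 3.78, 3.68, 28.5, 0.9],
    "LeVo":         [7.61, 7.78, 5.92, 8.31, 3.55, 3.35, 3.32, 3.31, 3.20, 29.4, -1.2],
    "DiffRhythm 2": [7.25, 7.61, 6.33, 7.99, 3.99, 3.79, 3.97, 3.82, 3.66, 32.1, 3.8],
    "HeartMuLa":    [7.66, 7.89, 6.15, 8.25, 4.68, 4.55, 4.69, 4.55, 4.45, 31.7, 28.6],
    "ACE-Step 1.5": [7.42, 8.09, 6.47, 8.35, 4.72, 4.67, 4.72, 4.66, 4.59, 39.1, 26.3],
}
OPEN_SOURCE = ["YuE", "ACE-Step 1.0", "LeVo", "DiffRhythm 2", "HeartMuLa", "ACE-Step 1.5"]


def top_two(column: int, models: Optional[list[str]] = None) -> tuple[str, str]:
    """Return (best, second best) for one column; higher is better."""
    candidates = list(TABLE) if models is None else models
    # sorted() is stable, so tied models keep their table order.
    ranked = sorted(candidates, key=lambda m: TABLE[m][column], reverse=True)
    return ranked[0], ranked[1]


for col in range(11):
    print(col, "all:", top_two(col), "open-source:", top_two(col, OPEN_SOURCE))
```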

Generation Speed

4-min song on A100: 10–120× faster than alternatives
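The abstract's timings can also be read as speed relative to real time. Assuming the "full song" in the abstract is the same 4-minute (240 s) song used for the speed comparison, and noting that both timings are upper bounds so the results are lower bounds:

```latex
\text{real-time factor} = \frac{\text{audio duration}}{\text{generation time}}, \qquad
\text{A100: } \frac{240\ \text{s}}{2\ \text{s}} = 120\times, \qquad
\text{RTX 3090: } \frac{240\ \text{s}}{10\ \text{s}} = 24\times .
```

This real-time factor is a different quantity from the 10–120× speed-up over alternative systems quoted above.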

🎵 Examples


[Examples table: Caption | Lyrics | ACE-Step generated audio]

🏗️ Framework & Application

[Figures: ACE-Step Framework; Application Map]


⚠️ Limitations & Future Improvements 🔮

  1. 🎲 Output Inconsistency: Highly sensitive to random seeds and input duration, leading to
    varied “gacha-style” results.
  2. 🎵 Style-specific Weaknesses: Underperforms on certain genres (e.g., Chinese rap/zh_rap),
    with limited style adherence and a ceiling on musicality.
  3. 🔄 Continuity Artifacts: Unnatural transitions in repainting/extend operations.
  4. 🎤 Vocal Quality: Coarse vocal synthesis lacking nuance.
  5. 🎛️ Control Granularity: Needs finer-grained control over musical parameters.
  6. 🌐 Multilingual Lyrics Compliance: Support for lyrics in multiple languages needs further
    improvement in accuracy and naturalness.
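Seed sensitivity (item 1) is commonly handled in two ways: fix the seed so a result can be reproduced, and, when exploring, generate several seeds and keep the best by some score. The sketch below shows only that generic best-of-N pattern. `fake_generate` is a hypothetical stand-in for a generator and the mean-value score is a placeholder; neither is part of ACE-Step.

```python
import random
from typing import Callable


def best_of_n(
    generate: Callable[[int], list[float]],
    score: Callable[[list[float]], float],
    seeds: list[int],
) -> tuple[int, float]:
    """Run one generation per seed and return (best_seed, best_score)."""
    best_seed, best_score = seeds[0], float("-inf")
    for seed in seeds:
        s = score(generate(seed))
        if s > best_score:
            best_seed, best_score = seed, s
    return best_seed, best_score


def fake_generate(seed: int) -> list[float]:
    """Hypothetical generator stand-in: the same seed always yields the same output."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(16)]


def mean_score(audio: list[float]) -> float:
    """Placeholder quality score; a real pipeline would use a learned or rule-based metric."""
    return sum(audio) / len(audio)


seed, value = best_of_n(fake_generate, mean_score, list(range(10)))
print(f"best seed {seed} with score {value:.3f}")
```

Recording the winning seed alongside the prompt is what makes a "gacha" result reproducible later.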




Author: Polygon · Posted on 6 February 2026
