CMC24 - Generative AI Music

Learning Material (Part 1)
- Required
- Optional (but suggested, for real!)
  - Deep Learning
  - Clean Code
Learning Material (Part 2)
- Papers on Generative Audio

Learning Material (Part 1)

Required

Generative Music AI Theory + Implementation

The Sound of AI’s Generative Music AI Course:
- Video lectures (theory + implementations)
- Code + slides
Theory behind RNNs/LSTMs, as covered in the following videos from the Deep Learning (for Audio) with Python course:
- Recurrent Neural Networks Explained Easily
- Long Short Term Memory (LSTM) Networks Explained Easily
The Sound of AI’s Generating Melodies with LSTM Nets Course:
- Video lectures (theory + implementation)
- Code + slides

Generative Music AI Papers

GenJam: A genetic algorithm for generating jazz solos
The Generative Electronic Dance Music Algorithmic System (GEDMAS)
Liquiprism: Generating Polyrhythms With Cellular Automata
Automatic Stylistic Composition of Bach Chorales with Deep LSTM (aka BachBot) [presentation] [paper]
Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation [paper] [website]
Music Transformer: Generating Music with Long-Term Structure [blog] [paper]

Symbolic Music Representation Formats

[TODO] Course Doc and implementation for MIDI and abc notation
Basic MIDI tokenizations such as MIDI-Like from MIDITok
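
To make the idea of a MIDI-Like tokenization concrete, here is a minimal sketch in plain Python. It does not use or reproduce MidiTok's API. The Note type, the token names, the tick-based time unit, and the max_shift parameter are all assumptions made for illustration, and the real library may order or name its tokens differently. The sketch flattens a list of notes into NoteOn, NoteOff, TimeShift, and Velocity tokens.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    """Hypothetical note container (not a MidiTok type)."""
    pitch: int     # MIDI pitch, 0-127
    velocity: int  # MIDI velocity, 1-127
    start: int     # onset time in ticks (assumed time unit)
    end: int       # offset time in ticks


def midi_like_tokens(notes: List[Note], max_shift: int = 100) -> List[str]:
    """Flatten notes into an illustrative MIDI-Like token sequence."""
    # One "on" and one "off" event per note. The second tuple field makes
    # note-offs sort before note-ons that fall on the same tick.
    events = []
    for n in notes:
        events.append((n.start, 1, n.pitch, "on", n.velocity))
        events.append((n.end, 0, n.pitch, "off", 0))
    events.sort()

    tokens: List[str] = []
    now = 0
    last_velocity = None
    for time, _, pitch, kind, velocity in events:
        # Express the gap since the previous event as TimeShift tokens,
        # splitting long gaps into chunks of at most max_shift ticks.
        gap = time - now
        while gap > 0:
            step = min(gap, max_shift)
            tokens.append(f"TimeShift_{step}")
            gap -= step
        now = time

        if kind == "on":
            # Emit a Velocity token only when the velocity changes.
            if velocity != last_velocity:
                tokens.append(f"Velocity_{velocity}")
                last_velocity = velocity
            tokens.append(f"NoteOn_{pitch}")
        else:
            tokens.append(f"NoteOff_{pitch}")
    return tokens


if __name__ == "__main__":
    melody = [Note(60, 90, 0, 48), Note(64, 90, 48, 96)]
    print(midi_like_tokens(melody))
```

Capping each TimeShift at max_shift keeps the vocabulary finite: a long silence becomes several shift tokens in a row instead of requiring a token for every possible duration.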

Optional (but suggested, for real!)

Deep Learning

The Sound of AI’s Deep Learning (for Audio) with Python:
- Video lectures (theory + implementation)
- Code + slides

Clean Code

The Sound of AI’s Uncle Bob’s SOLID Principles for Machine Learning Engineers Course:
- Video lectures (theory + implementation)
- Code + slides
Clean Code in Python - Second Edition: Develop maintainable and efficient code

Learning Material (Part 2)

Papers on Generative Audio

Van Den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., … & Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
Kumar, K., Kumar, R., De Boissiere, T., Gestin, L., Teoh, W. Z., Sotelo, J., … & Courville, A. C. (2019). MelGAN: Generative adversarial networks for conditional waveform synthesis. Advances in Neural Information Processing Systems, 32. [keywords: Vocoder; Phase construction]
Engel, J., Agrawal, K. K., Chen, S., Gulrajani, I., Donahue, C., & Roberts, A. (2019). GANSynth: Adversarial neural audio synthesis. arXiv preprint arXiv:1902.08710. [keywords: conditional training]
Engel, J., Hantrakul, L., Gu, C., & Roberts, A. (2020). DDSP: Differentiable digital signal processing. arXiv preprint arXiv:2001.04643. [keywords: inductive bias, signal processing units, real time]
Caillon, A., & Esling, P. (2021). RAVE: A variational autoencoder for fast and high-quality neural audio synthesis. arXiv preprint arXiv:2111.05011. [keywords: conditional training]
Huzaifah, M., & Wyse, L. (2021). Deep generative models for musical audio synthesis. Handbook of artificial intelligence for music: foundations, advanced approaches, and developments for creativity, 639-678. [keywords: “review” paper]
Wyse, L., Kamath, P., & Gupta, C. (2022, April). Sound model factory: An integrated system architecture for generative audio modelling. In International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar) (pp. 308-322). Cham: Springer International Publishing. [keywords: playability; latent space]
Garcia, H. F., Seetharaman, P., Kumar, R., & Pardo, B. (2023). VampNet: Music generation via masked acoustic token modeling. arXiv preprint arXiv:2307.04686. [keywords: transformer, in-painting, masking for training, codecs]
Evans, Z., Parker, J. D., Carr, C. J., Zukowski, Z., Taylor, J., & Pons, J. (2024). Stable Audio Open. arXiv preprint arXiv:2407.14358. [keywords: Text-2-audio; Open (data, weights, code); latent diffusion]
Rafael Valle, Rohan Badlani, Zhifeng Kong, Sang-gil Lee, Arushi Goel, Sungwon Kim, João Felipe Santos, Shuqi Dai, Siddharth Gururani, Aya AlJa’fari, Alex Liu, Kevin Shih, Wei Ping, Bryan Catanzaro (2024). Fugatto 1: Foundational Generative Audio Transformer Opus 1.