Argyris Papadopoulos, "Music generation with Neural Networks
", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2022
https://doi.org/10.26233/heallink.tuc.91691
Machine learning has been applied in many domains in recent years and has produced impressive results. Interestingly, beyond well-known problems of academic, research, or commercial interest, machine learning techniques are increasingly finding their way into the arts, in the sense of generative modeling. In this thesis, we explore how deep neural networks can be used to automatically generate musical sequences. The goal of this work is to construct models that learn the basic patterns of an input dataset (corresponding to a specific music genre) and reproduce these patterns in new, original samples, under the assumption that modeling and sampling may be more effective in two-dimensional image representations. To this end, we employ well-known generative models, namely Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), and propose our own deep network architectures that remain simple and keep computational and time requirements low. The models are trained on datasets of MIDI files containing tunes from different music genres, which are first converted to two-dimensional images during preprocessing. After training at the image level, the trained models generate new images of a similar kind, which are decoded back to MIDI tunes by the reverse procedure, and thus to music. MIDI files are ideal for our purposes because of their discrete nature, which facilitates the conversion to and from images. In the course of this work, we focused on data engineering issues, namely how to shape and format the data so that our generative models learn more easily and quickly. We also compare the different models and draw conclusions about their effectiveness. The generated tunes appear to reproduce basic features of the originals, but only in a few cases was the outcome truly interesting in terms of music theory. The proposed approach could help musicians improve and explore original tunes based on preferred genres or types of music. Furthermore, our work can serve as a template for other learning tasks, not necessarily related to music, that may be facilitated by image representations, as proposed here. While our models do not yet generate consumer-grade music, this work represents a first step toward automated music generation and computational creativity in general.
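To make the MIDI-to-image preprocessing step concrete, the sketch below converts a MIDI file into a binary piano-roll "image" (pitch on the vertical axis, time on the horizontal axis) and decodes such an image back to MIDI. The library choice (pretty_midi), the time resolution FS, and the function names are illustrative assumptions; the thesis does not prescribe this particular toolchain.

```python
# A minimal sketch of the MIDI <-> image round trip, assuming the
# pretty_midi library; the sampling rate FS and all names are
# illustrative, not the thesis's actual pipeline.
import numpy as np
import pretty_midi

FS = 16  # time resolution: piano-roll columns per second (assumed)

def midi_to_image(path: str) -> np.ndarray:
    """Render a MIDI file as a binary 2-D piano roll.

    Rows index the 128 MIDI pitches, columns index time steps;
    a cell is 1 while the corresponding note sounds.
    """
    midi = pretty_midi.PrettyMIDI(path)
    roll = midi.get_piano_roll(fs=FS)      # shape (128, T), velocity values
    return (roll > 0).astype(np.float32)   # binarize: keep note on/off only

def image_to_midi(image: np.ndarray, path: str, velocity: int = 80) -> None:
    """Decode a binary piano-roll image back into a MIDI file."""
    midi = pretty_midi.PrettyMIDI()
    instrument = pretty_midi.Instrument(program=0)  # acoustic grand piano
    for pitch in range(image.shape[0]):
        onset = None
        for t in range(image.shape[1]):
            on = image[pitch, t] > 0.5            # threshold model output
            if on and onset is None:
                onset = t                          # note starts sounding
            elif not on and onset is not None:     # note just ended
                instrument.notes.append(pretty_midi.Note(
                    velocity=velocity, pitch=pitch,
                    start=onset / FS, end=t / FS))
                onset = None
        if onset is not None:                      # note still on at the end
            instrument.notes.append(pretty_midi.Note(
                velocity=velocity, pitch=pitch,
                start=onset / FS, end=image.shape[1] / FS))
    midi.instruments.append(instrument)
    midi.write(path)
```

Note that binarizing the roll discards velocities and instrument information; a lossy encoding of this kind is exactly the sort of data-engineering trade-off the abstract alludes to.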
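For one of the two model families named above, the following is a compact convolutional VAE over fixed-size piano-roll crops (here 1×128×128), written in PyTorch. The layer sizes, latent dimension, and loss weighting are assumptions chosen for illustration and do not reproduce the architectures proposed in the thesis.

```python
# A minimal convolutional VAE sketch in PyTorch; all hyperparameters
# are illustrative assumptions, not the thesis's proposed architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PianoRollVAE(nn.Module):
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        # Encoder: 1x128x128 binary piano roll -> flattened features
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # 16x64x64
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32x32x32
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(32 * 32 * 32, latent_dim)
        self.fc_logvar = nn.Linear(32 * 32 * 32, latent_dim)
        # Decoder mirrors the encoder back to a 1x128x128 probability map
        self.fc_dec = nn.Linear(latent_dim, 32 * 32 * 32)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample z while keeping gradients
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        out = self.dec(self.fc_dec(z).view(-1, 32, 32, 32))
        return out, mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Binary cross-entropy reconstruction term + KL divergence to N(0, I)
    bce = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld
```

After training, sampling z from a standard normal and passing it through the decoder yields a new piano-roll image, which the earlier image_to_midi sketch would turn back into a playable MIDI tune.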