
Music generation through deep neural networks


Author: Ramya Vidiyala | Compiled by: VK | Source: Towards Data Science

Deep learning has improved many aspects of our lives, some obvious and some subtle. It plays a key role in movie recommendation systems, spam detection, and computer vision.

Although debate continues about deep learning being a black box and being difficult to train, it holds enormous potential in fields as varied as medicine, virtual assistants, and e-commerce.

Deep learning can also work at the intersection of art and technology. To explore this idea further, this article looks at the process of generating music with deep learning, a field many consider beyond the reach of machines (and another area of lively debate!).

Contents

  • Musical representation for machine learning models

  • Music dataset

  • Data processing

  • Model selection

  • RNN

  • Time-distributed fully connected layer

  • Statefulness

  • Dropout layer

  • Softmax layer

  • Optimizer

  • Music generation

  • Summary


Musical representation for machine learning models

We will use ABC music notation. ABC notation is a shorthand notation that uses the letters A to G to represent notes, plus other elements to convey additional information such as note length, key, and ornamentation.

This form of notation began as an ASCII character set code to facilitate sharing music online, adding a new, simple language for software developers that is easy to use. An example of ABC music notation is shown below.

Lines in part 1 of the notation show a letter followed by a colon. These indicate various aspects of the tune, such as the index when there is more than one tune in a file (X:), the title (T:), the time signature (M:), the default note length (L:), the type of tune (R:), and the key (K:). The lines following the key designation represent the tune itself.
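
For illustration, here is a minimal tune written in ABC notation (a hypothetical example, not taken from the dataset):

X:1
T:Example Tune
M:4/4
L:1/8
R:reel
K:C
CDEF GABc | cBAG FEDC |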


Music dataset

In this article, we will use the open-source ABC-notation data provided by the Nottingham Music Database. It contains over 1,000 folk tunes, most of which have been converted into ABC notation: http://abc.sourceforge.net/NMD/

Data processing

The data is currently in a character-based categorical format. In the data processing stage, we need to convert it into an integer-based numeric format to prepare it for the neural network.

Here, each character is mapped to a unique integer. This can be done with a single line of code; the "text" variable holds the input data.

# map each unique character in the text to an integer index
char_to_idx = { ch: i for (i, ch) in enumerate(sorted(list(set(text)))) }

To train the model, we use this vocabulary to convert the entire text into numeric form.

import numpy as np

# encode the whole text as an array of integer indices
T = np.asarray([char_to_idx[c] for c in text], dtype=np.int32)
vocab_size = len(char_to_idx)  # number of unique characters (used below)
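
As a quick illustration (a toy example, not part of the original code), these lines behave as follows:

text = "CDEC"  # toy input
char_to_idx = { ch: i for (i, ch) in enumerate(sorted(list(set(text)))) }
# char_to_idx == {'C': 0, 'D': 1, 'E': 2}
T = np.asarray([char_to_idx[c] for c in text], dtype=np.int32)
# T == array([0, 1, 2, 0], dtype=int32)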

Model selection

Traditional machine learning models cannot store information about previous stages of a sequence. Recurrent neural networks (commonly called RNNs), however, can.

An RNN has a repeating module that takes the input from the previous step and feeds its output to the next step. However, RNNs retain only recent information, so the network needs extra memory to learn long-term dependencies. That is where Long Short-Term Memory networks (LSTMs) come in.

LSTMs are a special case of RNNs: they have the same chain-like structure as RNNs, but a different structure inside the repeating module.
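
For reference, the standard LSTM repeating module (not spelled out in the original article) computes the following at each timestep t:

f_t = sigmoid(W_f · [h_{t-1}, x_t] + b_f)   (forget gate)
i_t = sigmoid(W_i · [h_{t-1}, x_t] + b_i)   (input gate)
c~_t = tanh(W_c · [h_{t-1}, x_t] + b_c)     (candidate cell state)
c_t = f_t * c_{t-1} + i_t * c~_t            (cell state update)
o_t = sigmoid(W_o · [h_{t-1}, x_t] + b_o)   (output gate)
h_t = o_t * tanh(c_t)                       (hidden state)

The cell state c_t acts as the long-term memory that a plain RNN lacks.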

We use an RNN here because:

  1. The length of the data does not need to be fixed; each input can have a different length.

  2. Sequences can be stored.

  3. Various combinations of input and output sequence lengths can be used.

Beyond a general RNN, we will customize it to fit our use case with a few tweaks. We will use a "character-level RNN". In character-level RNNs, the input, the output, and the transition output are all in the form of characters.

RNN

Since we need to generate an output at every timestep, we will use a many-to-many RNN. To implement a many-to-many RNN, we need to set the parameter "return_sequences" to true so that a character is generated at every timestep, as Figure 5 below illustrates.

In the figure, blue units are inputs, yellow units are hidden units, and green units are outputs. This is a brief overview of many-to-many RNNs. For a closer look at RNN sequences, here is a useful resource: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
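
To see concretely what return_sequences=True does, here is a minimal sketch (using the same Keras API as the model code below):

from keras.models import Sequential
from keras.layers import LSTM
import numpy as np

# with return_sequences=True, the LSTM emits one output vector per timestep
m = Sequential([LSTM(4, return_sequences=True, input_shape=(10, 8))])
print(m.predict(np.zeros((1, 10, 8))).shape)  # (1, 10, 4): an output at every timestep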

Time-distributed fully connected layer

To process the output at every timestep, we create a time-distributed fully connected layer on top of the outputs generated at each timestep.

Statefulness

By setting the parameter stateful to true, the output of one batch is passed as the input state to the next batch. After combining all these features, our model looks like the overview shown in Figure 6 below.

The code snippet for the model architecture is as follows:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, TimeDistributed, Dense, Activation

BATCH_SIZE = 16   # assumed value; not specified in the article
SEQ_LENGTH = 64   # assumed value; not specified in the article

model = Sequential()
model.add(Embedding(vocab_size, 512, batch_input_shape=(BATCH_SIZE, SEQ_LENGTH)))

# three stacked stateful LSTM layers, each followed by dropout
for i in range(3):
    model.add(LSTM(256, return_sequences=True, stateful=True))
    model.add(Dropout(0.2))

# apply the same softmax classifier over the vocabulary at every timestep
model.add(TimeDistributed(Dense(vocab_size)))
model.add(Activation('softmax'))

model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
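
The article itself does not show the training loop. As a rough sketch (the batch preparation below is an assumption, not the author's code), one training step on the integer-encoded data T could look like this:

# one training step; assumes T, BATCH_SIZE, SEQ_LENGTH, and vocab_size from above
X_batch = T[:BATCH_SIZE * SEQ_LENGTH].reshape(BATCH_SIZE, SEQ_LENGTH)
# targets are the same characters shifted one step ahead
Y_batch = T[1:BATCH_SIZE * SEQ_LENGTH + 1].reshape(BATCH_SIZE, SEQ_LENGTH)
Y_onehot = np.eye(vocab_size)[Y_batch]   # shape (BATCH_SIZE, SEQ_LENGTH, vocab_size)

model.reset_states()                     # stateful LSTMs: clear state before a fresh pass
loss, acc = model.train_on_batch(X_batch, Y_onehot)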

I strongly recommend experimenting with the layers to improve performance.

Dropout layer

The dropout layer is a regularization technique that sets a fraction of the input units to zero at each update during training, to prevent overfitting.

Softmax layer

Generating music is a multi-class classification problem, where each class is a unique character from the input data. Therefore, we use a softmax layer with categorical cross-entropy as the loss function.

This layer gives the probability of each class, and the next character is chosen from this list of probabilities (the generation code below samples from the distribution rather than always taking the maximum).

Optimizer

To optimize our model, we use Adaptive Moment Estimation, better known as Adam, since it is a good choice for RNNs.

Music generation

So far, we have created an RNN model and trained it on the input data. During the training phase, the model learns the patterns of the input data. We call this the "trained model".

The input size used in the trained model is the batch size, but for generating music the input is a single character. So we create a new model that is similar to the trained model but takes an input of a single character, with shape (1, 1). Into this new model we load the weights of the trained model to replicate its characteristics.

# generation model: same architecture as the trained model,
# but with a batch input shape of a single character, (1, 1)
model2 = Sequential()
model2.add(Embedding(vocab_size, 512, batch_input_shape=(1, 1)))

for i in range(3):
    model2.add(LSTM(256, return_sequences=True, stateful=True))
    model2.add(Dropout(0.2))

model2.add(TimeDistributed(Dense(vocab_size)))
model2.add(Activation('softmax'))

We load the weights of the trained model into the new model. This can be done with a single line of code.

import os

# load the checkpoint saved during training (e.g. epoch = 100);
# MODEL_DIR must match the directory used when saving weights
model2.load_weights(os.path.join(MODEL_DIR, 'weights.{}.h5'.format(epoch)))

model2.summary()

To generate music, we randomly pick the first character from the set of unique characters, use the character just generated to produce the next one, and so on. With this structure, we generate music.

Here's a snippet of code that does this:

# idx_to_char is the inverse of char_to_idx, mapping indices back to characters
idx_to_char = { i: ch for (ch, i) in char_to_idx.items() }

sampled = []
for i in range(1024):
    batch = np.zeros((1, 1))
    if sampled:
        # feed the previously generated character back in as the next input
        batch[0, 0] = sampled[-1]
    else:
        # seed the sequence with a randomly chosen character
        batch[0, 0] = np.random.randint(vocab_size)
    # probability distribution over the vocabulary for the next character
    result = model2.predict_on_batch(batch).ravel()
    # sample the next character index according to that distribution
    sample = np.random.choice(range(vocab_size), p=result)
    sampled.append(sample)

print(''.join(idx_to_char[c] for c in sampled))

Here are some of the generated music clips:

  1. https://soundcloud.com/ramya-vidiyala-850882745/gen-music-1

  2. https://soundcloud.com/ramya-vidiyala-850882745/gen-music-2

  3. https://soundcloud.com/ramya-vidiyala-850882745/gen-music-3

  4. https://soundcloud.com/ramya-vidiyala-850882745/gen-music-4

  5. https://soundcloud.com/ramya-vidiyala-850882745/gen-music-5

We used a machine learning neural network called an LSTM to generate these pleasant music samples. Each piece is different, yet similar to the training data. These melodies can be used in many ways:

  • To enhance artists' creativity by providing inspiration

  • As a productivity tool for developing new ideas

  • As a complement to an artist's work

  • To finish unfinished pieces of work

  • As standalone pieces of music

However, this model can still be improved. Our training data contained only one instrument, the piano. One way to enhance the training data would be to add music from a variety of instruments; another would be to add more genres, rhythms, and time signatures.

At present, our model produces a few false notes, and the music is not exceptional. We could reduce these errors and improve the quality of the music by enlarging the training dataset.


Summary

In this article, we looked at how to process music for use with neural networks, how deep learning models such as RNNs and LSTMs work, and how a model can be tweaked to generate music. We can apply these concepts to any other system that generates other forms of art, including landscapes or portraits.


Thanks for reading! If you want to experiment with this custom dataset yourself, you can download the annotated data and check out my code on GitHub: https://github.com/RamyaVidiyala/Generate-Music-Using-Neural-Networks

Original article: https://towardsdatascience.com/music-generation-through-deep-neural-networks-21d7bd81496e

