Digital Rock Heritage: Machine Learning Lyrics Generation in the Style of Lou Reed

Lou Reed, frontman of The Velvet Underground and a pioneer of experimental rock, has always been my biggest musical influence. As a Rock n’ Roll fan since age 12, I view Rock as one of the most valuable cultural assets of contemporary culture. It’s time to treat Rock seriously and pay it a creative tribute.
Beyond his music, his verbally bold, stream-of-consciousness, Delmore Schwartz-influenced lyrics compelled me to study the influence that speech has on lyrical music. Now that the legend has passed away, I think it’s important to pay tribute to contemporary rock genres by keeping a digital musicology database. For creative use, I apply a Markov chain model to generate free-standing lyrics, and a ‘sequential next word’ method to generate a line that continues from vocabulary you type in, all in Lou Reed’s style.
let’s have some fun first
Version 18 <Wendy Says>
ps. “One chord is fine. Two chords are pushing it. Three chords and you’re into jazz.” – Lou Reed
so this one is ‘pushing it’ :0
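Before the notebook proper, the Markov chain idea used later can be sketched in a few lines: a first-order chain records, for each word, the words that follow it in the corpus, then walks that table at random. A minimal sketch on a made-up toy corpus (not the scraped lyrics):

```python
import random

def build_chain(words):
    # map each word to the list of words that follow it in the corpus
    chain = {}
    for a, b in zip(words, words[1:]):
        chain.setdefault(a, []).append(b)
    return chain

def walk(chain, start, n, seed=0):
    # random walk over the transition table, starting from a given word
    rng = random.Random(seed)
    out = [start]
    for _ in range(n - 1):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return ' '.join(out)

corpus = "take a walk on the wild side take a ride on the wild side".split()
chain = build_chain(corpus)
print(walk(chain, 'take', 6))
```

The markovify library used below does essentially this, but at the sentence level and with higher-order states.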

Data Preprocessing – Get lyrics

The goal is to fetch the lyrics of 50-150 Lou Reed songs for preprocessing. I set the number of songs to 150 for more data, then print out the first 1,500 characters to examine the result.

import lyricsgenius as lg

file = open("/content/drive/MyDrive/LouReed.txt", "w")
genius = lg.Genius('TqI1Ujt4Hqr5Abv3NZ3eObES3P9jxC6F8ZXZIBgePY9y9r1px8LH6z5LhY3erWvw')
artists = ['Lou Reed']

def get_lyrics(arr, k):
    c = 0
    for name in arr:
        try:
            songs = genius.search_artist(name, max_songs=k, sort='popularity').songs
            s = [song.lyrics for song in songs]
            file.write("\n \n   <|endoftext|>   \n \n".join(s))
            print(f"Songs grabbed: {len(s)}")
        except Exception:
            c += 1
            print(f"some exception at {name}: {c}")

get_lyrics(artists, 150)
     Searching for songs by Lou Reed...
     Song 1: "Walk on the Wild Side"
     Song 2: "Perfect Day"
     Song 3: "Coney Island Baby"
     Song 4: "Street Hassle"
     Song 5: "Satellite of Love"
     Song 6: "Andy’s Chest"
     Song 7: "Vicious"
     Song 8: "Dirty Blvd."
     Song 9: "Yeezus Review"
     Song 10: "Make Up"
     Song 11: "Caroline Says II"
     Song 12: "Hangin’ ’Round"
     Song 13: "Goodnight Ladies"
     Song 14: "Caroline Says I"
     Song 15: "Berlin"
     Song 16: "The Kids"
     Song 17: "Lady Day"
     Song 18: "Sad Song"
 6/17/2021 Copy of Viva la Lou Reed-Data Collection.ipynb - Colaboratory
      Song 19: "Romeo Had Juliette"
     Song 20: "New York Telephone Conversation"
     Song 21: "How Do You Think It Feels"
     Song 22: "Men of Good Fortune"
     Song 23: "The Bed"
     Song 24: "Berlin (1973 version)"
     Song 25: "I’m So Free"
     Song 26: "Wagon Wheel"
     Song 27: "Oh, Jim"
     Song 28: "Heroin [Rock ‘N’ Roll Animal]"
     Song 29: "I Wanna Be Black"
     Song 30: "Kill Your Sons"
     Song 31: "Halloween Parade"
     Song 32: "Last Great American Whale"
     Song 33: "Charley’s Girl"
     Song 34: "New Sensations"
     Song 35: "Lisa Says"
     Song 36: "Waves of Fear"
     Song 37: "I Love You"
     Song 38: "The Blue Mask"
     Song 39: "Sally Can’t Dance"
     Song 40: "Crazy Feeling"
     Song 41: "This Magic Moment"
     Song 42: "She’s My Best Friend"
     Song 43: "Sick of You"
     Song 44: "Strawman"
     Song 45: "The Gun"
     Song 46: "There Is No Time"
     Song 47: "Vanishing Act"
     Song 48: "My House"
     Song 49: "Good Evening Mr. Waldheim"
     Song 50: "Endless Cycle"
     Song 51: "Busload of Faith"
     Song 52: "The Power of the Heart"
     Song 53: "Beginning of a Great Adventure"
     Song 54: "Rock ‘n’ Roll [Rock ‘N’ Roll Animal]"
     Song 55: "Kicks"
     Song 56: "Hold On"
     Song 57: "Gimmie Some Good Times"
artist_file = '/content/drive/MyDrive/LouReed.txt'
with open(artist_file) as f:
    print(f.read()[:1500])
     [Verse 1]
     Holly came from Miami, FLA
     Hitchhiked her way across the U.S.A
     Plucked her eyebrows on the way
     Shaved her legs and then he was a she
     She says, "Hey babe, take a walk on the wild side"
     Said, "Hey honey, take a walk on the wild side"
     [Verse 2]
     Candy came from out on the Island
     In the backroom, she was everybody's darling

     But she never lost her head
     Even when she was giving head
     She says, "Hey babe, take a walk on the wild side"
     Said, "Hey babe, take a walk on the wild side"
     [Post Chorus]
     And the colored girls go
     Doo, doo-doo, doo-doo, doo-doo-doo
     Doo, doo-doo, doo-doo, doo-doo-doo
     Doo, doo-doo, doo-doo, doo-doo-doo
     Doo, doo-doo, doo-doo, doo-doo-doo
     (Doo, doo-doo, doo-doo, doo-doo-doo
     Doo, doo-doo, doo-doo, doo-doo-doo
     Doo, doo-doo, doo-doo, doo-doo-doo
     Doo, doo-doo, doo-doo, doo-doo-doo
     [Verse 3]
     Little Joe never once gave it away
     Everybody had to pay and pay
     A hustle here and a hustle there
     New York City is the place where they said
     "Hey babe, take a walk on the wild side"
     I said, "Hey Joe, take a walk on the wild side"
     [Verse 4]
     Sugar Plum Fairy came and hit the streets
     Looking for soul food and a place to eat
     Went to the Apollo
     You should've seen him go, go, go
     They said, "Hey sugar, take a walk on the wild side"
     I said, "Hey babe, take a walk on the wild side"
     Alright, huh
     [Verse 5]
     Jackie is just speeding away
     Thought she was James Dean for a day
     Then I guess she had to crash
     Valium would've helped that bash

Text Data Cleansing

# to count the frequency of words

import random
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
def plotWordFrequency(input_file):
    f = open(input_file, 'r')
    words = [x for y in [l.split() for l in f.readlines()] for x in y]
    data = sorted([(w, words.count(w)) for w in set(words)], key = lambda x: x[1], reverse = True)
    most_words = [x[0] for x in data]
    times_used = [int(x[1]) for x in data]
    plt.figure(figsize=(20, 10))
    plt.bar(x=most_words, height=times_used, color = 'grey', edgecolor = 'black')
    plt.xticks(rotation=45, fontsize=18)
    plt.yticks(rotation=0, fontsize=18)
    plt.xlabel('Most Common Words:', fontsize=18)
    plt.ylabel('Number of Occurrences:', fontsize=18)
    plt.title('Most Commonly Used Words: %s' % input_file, fontsize=24)
artist_file = '/content/drive/MyDrive/LouReed.txt'
import tensorflow as tf
import numpy as np
import os
import time
stopChars = [',','(',')','.','-','[',']','"']
# preprocessing the corpus by converting all letters to lowercase,
# replacing blank lines with blank string and removing special characters
def preprocessText(text):
  text = text.replace('\n', ' ').replace('\t','')
  processedText = text.lower()
  for char in stopChars:
    processedText = processedText.replace(char,'')
  return processedText
def corpusToList(corpus):
  corpusList = [w for w in corpus.split(' ')]
  corpusList = [i for i in corpusList if i] #removing empty strings from list
  return corpusList
corpus_path = '/content/drive/MyDrive/LouReed.txt'
text = open(corpus_path, 'rb').read().decode(encoding='utf-8')
text = preprocessText(text)
corpus_words = corpusToList(text)
corpus_words = [w.strip() for w in corpus_words]  # a bare map() is lazy and would do nothing
vocab = sorted(set(corpus_words))
print('Corpus length (in words):', len(corpus_words))
print('Unique words in corpus: {}'.format(len(vocab)))
word2idx = {u: i for i, u in enumerate(vocab)}
idx2words = np.array(vocab)
word_as_int = np.array([word2idx[c] for c in corpus_words])
     Corpus length (in words): 59240
     Unique words in corpus: 5635
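The word2idx / idx2words round trip above can be sanity-checked on a toy word list (the words below are made up for illustration, not drawn from the corpus):

```python
import numpy as np

toy_corpus = ['perfect', 'day', 'perfect', 'night']  # toy stand-in for corpus_words
vocab = sorted(set(toy_corpus))                      # ['day', 'night', 'perfect']
word2idx = {u: i for i, u in enumerate(vocab)}       # word -> integer id
idx2words = np.array(vocab)                          # integer id -> word
word_as_int = np.array([word2idx[w] for w in toy_corpus])
print(word_as_int)                   # the corpus as integer ids
print(list(idx2words[word_as_int]))  # decodes back to the original words
```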

Generation: Using Markov Chain for independent poetic lyrics

import json
import random
import re
import string
import numpy as np
import pandas as pd

text = open('/content/drive/MyDrive/LouReed.txt').read()

def data_clean_round1(text):
    text = text.lower()                                              # lower-case the text
    text = re.sub(r'<[^>]*>', ' ', text)                             # remove all html tags
    text = re.sub(r'\[[^\]]*\]', ' ', text)                          # remove square brackets (e.g. [Verse 1])
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # remove punctuation
    return text.strip()

!pip install markovify

corpus = text
import markovify
model = markovify.NewlineText(corpus)
model_a = markovify.Text(corpus)
model_b = markovify.Text(corpus)

model_combo = markovify.combine([ model_a, model_b ], [ 1.5, 1 ])

for i in range(5):
    for j in range(random.randrange(1, 7)):
        print(model_combo.make_sentence())  # one generated line per iteration

example output:

But don’t you want to When you’ve been up forNoah’s Ark When I don’t get sick of this family And though I could do You can believe it or you can be fixed And here’s to the bathroom, oh, oh Watch me turn into a mist Release all your two-bit friends And if we have a vanishing act I’m just a shooting star Does anyone need a new treatment He’s that guy on the floor, huh Average height, an average place Oh is it that way This is the one who came here Now just shining up with my thoughts There is no time for my man Who made the sky And both of her mind Hey, now that you’re gone And I wanted to dance a lot

for i in range(5):
    for j in range(random.randrange(1, 4)):
        print(model_combo.make_sentence())

The output is shown in picture (1) above.

for i in range(6):
    for j in range(random.randrange(1, 7)):
        print(model_combo.make_sentence())

The output is: Are you and me What is in her head Causing him to spit That’s how we got left Listening to my place And It’s not a waste Strapped to the skies I don’t like it To me they end I’m not so nice I got the crabs She had to crash High in the middle Looking for some stud I’m glad that we took The sickness of the brave Not somebody that you die Now, now, now, now, look Could it be lovely This is a medical advance I’m going to be The fit and describe When you put your heart

Method 2: ‘sequential next word’ – generating the following line

from keras.layers import LSTM, Dense, Dropout, Flatten
from keras.callbacks import LambdaCallback
from keras.models import Sequential
from keras.optimizers import RMSprop
from keras.utils import np_utils

text = corpus.lower()

chars = list(set(text))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

vocab_size = len(chars)
print('Vocabulary size: {}'.format(vocab_size))

X = [] 
Y = [] 

length = len(text)
seq_length = 100

for i in range(0, length - seq_length, 1):
    sequence = text[i:i + seq_length]
    label = text[i + seq_length]
    X.append([char_indices[char] for char in sequence])
    Y.append(char_indices[label])

print('Number of sequences: {}'.format(len(X)))
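The loop above slides a 100-character window over the corpus, pairing each window with the single character that follows it. On a toy string with seq_length=3 the construction looks like this:

```python
text = "heroin"   # toy string; the notebook uses the full lyric corpus
seq_length = 3
X, Y = [], []
for i in range(len(text) - seq_length):
    X.append(text[i:i + seq_length])   # input window
    Y.append(text[i + seq_length])     # next character to predict
print(X)  # ['her', 'ero', 'roi']
print(Y)  # ['o', 'i', 'n']
```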

X_new = np.reshape(X, (len(X), seq_length, 1))

X_new = X_new/float(len(chars))

Y_new = np_utils.to_categorical(Y)

X_new.shape, Y_new.shape

model = Sequential()
model.add(LSTM(150, input_shape = (X_new.shape[1], X_new.shape[2]), return_sequences = True))
model.add(Flatten())  # collapse the per-timestep outputs so the Dense layer matches the one-hot labels
model.add(Dense(Y_new.shape[1], activation = 'softmax'))

model.compile(loss = 'categorical_crossentropy', optimizer = 'adam')
model.fit(X_new, Y_new, epochs = 1, verbose = 1)

9581/9581 [==============================] - 1117s 115ms/step - loss: 3.0169
<keras.callbacks.History at 0x7fa17de147d0>

start = np.random.randint(0, len(X)-1)
string_mapped = list(X[start])
full_string = [indices_char[value] for value in string_mapped]

# Generation
for i in range(400):
    x = np.reshape(string_mapped, (1, len(string_mapped), 1))
    x = x / float(len(chars))
    pred_index = np.argmax(model.predict(x, verbose = 0))
    full_string.append(indices_char[pred_index])        # record the predicted character
    string_mapped.append(pred_index)                    # slide the window forward
    string_mapped = string_mapped[1:len(string_mapped)]

# Combine text
newtext = ''
for char in full_string:
    newtext = newtext + char
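Greedy argmax decoding, as used above, always picks the single most likely character and tends to fall into repetitive loops. A common alternative (not in the original notebook, added here as a sketch) is temperature sampling, which draws from a rescaled version of the softmax output:

```python
import numpy as np

def sample_index(preds, temperature=1.0, rng=np.random.default_rng(0)):
    # rescale the softmax output and draw from the adjusted distribution;
    # temperature < 1 sharpens it toward argmax, > 1 flattens it toward uniform
    preds = np.asarray(preds, dtype=np.float64)
    logits = np.log(preds + 1e-12) / temperature
    logits -= np.max(logits)          # for numerical stability
    probs = np.exp(logits) / np.sum(np.exp(logits))
    return int(rng.choice(len(probs), p=probs))

# e.g. replace np.argmax(model.predict(x)) with
# sample_index(model.predict(x, verbose=0)[0], temperature=0.8)
idx = sample_index([0.1, 0.6, 0.3], temperature=0.8)
```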


import keras.utils as ku
import tensorflow as tf
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Embedding
from keras.utils import np_utils

text = corpus.lower()
text = text.split('\n')

tokenizer = Tokenizer(num_words = None, filters = '#$%&()*+-<=>@[\\]^_`{|}~\t\n', lower = False)
tokenizer.fit_on_texts(text)   # build the word index before querying it

total_words = len(tokenizer.word_index) + 1

sequences = []
for line in text:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        sequences.append(n_gram_sequence)

max_seq_len = max([len(x) for x in sequences])
sequences = np.array(pad_sequences(sequences, maxlen = max_seq_len, padding = 'pre'))

predictors, label = sequences[:, :-1], sequences[:, -1]
label = tf.keras.utils.to_categorical(label, num_classes = total_words)
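The n-gram construction and ‘pre’ padding above can be seen on a single toy line of token ids (the ids below are invented for illustration):

```python
line_tokens = [5, 9, 2, 7]   # toy token ids for a four-word line
sequences = []
for i in range(1, len(line_tokens)):
    sequences.append(line_tokens[:i + 1])   # every growing prefix of the line
print(sequences)  # [[5, 9], [5, 9, 2], [5, 9, 2, 7]]

max_seq_len = max(len(s) for s in sequences)
padded = [[0] * (max_seq_len - len(s)) + s for s in sequences]  # 'pre' padding
print(padded)     # [[0, 0, 5, 9], [0, 5, 9, 2], [5, 9, 2, 7]]
# predictors = every column but the last, label = the last column
```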

input_len = max_seq_len - 1
model = Sequential()
model.add(Embedding(total_words, 10, input_length = input_len))
model.add(LSTM(100))   # a recurrent layer is needed between Embedding and Dense; 100 units is an assumption
model.add(Dense(total_words, activation = 'softmax'))
model.compile(loss = 'categorical_crossentropy', optimizer = 'adam')
model.fit(predictors, label, epochs = 30, verbose = 1)

def generate_line(text, next_words, max_seq_len, model):
    for j in range(next_words):
        token_list = tokenizer.texts_to_sequences([text])[0]
        token_list = pad_sequences([token_list], maxlen = max_seq_len - 1, padding = 'pre')
        predicted = model.predict_classes(token_list, verbose = 0)
        output_word = ''
        for word, index in tokenizer.word_index.items():
            if index == predicted:
                output_word = word
        text += ' ' + output_word
    return text

generate_line("wendy says", 5, max_seq_len, model)