# A.I. Rowling
- What: This is a neural network for generating new Harry Potter content, which I created for use on my podcast Paging Mr Potter. I trained the model on the entire book series, and it can now produce completely new text in the style of J.K. Rowling. It probably isn't quite good enough to fool an attentive reader, but it has captured a lot of the vocabulary and writing style of these famous books.
- When: 2018 - 2020
- Who: Me
This demo works, but it is pretty slow (usually ~30 seconds per request, though it can take up to 2 minutes). Sorry, but the model is a beast and it was too expensive for me to deploy properly ¯\_(ツ)_/¯.
# 1. LSTM
This is an LSTM (Long Short-Term Memory) model I first trained in 2018 (back when it was still acceptable to use RNNs and not everyone had read the Attention Is All You Need paper). I implemented the model in Keras and trained it by taking every consecutive 50-word sequence in Harry Potter and predicting the next word. It works OK, but trying to learn the entire syntax and grammar of the English language from J.K. Rowling alone isn't a great approach. The results are vaguely Lynchian.
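The sliding-window step can be sketched in plain Python (this is an illustrative sketch, not the original Keras training code — `make_windows` and `seq_len` are names I've made up for this example):

```python
def make_windows(words, seq_len=50):
    """Slide a window over a list of tokens: every consecutive
    `seq_len`-word sequence becomes a training input, and the
    word immediately after it becomes the prediction target."""
    inputs, targets = [], []
    for i in range(len(words) - seq_len):
        inputs.append(words[i:i + seq_len])
        targets.append(words[i + seq_len])
    return inputs, targets
```

In the real pipeline each window would be mapped to integer token IDs and fed to an embedding layer followed by LSTM layers, with a softmax over the vocabulary predicting the next word.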
# 2. Transformer
I decided I wanted to improve on the original model by trying out the Transformer architecture and using transfer learning (a technique where a model is first trained on a general task and then finetuned on your primary task). Luckily I stumbled upon a fantastic Python package created by Max Woolf which allowed me to (fairly painlessly) take OpenAI's state-of-the-art GPT-2 model and finetune it on the Harry Potter books. The pre-trained OpenAI models are very large, so for practical reasons I chose to use the smallest one (124M parameters). Even so, the results were a dramatic improvement over the LSTM model, and it is this model being used in the demo above.
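Max Woolf's package is `gpt-2-simple`; a minimal finetuning session with it looks roughly like the sketch below (the dataset filename and step count are placeholders, not the values I actually used):

```python
import gpt_2_simple as gpt2

# Fetch the smallest pre-trained GPT-2 checkpoint (124M parameters).
gpt2.download_gpt2(model_name="124M")

sess = gpt2.start_tf_sess()

# Finetune on a plain-text file of the source material.
# "harry_potter.txt" and steps=1000 are placeholder values.
gpt2.finetune(sess,
              dataset="harry_potter.txt",
              model_name="124M",
              steps=1000)

# Sample new text from the finetuned model.
text = gpt2.generate(sess, return_as_list=True)[0]
print(text)
```

Because the model already knows English grammar from its general pre-training, finetuning only has to teach it the Rowling-specific vocabulary and style, which is exactly what the LSTM struggled to learn from scratch.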
All of the code (and some of the models) is available HERE.