Alex Harrison Parker | Planetary Astronomy

Planetary Astronomer, Southwest Research Institute

Research scientist in planetary astronomy at the Southwest Research Institute, supporting NASA's New Horizons mission to Pluto, and developing the post-Pluto mission into the Kuiper Belt. Expert in the dynamics of binary minor planets, detection and characterization of trans-Neptunian objects, and the origin of the architecture of our Solar System.

Generating YA Book Titles With A Neural Network

The Wicked Fire — The True Story of the Sea of Mary — Revellion — Ashes and Peril — The Girl and the Storm

Neural networks are wonderfully useful constructs. We can use them to predict behaviors in physical systems, identify the contents of images, enhance the performance of scientific data analysis algorithms, and perform many other tasks. They come in many different flavors, each suited to a different type of task. In general, they learn a set of behaviors from a body of information called a training set, from which they attempt to identify the best means of reproducing or recognizing some set of properties of that information.

Some behaviors are hard for neural networks to learn, while others are relatively easy. Generating snippets of stylized text is a strong suit of a class of neural networks called character-level recurrent neural networks (RNNs). These networks don't start out knowing how language works: they have no model of grammar, and no dictionary of words to draw from. Instead, they learn the patterns of language that emerge from a character-by-character analysis of a body of text. And they can be spookily good at it.
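The character-by-character idea is easier to see with a much simpler stand-in: a character bigram model. It is not an RNN (it has no memory beyond the previous character), but it learns the same kind of local statistics. A hypothetical sketch in Python:

```python
import random
from collections import defaultdict

def train_char_model(corpus):
    """Count which character tends to follow each character."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in corpus:
        text = "^" + line + "$"  # start/end markers
        for a, b in zip(text, text[1:]):
            counts[a][b] += 1
    return counts

def generate(counts, rng, max_len=40):
    """Sample characters one at a time from the learned statistics."""
    out, ch = [], "^"
    while len(out) < max_len:
        followers = counts.get(ch)
        if not followers:
            break
        chars = list(followers)
        weights = [followers[c] for c in chars]
        ch = rng.choices(chars, weights=weights)[0]
        if ch == "$":  # the model decided the title is finished
            break
        out.append(ch)
    return "".join(out)

titles = ["The Iron Girls", "The Blood of the Stars", "Storm of the Moon"]
model = train_char_model(titles)
print(generate(model, random.Random(0)))
```

With only three training titles the output is mostly recombined fragments of them; an RNN's advantage is that it can condition on much longer stretches of preceding text than a single character.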

Over at Janelle Shane's wonderful blog "Letting Neural Networks Be Weird," I was introduced to a particularly well-developed implementation of this type of algorithm: textgenrnn, by Max Woolf. Feed it a text corpus, and it will do its best to create new text that follows a similar set of rules.

So I fed it 10,757 YA book titles.

I pulled these titles from user-maintained lists at Goodreads, covering titles published from 2010 to 2018. There were a number of malformed entries and duplicates, which I identified and removed both algorithmically and by hand.
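The algorithmic part of that cleanup can be sketched like this; the helper below is a hypothetical stand-in for the actual filters I used (the real pass also involved hand checks):

```python
def clean_titles(raw_titles):
    """Normalize whitespace, drop empty or malformed entries, and
    de-duplicate (case-insensitively) while preserving order."""
    seen = set()
    cleaned = []
    for title in raw_titles:
        t = " ".join(title.split())  # collapse stray whitespace
        if len(t) < 2:               # drop empty / single-character junk
            continue
        key = t.lower()
        if key in seen:              # skip duplicates
            continue
        seen.add(key)
        cleaned.append(t)
    return cleaned

raw = ["The  Iron Girls ", "the iron girls", "", "X", "Endless Blue"]
print(clean_titles(raw))  # ['The Iron Girls', 'Endless Blue']
```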

YA book titles follow a number of conventions — some obvious and easy to describe, others less so but still important to a sense of "YA title"-ishness. The neural network was remarkably good at identifying and reproducing both of these kinds of conventions.

After training, I had the neural network generate 3,000 new titles. You can read the full, unedited list here. There are some duplicates (particularly involving "Secrets" and "Stars"), and some are almost certainly published titles, but by and large the results are the novel products of a simple algorithmic mind that only knows how to communicate via YA titles.

Here are some highlights:

The Iron Girls
Sister of the Trance Investive Story

Spell of the Siren's Daughter
The Wishmore Secrets
Some of the Sea
The Blood of the Stars
The Socket Dream
The Truth of the Galled
Endless Blue
The Last Girl Broken
Storm of the Moon
The Real of the Shadow Black
End of the Stars
The Lost Girl in the Moon
The Colding Star
Between the Beauty
Black Comes the Light
The Seven of the Found

The Beast of the Sea
House of the Musk Magic
The Skinchanger
The Secrets of the Red Dawn
Space for Darkness
The Circle of the Still Things
The Story of the Sea
The Dark and the Devil

The Revenge of Silence
The Elemental Secrets

Stone of the Fire
The Beauty of Shadows


... And some near-misses:

Compost Side of the Moon
The Extraction of Skin Paradies
The Pirator of True Things

The Season Rood of the Dark Stone
Flirth Sweet
The Dead As Beauting Things
Pifter of Mist
Mania Lion
The Secret Secrets
A Halourian of the World of Lunshine

The Unbeasing Tragicles
Good The Words

Which of these AI-generated titles would you feel compelled to pick up at the bookstore and read?

Remember, this network doesn't know anything about the English language or its syntax other than what it could learn from YA titles alone. It had to learn how to string characters into words and words into sentences, and how to capitalize and punctuate those sentences, all without knowing that any of those concepts exist. In spite of these limitations, it is remarkably good at producing compelling YA book titles, and at generating new nonsense words that still sound like they belong in one. Many of the real English words you see in the list of generated titles were not in the training set at all; the neural network learned enough rules about how real English words are built, character by character, that it re-invented them from scratch.
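That claim about novel-but-real words is easy to check mechanically: compare the vocabulary of the generated titles against the vocabulary of the training titles. A hypothetical sketch (the short title lists here are placeholders for the full datasets):

```python
import re

def vocabulary(titles):
    """Lower-cased set of words appearing in a list of titles."""
    return {w for t in titles for w in re.findall(r"[a-z']+", t.lower())}

def novel_words(generated, training):
    """Words the network produced that never appeared in the training set."""
    return sorted(vocabulary(generated) - vocabulary(training))

training = ["The Blood of the Stars", "Storm of the Moon"]
generated = ["The Socket Dream", "Storm of the Stars"]
print(novel_words(generated, training))  # ['dream', 'socket']
```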


Background image copyright Alex H. Parker, 2009.