For Wired’s latest cover story, veteran journalist Steven Levy does what he’s done for so much of his career: profile a company that presents the possibility of a seismic technological shift. In this case, it’s the outfit that brought generative AI within reach for anyone with a web browser, and that is now trying to convince the world that this isn’t humanity’s final Pandora’s Box.
The name that Radford and his collaborators gave the model they created was an acronym for “generatively pretrained transformer”—GPT-1. Eventually, this model came to be generically known as “generative AI.” To build it, they drew on a collection of 7,000 unpublished books, many in the genres of romance, fantasy, and adventure, and refined it on Quora questions and answers, as well as thousands of passages taken from middle school and high school exams. All in all, the model included 117 million parameters, or variables. And it outperformed everything that had come before in understanding language and generating answers. But the most dramatic result was that processing such a massive amount of data allowed the model to offer up results beyond its training, providing expertise in brand-new domains. These unplanned robot capabilities are called zero-shots. They still baffle researchers—and account for the queasiness that many in the field have about these so-called large language models.
Radford remembers one late night at OpenAI’s office. “I just kept saying over and over, ‘Well, that’s cool, but I’m pretty sure it won’t be able to do x.’ And then I would quickly code up an evaluation and, sure enough, it could kind of do x.”