For an AI model to be effective, it needs to be trained by humans doing a job called annotation: the tedious, low-paid work of labeling zillions of examples so that the model can accurately identify an object in a variety of settings. (Think of a polo shirt on a person, hanging in a closet, against an outdoor backdrop, and so on.) Josh Dzieza took a few shifts as an annotator and spoke to more than two dozen of them to find out exactly how the bots learn.
Much of the public response to language models like OpenAI’s ChatGPT has focused on all the jobs they appear poised to automate. But behind even the most impressive AI system are people — huge numbers of people labeling data to train it and clarifying data when it gets confused. Only the companies that can afford to buy this data can compete, and those that get it are highly motivated to keep it secret. The result is that, with few exceptions, little is known about the information shaping these systems’ behavior, and even less is known about the people doing the shaping.