The Future of Taste
An uncertain examination of whether taste will be an advantage for humans after AGI
“Research is to see what everybody else has seen, and to think what nobody else has thought.” – Albert Szent-Györgyi
Update: this piece, which was speculative and uncertain when I wrote it, now has empirical support. See METR’s “Measuring AI Ability to Complete Long Tasks” and Jason Hausenloy’s analysis of its labor-market implications.
About a year ago, Dwarkesh asked Dario Amodei a question about LLMs and novel discovery. Marginal Revolution recently noted that the question still doesn’t have a compelling answer:
“...what do you make of the fact that these things have basically the entire corpus of human knowledge memorized and they haven't been able to make a single new connection that has led to a discovery?
Whereas if even a moderately intelligent person had this much stuff memorized, they would notice — Oh, this thing causes this symptom. This other thing also causes this symptom. There's a medical cure right here.
Shouldn't we be expecting that kind of stuff?”
Today’s LLMs are not good at creativity and long-term planning. If this trend continues, AI may improve at completing tasks much faster than it improves at setting direction based on taste. By taste, I mean an opinionated vision for what something should be, and the related judgment and conviction required to choose novel directions without obvious prior evidence of success.
This means two things could be true:
Work that requires those skills may remain a domain people can compete in far longer than others.
The best of us might be able to compete on this front even after AI is competitive at everything else.
For example, today’s LLMs are good at writing – but they aren’t excellent at it.
I find that Anthropic’s Claude 3.5 Sonnet (Oct 2024) has the best writing taste of any model I’ve used. When I give it a post to review, it has opinionated thoughts about its direction and style. Other models are more capable – better at coding, sharper at analysis – but Claude consistently offers me the best feedback on my writing and is the most likely to surface an idea I hadn’t considered.
But even then, Claude is far worse at writing a story than at editing mine. In a bizarre twist on how I thought my life would change with an always-available second brain, I usually write my first draft without an LLM, then turn it over to Claude to see what I hadn’t considered. I’ve tried the inverse approach – asking for an outline based on a vague idea, or requesting a full essay from an outline. So far, it still has far worse taste at creating than at critiquing.
I have no doubt that the models will get better at writing, but the gap speaks to a larger deficit. It is simply harder to train models to complete creative tasks at high quality than to code. There is less data to learn from, and the output is subjective. Code has correct answers, but “which company should I create” or “what should my next album sound like” are not binary questions with obvious truths. The rise of reasoning models has produced incredible results on tasks with a ground truth, but I find GPT-4.5 to be a better writer than o1 pro.¹
If you train a model on every successful startup, it should find important trends to consider. But in the near future, it seems unlikely that it could spot the next trend with the same conviction the greatest human founders have. To create something new is to do so despite the evidence, based on a delusion that you have seen what others haven’t and can build that world. Given that most good ideas looked impossible until they worked, I’m not sure how you can give a model feedback on this unless it is allowed to execute on real-world tasks it thinks are extremely unlikely to succeed.
On top of that, vague, long-horizon tasks are not easily RL-able. You may need to let models execute on those tasks, measure their outcomes far into the future, and only then give them a signal based on success. Self-play isn’t as effective when the skill you want a model to improve at requires real-world interaction. What would it look like to let a model pitch startup ideas, found the company, steer it to success or failure, and measure whether it worked? Getting reliable data on this seems difficult, especially relative to other tasks.
Maybe that is the final frontier for white-collar work. With intellectual schlep-work deleted, we may retain an advantage in setting the taste of the future – what connections to make, what convictions to hold, and when to bet everything we have on a thing we cannot prove.
If this is true, the future of intellectual work belongs to direction-setters and disruptors. Your ability to do grunt work will be irrelevant, but your ability to see something someone else didn’t, and the conviction to go for it despite your doubts, could still let you win. Models will be far better than us at executing tasks in whatever direction we set. But, for a while, they might be worse at choosing the best path – not because they lack the ability to evaluate the options, but because they lack the taste to identify the best one and execute on that vision.
Such a world would be the best time to create in history – as long as you have the taste to know what to do. The mindless paperwork, the difficult research, and the pointless tasks that today serve as a barrier between you and your most ambitious idea would get deleted. The only thing stopping you would be you.
Does that raise the bar for participating in the economy?
There are far fewer Shakespeares than email writers; a million musicians try and fail to climb the Billboard Hot 100. Soon, our world might be geared for the greats – those with the instinct and the sense to go for broke and revolutionize a field by trying a direction that shouldn’t work. The difference between the thing that shouldn’t work and the thing that does is partially luck, but it is largely a skill we struggle to benchmark. In such a world, what is the role for people who don’t already have taste? Can we teach it?
Even if we can teach taste, it might be harder to couple it with agency. Doing something off the default path is difficult, which is why the vast majority of people never do it: they have never been trained for it, and it is genuinely harder than following instructions. Without a path to tread, will most people be able to act agentically, and will they have the taste to know when and what to do?
Perhaps there is more to taste than the greats, and more room for everyone to compete than I can conceptualize. Today’s influencers shovel opinions with less poise than Socrates or Jefferson, but they have no problem commanding a brand. Restaurateurs who will never earn a Michelin star nevertheless understand how to make food better than I do.
The bar to compete will be raised, but people from outside the circles of power – the established, the entrenched – may still win.
Or maybe o4 will have better taste than all of us.
Thank you to Rudolf Laine, Deric Cheng, and Riya Kataria for reviewing drafts of this post.
¹ I still think Claude 3.5 Sonnet (Oct 2024) has the best editing taste of any model I’ve used. Early in reviewing this post, I switched from GPT-4.5 to Claude 3.7 Sonnet for feedback. I still found that unsatisfactory, and ultimately reverted to 3.5 Sonnet (Oct 2024). Side note: Anthropic, if you’re listening, please rename this.