The Art of Asking Questions

Adam Mastroianni

Everyone seems to agree that self-report questions are fraught with lies, biases, errors, and other inaccuracies. We all use them anyway. How can we ask them better?

It was 2016, and Professor Tony Greenwald was peering into the future. It was a heady time for election prognostication: FiveThirtyEight was in its prime, the New York Times had begun revving up its “election needle,” and Princeton’s neuroscientist-cum-forecaster Sam Wang, after cramming every poll into a mathematical model, claimed he would eat a bug if his predictions were wrong.

But Greenwald wasn’t satisfied with these traditional prediction methods. Pollsters collect data about voter preferences by asking people a series of questions. First, they try to measure how likely the respondent is to vote: “How much thought have you given to the upcoming election?,” “Do you happen to know where people in your neighborhood go to vote?,” “Did you vote in the last election?,” etc. Then they ask about which candidate the respondent plans to vote for.

This approach relies on what social scientists call “self-report”: a person’s verbal description of what’s going on inside their head.¹ Self-reports are quick and easy to get — you simply ask someone a question. But they can also mislead you. What if people lie? What if they’re ashamed to admit which candidate they support (as many worried might happen with Trump supporters)? What if they don’t really know whom they’re going to vote for, and they make something up?

Enter Greenwald and his Implicit Association Test (IAT), a psychological tool that uses reaction times to measure how tightly certain concepts are tied together in a person’s mind. If you’re slower to put Black faces and positive words together than you are to put white faces and positive words together, the thinking goes, then “Black” and “good” are less closely associated in your mind. The IAT is one of many tools that social scientists have designed to bypass self-report. Rather than ask people to speak, these tests observe people’s bodies and behaviors — how quickly they push a button, how much they sweat, how quickly they breathe, or where the blood flows in their brains. These measures cost more time and money than self-reports, but they promise a different angle into people’s minds, granting access to answers that are difficult — if not impossible — to fake.
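
To make the reaction-time logic concrete, here is a minimal sketch of how a measure like the IAT might turn latencies into an association score. It is an illustration only; the trial data, block labels, and simple standardized-difference scoring are assumptions made for the example, not Greenwald’s published scoring procedure.

```python
# Illustrative sketch only: the reaction times, block labels, and the
# simple standardized-difference score below are assumptions for the
# example, not the published IAT scoring procedure.

from statistics import mean, stdev

# Hypothetical latencies (milliseconds) from one participant.
# "congruent" trials pair white faces with positive words;
# "incongruent" trials pair Black faces with positive words.
congruent_rts = [612, 588, 640, 575, 603, 630, 595, 618]
incongruent_rts = [701, 688, 745, 672, 710, 698, 725, 690]

def association_score(congruent, incongruent):
    """Positive values mean slower responses on incongruent trials,
    i.e., a weaker association between the concepts paired there."""
    pooled_sd = stdev(congruent + incongruent)
    return (mean(incongruent) - mean(congruent)) / pooled_sd

print(f"Association score: {association_score(congruent_rts, incongruent_rts):.2f}")
```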

Greenwald’s election IAT was returning some surprising results. According to the test, those who had favored a Republican other than Donald Trump in the primary appeared to have negative associations with Trump, even if they claimed they planned to vote for him — that is, it took them longer to associate Trump with a positive word than it took for other candidates. But people who had supported Bernie Sanders in the Democratic primary showed equally positive associations with Hillary Clinton. “When we used both polling-type questions and the IAT, it became instantly clear that spoken and unspoken measures are no longer in sync,” Greenwald said in a November 2016 interview. “The message of these findings is that Clinton has unspoken support that is likely being missed by the polls. I’m going out on a limb to predict that Clinton’s vote margin on November 8 will exceed the prediction of the final preelection polls.”

A short time later, of course, that limb broke. Trump exceeded predictions while Clinton underperformed them, and the rest is history. 

Why did Greenwald’s prophecy fail to come true? The whole reason to bypass self-report was to avoid the booby traps that lie between people’s minds and mouths. Methods like the IAT are supposed to offer a glimpse into people’s heads, to uncover the attitudes and beliefs that people wouldn't explicitly admit — perhaps that they didn't even know they had. Instead of getting us closer to the truth, that glimpse led us even farther away. What gives?

[Illustration by Jordan Awan]

If you wanna know if he loves you, it's in his pupil dilation

Social scientists have a complicated relationship with self-reports. On the one hand, we use self-reports all the time. Whenever a pharmaceutical company wants to prove that their drug decreases depression, eases anxiety, makes people feel less fatigued, or has any other psychoactive effect, they have to use self-report scales like the Beck Depression Inventory or the PHQ-9. The "longest study on human happiness," featured in a TED talk with over 45 million views, is based on people’s answers to questions like “How happy or unhappy did you feel in the past 24 hours? (1 = very unhappy, 7 = very happy).” The Big Five, the most widely used personality assessment, is a long list of self-report questions. 

And yet researchers often criticize self-reports and apologize for using them. Self-reports are supposedly riddled with error and biases: people lie, they tell you what they want to hear or what they want to be true, they can’t remember, they make things up, they give very different answers to slightly different questions. "It's just self-report" is one of the earliest criticisms that undergraduates learn to lob at the papers they read. Economists are especially skeptical. A review of self-report from two economists, now cited nearly 2,000 times, ends with: “a large experimental literature by and large supports economists' skepticism of subjective questions [...] these findings cast serious doubts on attempts to use subjective data as dependent variables.”

Much of this criticism takes place behind the scenes during peer review. While we can’t see the reviews themselves, we can see plenty of researchers acknowledging these criticisms in their published works. “One of the most common methodological criticisms of manuscripts under review tends to be associated with alleged problems concerning the use of self-report data,” reports one social scientist. “Many researchers are skeptical about results that come from questionnaires that ask people to report about themselves,” claims another. Others go even further: “It seems as if self-report-bashing might be an article of faith of some Scientific Apostle’s Creed, ‘I believe in good science; the empirical determination of theory choice, the control of extraneous variables, and the fallibility of self-report measures ...’” “Even authors themselves tend to accept the alleged problems of self-report data, as indicated by the limitations they acknowledged in the Discussion section of their manuscripts,” another adds.

Many of these criticisms are justified because asking self-report questions is indeed fraught. Asking questions in different orders or contexts can elicit different answers: “How much do you approve of Joe Biden’s job performance?” could take on a different meaning if you’ve just been asked, “How would you rate the state of the economy since Joe Biden took office?” There are cultural differences in how people interpret and respond to questions — for instance, people in the Middle East and Latin America may be more willing than people elsewhere to select “10” (the top option) when asked about their happiness. A landmark paper, now cited over 17,000 times, finds that people often can’t accurately explain their own thoughts and decisions and, when asked, may confabulate answers, even if there’s no way they could know those answers and even if the answers are demonstrably wrong.

This creates a strong incentive for researchers to invent ways of getting inside people’s heads without asking. The IAT is one of many different attempts at this. Psychologists use functional magnetic resonance imaging (fMRI), which tracks the flow of blood in the brain, to try to gain insight into people's mental states. They use physiological measures to try to measure people's feelings through their bodies: heart rate, pupil dilation, hormone levels. They use natural language processing to probe people's inner lives by counting and categorizing the words they use, rather than interpreting the meaning of those words: how many first-person pronouns, how many positive and negative adjectives, how many abstractions versus concrete details. And they use all manner of "unobtrusive" measures to collect data without bothering anyone: social media posts, politicians' speeches, eye contact.
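
As a toy illustration of the last of these approaches, dictionary-based word counting, here is a short sketch; the word lists and sample text are invented for the example, and real instruments rely on much larger, validated dictionaries.

```python
# Toy sketch of the "count and categorize" approach to language.
# The word lists and sample text are invented for illustration; real
# tools use large, validated dictionaries.

import re

FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
POSITIVE = {"happy", "good", "great", "love", "wonderful"}
NEGATIVE = {"sad", "bad", "awful", "hate", "terrible"}

def count_categories(text):
    """Count category hits without interpreting what the text means."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "first_person": sum(w in FIRST_PERSON for w in words),
        "positive": sum(w in POSITIVE for w in words),
        "negative": sum(w in NEGATIVE for w in words),
        "total_words": len(words),
    }

sample = "I love my job, but lately I feel bad about how little I see my friends."
print(count_categories(sample))
```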

These methods can raise interesting questions. For instance, why do people's pupils synchronize during conversation? Why would people who claim to hold egalitarian views be slower when putting "Black" and "good" together than they are when putting "white" and "good" together? Why do some people who say they intend to vote end up staying home on election day? 

But they’re not a replacement for self-reports, most often because they rely on self-reports for validation in the first place. How do we know, for example, the amygdala is involved in processing fear? Well, it lights up in the brain scanner when people say they're afraid. But if fMRI gives us the same answer that the self-report does, what does the scanner add? And if someone's fMRI results and self-reports don't match up, which one do we trust? 

Neuroscientists call this the “reverse inference” problem, and no one has a solution to it. We run into it every time self-reports and indirect measures diverge. For instance, one study tried to figure out whether men who score high on a homophobia scale are actually secretly homosexual by hooking up self-reported gay-hating men to a penile plethysmograph — a cuff that measures penile circumference — and having them watch gay porn. If a guy gets hard when he watches two men go at it, the thinking goes, then he must be gay. But if a man gets an erection when watching gay porn and yet claims he isn't attracted to men, do we believe his words or his genitals?

There’s no way around the reverse inference problem. But indirect measures can play a role in the scientific process, not by measuring something better but by measuring something different. Here’s an example. One way to measure people’s self-control is simply to ask them about it, perhaps using the appropriately named Self-Control Scale (example items: “I am good at resisting temptation,” “I am lazy”). Another way is to make them do a task that requires them to use self-control, like the Stroop task, where people have to read out the color of a word, rather than the word itself (e.g., the word “RED” printed in blue). Unfortunately, these two methods do not correlate with one another.

Does this mean the Self-Control Scale is bunk, just a bunch of nonsense self-reports? No: People’s answers correlate with their performances at school and at work, their well-being, and their likelihood of smoking and overeating, which is not what you’d expect from meaningless responses. Does that mean that the Stroop task is nonsense instead, just a cute little game that doesn’t give any insight into anything else about someone? Also no: People who did better on the Stroop were more likely to complete addiction treatment in one study and adhered better to an exercise plan in another.

So neither self-report nor Stroop gives us the complete picture of self-control, or even more than a tiny sliver of that picture. But those slivers are unique. Their relationship is complementary, not supplementary. Their divergence is an opportunity to gain insight — what do we mean by “self-control” anyway? We use a single word to describe a whole host of complicated thoughts and behaviors, but that does not mean they’re all one thing, and having multiple measures can help us separate and study those components. 

Lies about lying

Why are people wary of self-reports? Perhaps the biggest reason is that people are afraid of being lied to. These lies come in different forms. Sometimes they’re examples of “social desirability,” or people telling you what they think makes them look good. Sometimes they are “demand characteristics,” or people telling you what they think you want them to say. Sometimes they are “expressive responding” or “partisan cheerleading,” or people exaggerating their responses to make a point. The worst of these lies, though, are the lies people tell you just for the hell of it. Whenever I talk to the public about my work, it’s these lies that concern them most often: “How do you know people aren’t just screwing with you?”

I see this unfounded fear up close every semester. When I taught Managerial Negotiations to M.B.A. students at Columbia, I ran an exercise where students could choose whether or not to lie to one another. The lies were simple: You just had to claim that you drew a high playing card when you actually drew a low one, or vice versa. The stakes were low: A successful lie got you more points, which then turned into lottery tickets for a gift card. Across multiple years of data from the exercise, students expected their classmates to lie to them about half the time, and they expected themselves to lie about as much. When they actually played the game, however, students lied only about 20% of the time.

This happens because lying looks a lot easier than it really is. It's easy to picture yourself saying "Two of clubs!" when you really drew a king of diamonds, but when the time comes to open your mouth and say exactly the wrong thing, it's not easy at all. Lying is also more mentally taxing than telling the truth because it requires you to maintain two versions of reality, like running Windows on your Mac. Practice seems to make lying easier, which is perhaps why most lies are told by a few prolific liars.

People seem to have a special distaste for falling for a falsehood, and they're willing to burn resources to avoid it. One study done in six different countries showed that people prefer a lower chance of making money so long as it means nobody could lie to them. Researchers call this "betrayal aversion" — a willingness to take objectively worse odds to avoid the possibility of getting duped. This may make indirect measures seem especially attractive — even if we end up further from the truth, at least nobody intentionally misled us.

Perhaps our fear of falsehoods is also why we are so quick to assume that people will lie to us when we ask them questions. One of the first and most influential hypotheses that emerged to explain how Trump beat his polls was the "shy Trump voter" effect: People were afraid to admit that they intended to vote for him, the theory went, leading pollsters to underestimate Trump's support. Years later, there still isn't evidence for this hypothesis. When the polls are off, it's more often because it's hard to get people to talk to you in the first place and because it’s hard to make precise estimates about large populations from small samples, not because the people who do talk to you are lying.

Still, people lie. Fortunately, it’s possible to catch them with canny questions, and any study worth its salt will incorporate many lines of defense. For instance, I often exclude participants who report an age at the end of the study that’s inconsistent with the birth year they reported at the beginning, who can’t remember the answer they gave on the previous page or the instructions they just read, who give off-topic or copy-pasted answers to open-ended questions, who miss attention checks (“Some people are extroverted, and some people are introverted. Please don’t answer this question; just leave it blank”), or who pick wacky responses to easy questions (for instance, claiming that “eating turkey” is a Halloween tradition). Trolls usually fall into at least one of these traps, and usually several.
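
In practice, these defenses are just filters applied to the raw responses. Here is a minimal sketch of what they might look like in code; the column names, the one-year slack on the age check, and the specific rules are assumptions for illustration, not a standard pipeline.

```python
# Minimal sketch of data-quality exclusions like those described above.
# Column names, the one-year slack on the age check, and the specific
# rules are illustrative assumptions, not a standard pipeline.

def passes_quality_checks(row, study_year=2024):
    # Age reported at the end should match the birth year reported at
    # the start (allow one year of slack for birthdays).
    implied_age = study_year - row["birth_year"]
    if abs(implied_age - row["reported_age"]) > 1:
        return False
    # "Please don't answer this question" attention check.
    if row["attention_check_response"]:
        return False
    # Did they remember the answer they gave on the previous page?
    if row["memory_check"] != row["previous_answer"]:
        return False
    # Easy question with an obviously wrong option.
    if row["halloween_tradition"] == "eating turkey":
        return False
    return True

respondents = [
    {"birth_year": 1990, "reported_age": 34, "attention_check_response": "",
     "memory_check": "agree", "previous_answer": "agree",
     "halloween_tradition": "trick-or-treating"},
    {"birth_year": 1990, "reported_age": 22, "attention_check_response": "extroverted",
     "memory_check": "agree", "previous_answer": "disagree",
     "halloween_tradition": "eating turkey"},
]

kept = [r for r in respondents if passes_quality_checks(r)]
print(f"Kept {len(kept)} of {len(respondents)} respondents")
```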

It’s possible to catch the other forms of lying with similar methods. “Partisan cheerleading” and “expressive responding” can sometimes be counteracted with incentives. For example, if you ask Democrats to estimate how many Republicans believe Donald Trump won the 2020 election, and they give lower estimates when paid to get the answer right than when they get nothing, you can be pretty sure that some of what the unincentivized Democrats were saying was simply “I don’t like Republicans.” You can combat social desirability by massaging the wording of a question to make it clear that all answers are acceptable: “Some people think it’s rude to make jokes about someone’s weight. Other people think it’s funny. What about you? What do you think?” Psychologists sometimes employ a “bogus pipeline,” a cover story that leads participants to believe that their lies can be detected, in order to encourage honest responses.

There’s no way to catch every single liar, but lying is harder than it looks and harder to get away with than you might imagine — so long as you use the right methods. And so perhaps we fear being lied to on self-reports more than we really should.

[Illustration by Jordan Awan]

How to spend four and a half years on commas

That doesn’t mean self-reports are simply wonderful and we should use them all the time. Self-reports can be powerful and useful, and they can also be useless and misleading. Deploying them well requires care and skill, far more than you might expect. 

My dad was a photojournalist for a long time, and as consumer-grade digital cameras flooded the market in the early 2000s, he often despaired at people’s lack of respect for photography. “Click, I got the shot!,” he would say, imitating someone taking a picture without thinking. In his view, making the technology easy to use obscured the artistry required to use it. Photography wasn’t just about making a picture appear on a screen; it was about making a good picture appear, and that required experience and insight. 

I feel the same way about self-report that my dad felt about cameras. The fact that you can dump some questions into a Google Form doesn’t mean you’ve mastered the art of probing the human mind. Designing a useful self-report measure takes skill and practice, and the ease of making something appear obscures the difficulty of making something good appear. 

That’s why, of the five years I spent doing a Ph.D. in social psychology, probably a full four and a half were spent sitting in my advisor’s office tweaking self-report questions. In our first project together, we wanted to know whether conversations tend to end when people want them to, which required, of course, asking people when they wanted their conversations to end. That sounds simple, but it took us weeks to develop the right question. 

If you ask someone, "When did you want your conversation to end?," do you mean something like "When do you wish your conversation had been cut off?" or "When do you wish you would have ended the conversation?" or "When do you wish the other person would have ended the conversation?" If you wrap things up five minutes earlier than you would have preferred because you think your partner is ready to leave, did you "want" to go then, or did you "want" to go five minutes later? We also thought "want" might be too strong; maybe people hesitate to say they wanted to leave their partners behind.

We ended up with a two-part question. First: "In the conversation you just had, was there any point at which you felt ready for the conversation to end?" If participants responded "Yes," we then asked them, “Using the options below, please find the point in the conversation when you first felt ready for the conversation to end" and gave them a clickable box that corresponded to each minute of their conversation. If participants responded "No," we then asked them, “How much longer would you have preferred the conversation continued?” and gave them options that ranged from "One minute longer" to "More than 60 minutes longer."

(At first we didn't even allow participants to tell us that they wanted their conversations to continue, figuring that anyone who had stopped talking obviously wanted to stop talking. We only added that option just before launching the study, and good thing we did, because about a quarter of participants ended up choosing it.)
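
The branching itself is simple to express in code. Here is a console-based sketch; the helper function and the exact option lists are assumptions made for illustration, and only the two-part structure follows the description above.

```python
# Sketch of the two-part branching question described above. The helper
# function and exact option lists are assumptions for illustration; only
# the branching structure follows the text.

def ask_choice(prompt, options):
    """Show numbered options on the console and return the chosen one."""
    print(prompt)
    for i, option in enumerate(options, start=1):
        print(f"  {i}. {option}")
    choice = int(input("Enter a number: "))
    return options[choice - 1]

def conversation_ending_question(conversation_minutes):
    felt_ready = ask_choice(
        "In the conversation you just had, was there any point at which "
        "you felt ready for the conversation to end?",
        ["Yes", "No"],
    )
    if felt_ready == "Yes":
        # One option per minute of the conversation just completed.
        return ask_choice(
            "Please find the point in the conversation when you first "
            "felt ready for the conversation to end.",
            [f"Minute {m}" for m in range(1, conversation_minutes + 1)],
        )
    # Participants who never felt ready say how much longer they wanted.
    return ask_choice(
        "How much longer would you have preferred the conversation continued?",
        [f"{m} minute(s) longer" for m in range(1, 61)]
        + ["More than 60 minutes longer"],
    )

if __name__ == "__main__":
    print(conversation_ending_question(conversation_minutes=10))
```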

No question is perfect. When I showed my results to colleagues, they pointed out that it's possible to "feel ready" for a conversation to end and then to stop feeling that way — maybe an interesting topic comes up, and you suddenly feel like sticking around again. So we ran another study where we added another question: “Please complete this sentence in the way that best reflects your opinion: After I first felt ready for the conversation to end ..." followed by two options, “I continued to feel that way for most of the remaining part of the conversation” and “The feeling went away — for most of the remaining part of the conversation, I didn't feel ready for the conversation to end.” To our great relief, 92% of participants picked the first option.

I'm dragging you through all these details to show you that self-reports ain't easy. You have to puzzle through the things you can and can't conclude from what participants tell you, game out all the ways someone could misinterpret your question, ask yourself whether someone could possibly know the answer to the question you're asking, and handle a million other things besides. You can’t skip all of this work, not even if you pop someone into an fMRI machine or stick their head into a gaze-tracker.

With apologies to Gallup

There’s no way to distill the art of asking a question into a set of principles, just as there is no instruction manual for painting a great portrait or writing a timeless song. There is, however, a common failure mode.

I discovered this when, for another research project, I had to read more than 10,000 self-report questions over the course of a year. I was looking for survey questions that tapped people’s perceptions of changes in moral traits, values, and behaviors — do people think that humans have become less kind, honest, nice, good, etc.? Fortunately, polling companies have asked hundreds of questions about this over the past 70 years. Unfortunately, those questions were spread across databases and poorly indexed, with no set of keywords that would return them all. 

So I had to sift through them by hand, and in the process I became well-acquainted with the way they most often went wrong, which was by violating this maxim: Strive to ask questions that all participants interpret in the same way. There are all sorts of ways not to do this; it’s remarkably easy. For instance, here’s one question I came across: 

“Do you think people in general today lead as good lives — honest and moral — as they used to?”

When, exactly, is “used to”? Some people may be thinking about 50 years ago, and other people may be thinking about 20 years ago. The people thinking about 50 years ago might give a different answer than the people thinking about 20 years ago, even though they might agree if you asked them both about the same year.

Here’s another:

“Do you think moral values in this country today are very strong, somewhat strong, somewhat weak, or very weak?”

What moral values are we talking about? I might think “tolerance!” and you might think “honesty!,” and our different answers may merely indicate different definitions.

I could go on for another 10,000 questions. These are surveys designed by professionals; presumably they are workshopped and edited, and still they often go wrong. Like I said: Self-reports are hard!

Tell me what you really think

It’s easy to lump together both useful and useless questions under the label of “self-report” because it’s so hard to see the differences between them. Writing a good question is difficult, and spotting one takes time and reflection, so it’s not immediately obvious that some questions are useful and some are not. 

This isn’t helped by the fact that scientific papers often bury the verbatim wording of critical questions in supplements and appendices, nor is it helped by the fact that researchers are often expected to simply intuit their way into writing self-reports. My Ph.D. required me to take four different statistics classes; it required zero classes on question formulation. My undergraduate methods class — and those I later TA’d as a graduate student — similarly skipped the topic in favor of stats and experimental design. Those tools matter too, of course, but there’s no point in designing a study or in analyzing the data if the measures it uses are meaningless. The only reason I got trained in the arts of self-report was that I ended up with an advisor who was happy to spend weeks swapping out words and tweaking commas; most advisors aren’t.

The solution to all of this is not to eschew self-report questions entirely. While physiological measures, reaction times, eye tracking, and the like might be useful for triangulating the truth, they are not a back door into the mind. If you say otherwise, well, perhaps I shouldn’t believe you until you complete an IAT, climb into an fMRI tube, and strap a plethysmograph onto your privates. Only then will I know what you really think.

  1. Purely factual questions like “How many siblings do you have?” and “Do you own a car?” are also technically self-reports, but researchers generally trust people’s answers to these kinds of questions, so they aren’t as interested in finding ways to avoid them. The self-reports that make social scientists sweat are descriptions of unverifiable internal states — thoughts, attitudes, and beliefs, like “How satisfied are you with your life these days?” and “How warm do you feel toward Black Americans?”

Adam Mastroianni is an experimental psychologist. He writes Experimental History.

Published December 2024

