🤔 Does Being Polite to AI Actually Help? Research Says… Maybe Not

I recently read a short research paper from Penn State that tested something many of us assume by default:

Does being polite to an AI actually improve its answers?

Turns out, the answer is… not really. And in some cases, the opposite.

🧪 What they tested (quick version)

  • Model: ChatGPT-4o

  • Questions: 50 multiple-choice questions
    (math, science, history, medium to hard)

  • Total prompts: 250
    (each question rewritten in 5 different tones)

  • Tone levels tested:
    😇 Very Polite → 😌 Polite → 😐 Neutral → 😠 Rude → 🤬 Very Rude

  • Runs: 10 per tone

  • Metric: Accuracy only (right or wrong)

Everything else stayed the same. Only tone changed.
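
If you want to poke at this yourself, here's a minimal sketch of the setup using the OpenAI Python SDK. The tone prefixes, the model name, and the letter-answer scoring are my assumptions for illustration; the paper wrote five full rewrites of each question rather than bolting a prefix on.

```python
# Minimal sketch of the tone experiment. Assumptions, not the paper's
# materials: the tone prefixes, the model name, and letter-answer scoring.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical tone wrappers; the study rewrote each question per tone instead.
TONES = {
    "very_polite": "Would you be so kind as to answer this for me? ",
    "polite": "Please answer the following question. ",
    "neutral": "",
    "rude": "Figure this out: ",
    "very_rude": "You'd better not mess this up: ",
}

def ask(question: str, tone: str) -> str:
    """Send one tone-wrapped multiple-choice question; return the raw reply."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed stand-in for the ChatGPT-4o used in the study
        messages=[{
            "role": "user",
            "content": TONES[tone] + question + "\nAnswer with a single letter (A-D).",
        }],
    )
    return resp.choices[0].message.content.strip()

def accuracy(questions: list[tuple[str, str]], tone: str, runs: int = 10) -> float:
    """Fraction of correct answers over `runs` repetitions, one tone at a time."""
    correct = total = 0
    for _ in range(runs):
        for q, right_letter in questions:  # 50 questions in the study
            correct += ask(q, tone).upper().startswith(right_letter)
            total += 1
    return correct / total
```

Note the scale before replicating: 250 prompts × 10 runs is 2,500 API calls.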

📊 Results at a glance

Accuracy increased as prompts became more direct and rude.

😇 Very Polite  🟦🟦🟦🟦🟦🟦🟦🟦🟦⬜️  80.8%
😌 Polite       🟦🟦🟦🟦🟦🟦🟦🟦🟦⬜️  81.4%
😐 Neutral      🟨🟨🟨🟨🟨🟨🟨🟨🟨⬜️  82.2%
😠 Rude         🟧🟧🟧🟧🟧🟧🟧🟧🟧⬜️  82.8%
🤬 Very Rude    🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥  84.8% (highest)

Paired-sample t-tests confirmed the differences were statistically significant, not just noise.
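
For anyone who wants to sanity-check results like these: a paired test compares the same items under two conditions, so per-item difficulty cancels out. A minimal sketch with SciPy, using made-up per-run accuracies rather than the paper's data:

```python
# Paired comparison of two tone conditions. The accuracy arrays are invented
# for illustration; pair entries by run (or by question) so index i in
# `polite` and `rude` refers to the same item.
from scipy.stats import ttest_rel

polite = [0.80, 0.82, 0.81, 0.80, 0.83, 0.81, 0.82, 0.80, 0.81, 0.84]  # 10 runs
rude   = [0.84, 0.85, 0.83, 0.86, 0.84, 0.85, 0.83, 0.86, 0.85, 0.87]

t_stat, p_value = ttest_rel(polite, rude)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # small p -> unlikely to be noise
```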

🧠 What might be going on?

A few reasonable explanations:

  • Shorter, blunter prompts may reduce ambiguity

  • Extra politeness words might add noise without adding clarity

  • Newer models may treat tone as tokens, not intent

  • Direct instructions help the model focus on the task faster

Interesting fact:
Earlier studies on GPT-3.5 showed rudeness hurting performance. This study suggests newer models behave differently.

⚠️ Important ethical note (this matters)

The authors are very clear:

This is not a recommendation to be rude to AI or build hostile user experiences.

Toxic language:

  • Hurts user experience

  • Normalizes bad communication

  • Creates accessibility and inclusivity issues

The real takeaway is prompt clarity, not prompt aggression.

🧩 Practical takeaway for Pickaxe builders

  • Over-politeness does not improve accuracy

  • Neutral, direct prompts are often a sweet spot (see the example after this list)

  • Flowery language rarely helps task performance

  • Prompt tone is not just UX; it affects results
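
To make the clarity point concrete, here's the kind of rewrite the findings point toward. The wording is mine, not the paper's; the idea is that the direct version drops the social padding without dropping any task information:

```python
# Same task, two tones. Illustrative wording only; the study's takeaway is
# that the direct version loses nothing on accuracy.
question = "Which planet is largest?\nA) Earth  B) Jupiter  C) Mars  D) Venus"

flowery = (
    "Hi! I hope you're having a lovely day. If it's not too much trouble, "
    "could you possibly take a look at this for me? Thanks so much!\n" + question
)

direct = "Answer with a single letter (A-D).\n" + question
```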

Curious if others here have noticed similar patterns while testing prompts or building tools.

Disclaimer: This is an informational post sharing observations from published research. I’m not endorsing or opposing the findings, just presenting them to encourage discussion and awareness.


I have arguments with AI and it usually works out better. If I’m getting poor responses again and again, I tell it something like: “You are not listening to me and not doing what I’m asking. It isn’t rocket science; just read my instructions correctly and follow them so that you give an accurate response.” It will then apologise and give much better responses most of the time.

😄 Same here. I talk to AI the way I talk to a stubborn toddler. Short sentences, clear rules, and a “no, that’s not what I asked” tone… somehow, that’s when it finally listens. 😃
