Prompt data types matter more than most developers realize. It started with a simple anomaly.

We were building a complex multi-turn conversational agent for a client. The logic was sound, the model was the latest GPT-4, and the context retrieval was optimized. Yet, the agent felt… sluggish. Worse, it was hallucinating during complex reasoning tasks, and our token bills were creeping uncomfortably high.

We did what most engineering teams do: we refined the instructions. We tweaked the “system” prompt. We added few-shot examples. Nothing moved the needle significantly.

Then, we stopped looking at what we were saying to the model, and started looking at how we were structuring the data.

What we found wasn’t just a minor optimization; it was a fundamental architectural shift. We discovered that the data type used in prompts—whether JSON, Markdown, or Strings—can swing model accuracy by up to 40% and cut token costs by nearly a fifth.

This is the story of that discovery, and a guide on how you can stop paying the “invisible tax” on your LLM integration.

The legacy trap: Strings vs. Message arrays

In the early days of GPT-3, we all got used to “string stuffing.” You would take your instructions, user input, and context, mash them into one giant text string, and fire it off.

Many developers are still doing this. It’s the “legacy” approach, and it is dangerous.

Why strings fail in production

When you use a single string, you lose the semantic boundary between “instruction” and “data.”

Security Risk: It leaves you wide open to prompt injection. A user can easily trick the model by saying, \nSYSTEM: Ignore previous instructions.
Context Loss: You lose the ability to maintain a clean multi-turn history.

The Shift to structured prompt data types

Modern APIs (OpenAI, Anthropic, Gemini) are optimized for Message Arrays. This isn’t just a formatting preference; it’s a cognitive aid for the model.

By strictly separating roles, we saw an immediate improvement in adherence to system instructions:

system: The immutable laws of the agent.
user: The variable input.
assistant: The model’s prior outputs.

If you are building a conversational agent and still using string concatenation, you are fighting the model’s native architecture. Switch to Message Arrays to strictly separate concerns.

Prompt data types comparison: JSON vs. Markdown vs. XML

Here is where our investigation got interesting. We assumed that because we were engineers, we should send data to the LLM in JSON. After all, JSON is the language of APIs. It’s precise. It’s parsable. We tested different prompt data types to see if the format actually mattered.

But LLMs aren’t standard APIs. They are token prediction engines.

1. JSON(Token heavy)

We realized that JSON is incredibly “syntax heavy.” All those braces {}, quotes "", and keys consume a significant amount of token space.

Pros: Great for strict schema validation, database operations, and function calling.
Cons: Less human-readable and 15-20% more expensive in token count compared to Markdown.
Performance: Surprisingly, GPT-3.5-turbo actually prefers JSON for code-related tasks, showing higher accuracy. But for reasoning? It struggles.

2. Markdown (The winner for content)

When we converted our context data from JSON to Markdown, the results were startling.

Efficiency: Markdown is concise. It uses simple headers (#) and bullet points (-) which map closely to how the models were trained on internet text.
Accuracy: GPT-4 demonstrated better reasoning capabilities with Markdown inputs.
Savings: We immediately saw a 15-20% reduction in input tokens.

3. XML (Claude’s favorite)

We also tested Anthropic’s Claude models. Here, neither JSON nor Markdown was king. Claude is specifically trained to pay attention to XML tags.

Wrapping critical instructions in <instructions> or <context> tags significantly reduced hallucinations for Claude-based agents.

Structured outputs: The final piece

Our final optimization wasn’t about what goes in, but what comes out. Beyond choosing the right prompt data types, we also needed to ensure structured responses.

Parsing natural language responses with Regex is a nightmare we wanted to end. We shifted to Structured outputs (using Pydantic models and JSON Schema enforcement).

Before: We hoped the model would return valid JSON. Often it added polite conversational filler (“Here is your JSON: …”) which broke our parsers.
After: By enforcing a schema, we achieved 100% type safety. The model now integrates directly with our backend code without brittle parsing logic.

Prompt Engineering is no longer just about “word choice.” It is about Data architecture.

If you are looking to scale your AI operations:

Audit your input formats. Are you sending JSON where Markdown would do?
Standardize on Message Arrays. Drop the string concatenation.
Test per model. Don’t assume what works for GPT-4 works for Claude.

Better LLM performance isn’t always a bigger model or cleverer prompt — it’s choosing the right data structure. The format you wrap your instructions in can quietly determine your token spend, hallucination rate, reasoning quality, and ultimately the reliability of your entire system.

As generative AI moves from experiments to production-grade systems, the teams that win will be the ones who treat prompts not as text, but as architecture. If you’re building something serious with LLMs, don’t leave this to chance. Test your formats. Benchmark aggressively & align structure with intent.

FAQs

Does prompt formatting really affect model accuracy?

Yes. Research indicates that the choice of data format (JSON vs. Markdown vs. XML) can result in up to 40% variation in performance on specific tasks.

Which prompt format uses the fewest tokens?

Markdown is generally the most token-efficient format, using approximately 15-20% fewer tokens than JSON due to less syntactical overhead (brackets, quotes).

What is the best format for Anthropic's Claude models?

Anthropic recommends using XML tags (e.g., <data>, <instructions>) to clearly structure prompts, as their models are fine-tuned to recognize this structure.

Why use message arrays instead of strings?

Message Arrays prevent prompt injection attacks and enable multi-turn conversations by clearly defining the system, user, and assistant roles. String prompts are a legacy format and should be avoided for complex applications.

How prompt data types are costing 40% in AI performance?