Documentation

Models

Available models, domain scoring, roles, and Auto mode selection logic.

Overview

DebateTalk supports models from Anthropic (Claude family), OpenAI (GPT family), Google (Gemini family), Mistral, DeepSeek, Groq (fast inference), and others via OpenRouter. Each model is characterized by its strengths across six domains and assigned to one of three roles in a debate: debater, synthesizer, or adjudicator.

You can configure models manually or let Auto mode select the best panel for your question. The two approaches can be mixed: you might fix the synthesizer to a specific model while letting Auto mode choose the debaters, or you might specify a preferred debater set and let Auto mode fill the adjudicator slot.

Model configuration is managed through your dashboard under Model Config, or programmatically via GET /v1/user/model-config and PUT /v1/user/model-config. Changes take effect on the next debate you run.
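As a sketch of what a programmatic update might look like, the helper below assembles a JSON body for PUT /v1/user/model-config. The field names (auto_mode, debaters, synthesizer, adjudicator) are illustrative assumptions based on the concepts in this page, not the confirmed schema; check the API reference for the exact shape.

```python
# Sketch of a model-config update payload for PUT /v1/user/model-config.
# Field names here are assumptions, not the confirmed schema.

def build_model_config(auto_mode, debaters=None, synthesizer=None, adjudicator=None):
    """Assemble the JSON body for a model-config update.

    Slots left as None are omitted from the body, so Auto mode
    (if enabled) can fill them.
    """
    config = {"auto_mode": auto_mode}
    if debaters is not None:
        config["debaters"] = debaters
    if synthesizer is not None:
        config["synthesizer"] = synthesizer
    if adjudicator is not None:
        config["adjudicator"] = adjudicator
    return config

# Mixed setup from the paragraph above: pin the synthesizer,
# let Auto mode choose the debaters and adjudicator.
body = build_model_config(auto_mode=True, synthesizer="claude-sonnet")
```

Sending this body with your HTTP client of choice would correspond to the mixed manual/Auto configuration described above.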

Model Domains

Every model has a performance profile across six question domains. These scores drive Auto mode selection and are updated continuously using real debate performance data. A model that consistently produces high-accuracy answers on prediction questions will score higher in the Prediction domain over time. Auto mode uses these profiles to assemble panels where each debater is strong in the relevant domain.

The six domains are:

Factual

Questions with objectively verifiable answers: scientific facts, historical events, technical specifications. A model that scores well in Factual demonstrates strong grounding in its training data, calibrated uncertainty (it knows what it does not know), and precise language that avoids overgeneralizing from limited evidence.

Normative

Questions about what ought to be: ethics, policy, law, social values. Strong Normative performance requires awareness of competing ethical frameworks, an ability to hold multiple value systems in tension without collapsing them, and genuine nuance on contested values rather than defaulting to vague consensus language.

Business

Questions involving commercial strategy, organizational decisions, and market dynamics. A high Business score reflects structured risk analysis, sound commercial judgment, and realistic awareness of organizational constraints. Models that score well here tend to reason about tradeoffs in terms of incentives, resources, and competitive positioning.

Prediction

Questions about future outcomes and probabilistic assessments. Prediction performance is characterized by calibrated probabilistic thinking, awareness of historical base rates, and resistance to overconfidence. A strong prediction model does not just identify the most likely scenario; it assigns credible probability ranges and acknowledges the scenarios where it could be wrong.


Brainstorm

Open-ended questions that benefit from creative divergence and idea generation. High Brainstorm scores reflect a willingness to explore unconventional angles, prioritize breadth over depth in the early stages of a debate, and surface possibilities that more conservative models would discard as unlikely or unorthodox.

Belief

Questions about personal conviction, worldview, and meaning. Belief questions are distinct from Normative questions in that they are less about what society should do and more about what individuals find true or meaningful. Strong Belief performance requires philosophical rigor, genuine empathy for worldview differences, and the ability to reason coherently from first principles rather than deferring to consensus.
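To make the domain-profile idea concrete, the sketch below ranks models by their score in one domain. The model names match the availability table later on this page, but the scores are made-up illustrative numbers, not real DebateTalk data.

```python
# Illustrative domain-score profiles. The numbers are invented for
# demonstration and do not reflect real DebateTalk scoring data.
PROFILES = {
    "claude-opus": {"factual": 0.86, "normative": 0.91, "prediction": 0.78},
    "gpt-4o":      {"factual": 0.84, "normative": 0.80, "prediction": 0.82},
    "deepseek-r1": {"factual": 0.79, "normative": 0.72, "prediction": 0.88},
}

def rank_for_domain(profiles, domain):
    """Return model names sorted by score in the given domain, best first."""
    return sorted(profiles, key=lambda m: profiles[m].get(domain, 0.0), reverse=True)

ranking = rank_for_domain(PROFILES, "prediction")
# With the invented scores above, deepseek-r1 leads the Prediction ranking.
```

Auto mode effectively performs a ranking like this for the detected domain, then applies additional constraints such as provider diversity and model health.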

Auto Mode

When auto_mode is set to true in your model configuration, DebateTalk classifies the question at the start of the debate and selects the highest-scoring models for that domain. The classification step is fast and runs before the first debater round begins.

The selection algorithm enforces provider diversity: it limits how many models from the same provider can appear in a single panel. This ensures the debate benefits from genuinely different training backgrounds rather than minor variations of the same base model. A panel composed entirely of GPT variants, for example, would produce less productive disagreement than a mixed panel.
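The diversity constraint can be sketched as a greedy selection with a per-provider cap. The real algorithm and cap value are internal to DebateTalk; this is only a minimal illustration of the behavior described above.

```python
def select_panel(ranked, provider_of, panel_size, max_per_provider=1):
    """Greedy panel selection with a per-provider cap.

    `ranked` lists model names best-first for the detected domain;
    `provider_of` maps each model to its provider. The cap forces
    provider diversity. (Sketch only; the real cap is internal.)
    """
    panel, counts = [], {}
    for model in ranked:
        provider = provider_of[model]
        if counts.get(provider, 0) < max_per_provider:
            panel.append(model)
            counts[provider] = counts.get(provider, 0) + 1
        if len(panel) == panel_size:
            break
    return panel

providers = {"gpt-4o": "openai", "gpt-5.4": "openai",
             "claude-sonnet": "anthropic", "gemini-2.5-pro": "google"}
panel = select_panel(["gpt-4o", "gpt-5.4", "claude-sonnet", "gemini-2.5-pro"],
                     providers, panel_size=3)
# gpt-5.4 is skipped: the single OpenAI slot is already taken by gpt-4o.
```

Note how the second OpenAI model is passed over even though it outranks the remaining candidates, which is exactly the all-GPT-panel scenario the constraint is meant to prevent.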

You can preview which models Auto mode would select for a given question before committing to a debate. Send a request to the model recommendation endpoint (POST /v1/models/recommend) with your question text and optional domain override. The response includes the recommended panel with per-model domain scores for the detected domain.

Auto mode will never select a model that is currently marked unhealthy. If the highest-scoring model for a domain is unavailable, Auto mode falls back to the next best available model and includes a model_substituted field in the response from the recommendation endpoint.
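A client consuming the recommendation endpoint might extract the panel and check for a substitution as sketched below. The response shape here is an assumption pieced together from the fields mentioned above (a panel with per-model scores and an optional model_substituted field); consult the API reference for the actual schema.

```python
def summarize_recommendation(resp):
    """Pull the recommended panel and any health-driven substitution
    out of a POST /v1/models/recommend response.

    The payload shape is an assumption based on the documented fields,
    not a confirmed schema.
    """
    panel = [entry["model"] for entry in resp["panel"]]
    substituted = resp.get("model_substituted")  # None when no fallback occurred
    return panel, substituted

sample = {
    "domain": "prediction",
    "panel": [{"model": "deepseek-r1", "score": 0.88},
              {"model": "gpt-4o", "score": 0.82}],
    "model_substituted": {"from": "gemini-2.5-pro", "to": "gpt-4o"},
}
panel, substituted = summarize_recommendation(sample)
```

Checking model_substituted lets your application surface a notice to users when the top-ranked model was unavailable and a fallback was used.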

Debater Role

Debaters are the primary reasoning models in a debate. They answer the question independently in the first (blind) round, revise their positions during deliberation rounds after seeing each other's arguments, and continue arguing until the adjudicator detects consensus or the maximum round limit is reached.

When selecting debaters manually, consider the question type and choose models with complementary strengths. For a normative question about AI regulation, pairing a model strong in Normative reasoning with one strong in Business gives the debate both ethical depth and practical grounding. For a prediction question about macroeconomic trends, pairing a Prediction-strong model with a Factual-strong one ensures the debate covers both probabilistic forecasting and empirical grounding.
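One way to capture this kind of complementary pairing in application code is a small lookup keyed by domain, as sketched below. The pairings and model IDs are illustrative choices following the examples above, not recommendations from DebateTalk.

```python
# Illustrative complementary debater pairings per domain.
# Model IDs and pairings are placeholders, not official recommendations.
DEBATER_PAIRINGS = {
    "normative":  ["claude-opus", "gpt-4o"],       # ethical depth + practical grounding
    "prediction": ["deepseek-r1", "claude-sonnet"],  # forecasting + empirical grounding
}

def debaters_for(domain):
    """Return a curated pair for the domain, or a general-purpose default."""
    return DEBATER_PAIRINGS.get(domain, ["claude-sonnet", "gpt-4o"])
```

A setup like this lets you keep hand-tuned panels for the domains you care about while falling back to a safe default (or to Auto mode) elsewhere.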

The number of debaters you can configure depends on your plan (see Plans and Limits). More debaters produce richer debates with more perspectives represented, but each additional debater increases both latency and cost proportionally.

Configuring two debaters from the same provider with similar domain profiles often produces low-quality debates. The models tend to converge quickly without surfacing the disagreements that make multi-model debate valuable. Auto mode is designed to avoid this pattern automatically.

Synthesizer Role

The synthesizer writes the final answer once the adjudicator determines that consensus has been reached. It receives the full debate transcript, including all blind-round answers, deliberation arguments, and the adjudicator's consensus assessment, then distills the debate into a single clear, well-structured response.

Because the synthesizer reads all positions before writing, it tends to produce more balanced and complete answers than any individual debater would. It does not simply average the debaters' outputs; it identifies where positions converged, where they remained in tension, and how to represent both honestly in the final answer.

A good synthesizer should be strong at following complex instructions, integrating information from long contexts, and writing clearly. Models like Claude Sonnet and GPT models in the analytical family are well suited to this role. The synthesizer is fully configurable and defaults to a strong general-purpose model when auto_mode is enabled.

Adjudicator Role

The adjudicator evaluates consensus after each deliberation round and scores debater accuracy at the end of the debate. It does not participate in the debate itself. Its two responsibilities are separate: during the debate it decides whether the debaters have reached sufficient agreement to proceed to synthesis, and after synthesis it assigns per-debater accuracy scores based on how well each debater's positions held up against the synthesized answer.

On Free accounts, consensus evaluation uses an algorithmic approach based on response similarity metrics rather than a model. Accuracy scores are not available on the Free plan. On Pro and Enterprise accounts, you can configure any supported model as the adjudicator. A model adjudicator produces richer consensus assessments with per-dimension scores and qualitative feedback explaining why it judged consensus as reached or not.
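The algorithmic check on Free accounts can be illustrated with a pairwise similarity test. The actual metric and threshold DebateTalk uses are internal; Jaccard similarity over word sets is just a stand-in here to show the shape of a model-free consensus check.

```python
def jaccard(a, b):
    """Jaccard similarity between two answers' word sets (stand-in metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def consensus_reached(answers, threshold=0.6):
    """Declare consensus when every pair of answers clears the threshold.

    Sketch only: the real Free-plan similarity metric and threshold
    are internal to DebateTalk.
    """
    for i in range(len(answers)):
        for j in range(i + 1, len(answers)):
            if jaccard(answers[i], answers[j]) < threshold:
                return False
    return True
```

A model adjudicator replaces this single scalar comparison with per-dimension scores and qualitative reasoning, which is why it produces richer consensus assessments.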

The adjudicator should be a model with strong analytical and evaluative capabilities, with particularly strong instruction-following and judgment. Gemini 2.5 Flash is the default adjudicator in Auto mode. Claude Opus and GPT-4o are also well suited to the adjudicator role.

The adjudicator reads full debate transcripts that can be long. Models with larger context windows are preferred for the adjudicator role, particularly when debates run for many rounds or involve multiple debaters with long responses.

Available Models

The table below lists the models currently available on DebateTalk. Context window sizes refer to the combined input context the model can process, which affects how many debate rounds the adjudicator and synthesizer can handle before hitting limits. The Min Plan column indicates the minimum subscription tier required to use the model.

Model              Provider    Context Window    Min Plan
Claude Opus        Anthropic   200k              Pro
Claude Sonnet      Anthropic   200k              Free
GPT-5.4            OpenAI      128k              Pro
GPT-4o             OpenAI      128k              Free
Gemini 2.5 Pro     Google      1M                Pro
Gemini 2.5 Flash   Google      1M                Free
Mistral Medium     Mistral     128k              Free
Mistral Small      Mistral     32k               Free
DeepSeek R1        DeepSeek    128k              Pro
DeepSeek V3        DeepSeek    128k              Free
Llama 3.3 70B      Groq        128k              Free

The table above covers the main models available at the time of writing. New models are added regularly. For the complete live list including real-time health, latency, and uptime for every active model, see the Model Status page. The full list including domain score profiles and per-model pricing is also available programmatically via GET /v1/user/model-config.

Model Health

All models are monitored continuously. The public model status endpoint (GET /v1/public/model-status) returns real-time health data for every model in the system. The response includes whether each model is currently healthy, its current response latency, its 24-hour average latency, and its 24-hour uptime percentage.
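A client might filter the status payload down to currently healthy models before assembling a request, as sketched below. The field names (models, healthy, latency_ms, avg_latency_24h_ms, uptime_24h) mirror the metrics described above but are assumptions about the exact schema.

```python
def healthy_models(status):
    """Return the names of healthy models from a GET /v1/public/model-status
    payload. Field names are assumed from the documented metrics."""
    return [m["model"] for m in status["models"] if m["healthy"]]

sample = {"models": [
    {"model": "claude-sonnet", "healthy": True,  "latency_ms": 820,
     "avg_latency_24h_ms": 790,  "uptime_24h": 99.9},
    {"model": "mistral-small", "healthy": False, "latency_ms": None,
     "avg_latency_24h_ms": 1450, "uptime_24h": 93.1},
]}
available = healthy_models(sample)
```

Since the endpoint is public and unauthenticated, a check like this can run before every debate without consuming API credentials.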

If a model becomes unhealthy during an active debate, it may be skipped for that round. When this happens, a model_skipped SSE event is emitted with the affected model name and the reason. The debate continues with the remaining debaters. A skipped model in the debater role reduces the panel size for that round but does not terminate the debate.

Auto mode checks model health before assembling a panel and will not select a model that is currently marked unhealthy. If health status changes between panel selection and the start of a debate round, the model_skipped event handles the fallback at runtime.
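A consumer of the debate event stream can handle the runtime fallback by dropping the skipped model from its local view of the panel. The model_skipped event name comes from this page; the payload fields (model, reason) are assumptions about its exact shape.

```python
import json

def handle_event(event_type, data_json, panel):
    """Remove a skipped model from the active panel for the current round.

    `model_skipped` is the documented event name; the payload fields
    (model, reason) are assumed here, not a confirmed schema.
    """
    if event_type == "model_skipped":
        info = json.loads(data_json)
        if info["model"] in panel:
            panel.remove(info["model"])
    return panel

panel = ["claude-sonnet", "gpt-4o", "gemini-2.5-flash"]
panel = handle_event("model_skipped",
                     '{"model": "gpt-4o", "reason": "unhealthy"}', panel)
```

Keeping the panel view in sync this way lets a UI show users which debaters are still active mid-debate.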

The /models page on the DebateTalk website shows a live model status dashboard updated every 30 seconds, with current health indicators, latency trends, and uptime history for each model.

Model health data from GET /v1/public/model-status does not require authentication. You can poll this endpoint freely to check model availability before initiating debates in your application.