PRD-013: Grok Provider Integration (xAI)¶
- Status: ✅ Implemented (v2.5.0)
- Revision: 3
- Date: 2026-02-05
- Implementation: Issue #1095
- Related RFCs: RFC-036 (Updated)
- Related PRDs: PRD-006 (OpenAI), PRD-011 (DeepSeek)
Summary¶
Add Grok (by xAI) as an optional provider for speaker detection and summarization capabilities. Grok is xAI's AI model, available through their API. Like OpenAI, Grok uses a unified provider pattern where a single GrokProvider class implements both capabilities. Like Anthropic and DeepSeek, Grok does NOT support audio transcription (xAI focuses on text-based LLMs).
Note: API details have been researched based on xAI's public API documentation and common OpenAI-compatible API patterns.
Background & Context¶
Grok (xAI) offers several advantages:
- xAI's AI Model: Grok is the proprietary model from xAI (Elon Musk's AI company)
- Real-time Information: Grok has access to real-time information via X/Twitter integration
- OpenAI-Compatible API: Uses an OpenAI-compatible API format (can reuse the OpenAI SDK with a custom base_url)
- Competitive Pricing: Priced competitively relative to other providers
- Public API: Available at https://api.x.ai/v1 (verify with your API key)
API Details (Verified/Assumed):
- Base URL: https://api.x.ai/v1 (OpenAI-compatible endpoint)
- SDK: Uses OpenAI SDK with custom base_url (no new dependency)
- Authentication: API key via GROK_API_KEY environment variable
- Model names: grok-beta (beta/development), grok-2 (production) - verify with your API access
- Context window: Likely 128k tokens (verify with API documentation)
This PRD addresses adding Grok as an xAI-based provider.
Goals¶
- Add Grok as provider option for speaker detection and summarization
- Maintain 100% backward compatibility
- Follow unified provider pattern (like OpenAI) - single class implementing both protocols
- Provide secure API key management via environment variables and .env files
- Support both Config-based and experiment-based factory modes from the start
- Handle capability gaps gracefully (no transcription)
- Use OpenAI SDK with custom base_url if API is OpenAI-compatible (needs verification)
- Use environment-based model defaults (test vs production)
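If the Grok API is OpenAI-compatible, the "no new SDK" goal reduces to pointing the existing OpenAI client at the xAI endpoint. The sketch below resolves the client keyword arguments from the environment; the endpoint and variable names follow this PRD, but verify them against xAI's documentation before relying on them.

```python
import os

# Assumed default endpoint; verify with your xAI API key.
DEFAULT_GROK_BASE_URL = "https://api.x.ai/v1"

def grok_client_kwargs(env=None):
    """Resolve keyword arguments for an OpenAI-compatible client.

    If the Grok API is OpenAI-compatible, the returned dict can be splatted
    into openai.OpenAI(**kwargs), so no new SDK dependency is needed.
    """
    env = os.environ if env is None else env
    api_key = env.get("GROK_API_KEY")
    if not api_key:
        raise ValueError(
            "GROK_API_KEY is not set; export it or add it to your .env file"
        )
    return {
        "api_key": api_key,
        # GROK_API_BASE overrides the default endpoint for E2E testing
        "base_url": env.get("GROK_API_BASE", DEFAULT_GROK_BASE_URL),
    }
```

The same pattern is already used for other OpenAI-compatible providers (e.g. DeepSeek), which is why no xAI-specific SDK is required.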
Grok Model Selection and Cost Analysis¶
⚠️ Note: Model names, pricing, and API details below are placeholders and need verification from xAI's official documentation.
Configuration Fields¶
Add to config.py when implementing Grok providers (following OpenAI pattern):
# Grok API Configuration
grok_api_key: Optional[str] = Field(
default=None,
alias="grok_api_key",
description="Grok API key (prefer GROK_API_KEY env var or .env file)"
)
grok_api_base: Optional[str] = Field(
default=None,
alias="grok_api_base",
description="Grok API base URL (default: https://api.x.ai/v1, for E2E testing)"
)
# Grok Model Selection (environment-based defaults, like OpenAI)
grok_speaker_model: str = Field(
default_factory=_get_default_grok_speaker_model,
alias="grok_speaker_model",
description="Grok model for speaker detection (default: environment-based)"
)
grok_summary_model: str = Field(
default_factory=_get_default_grok_summary_model,
alias="grok_summary_model",
description="Grok model for summarization (default: environment-based)"
)
# Shared settings (like OpenAI)
grok_temperature: float = Field(
default=0.3,
alias="grok_temperature",
description="Temperature for Grok generation (0.0-2.0, lower = more deterministic)"
)
grok_max_tokens: Optional[int] = Field(
default=None,
alias="grok_max_tokens",
description="Max tokens for Grok generation (None = model default)"
)
# Grok Prompt Configuration (following OpenAI pattern)
grok_speaker_system_prompt: Optional[str] = Field(
default=None,
alias="grok_speaker_system_prompt",
description="Grok system prompt for speaker detection (default: grok/ner/system_ner_v1)"
)
grok_speaker_user_prompt: str = Field(
default="grok/ner/guest_host_v1",
alias="grok_speaker_user_prompt",
description="Grok user prompt for speaker detection"
)
grok_summary_system_prompt: Optional[str] = Field(
default=None,
alias="grok_summary_system_prompt",
description="Grok system prompt for summarization (default: grok/summarization/system_v1)"
)
grok_summary_user_prompt: str = Field(
default="grok/summarization/long_v1",
alias="grok_summary_user_prompt",
description="Grok user prompt for summarization"
)
Environment-based defaults:
- Test environment: grok-beta (beta model, typically available for development)
- Production environment: grok-2 (production model, best quality)
Note: Verify actual model names with your xAI API access. Common patterns suggest grok-beta and grok-2, but model availability may vary.
Model Options and Pricing¶
⚠️ Note: Pricing information should be verified from xAI's official documentation at https://console.x.ai or https://docs.x.ai. The following are estimates based on common pricing patterns.
| Model | Input Cost | Output Cost | Context Window | Speed | Best For |
|---|---|---|---|---|---|
| grok-2 | Verify pricing | Verify pricing | 128k (verify) | Medium | Production |
| grok-beta | Verify pricing | Verify pricing | 128k (verify) | Medium | Dev/Test |
Source: Verify current pricing at https://console.x.ai or https://docs.x.ai. Pricing may vary based on your account tier.
Free Tier Limits¶
⚠️ Needs Verification: Free tier availability and limits should be verified from xAI documentation at https://console.x.ai.
| Model | Requests/Min | Tokens/Min | Requests/Day |
|---|---|---|---|
| Verify with API | Verify with API | Verify with API | Verify with API |
Note: Check your xAI account dashboard for current rate limits and free tier availability.
Dev/Test vs Production Model Selection¶
Note: Verify model names with your xAI API access. Common patterns suggest these names, but availability may vary.
| Environment | Speaker Model | Summary Model | Rationale |
|---|---|---|---|
| Dev/Test | grok-beta | grok-beta | Beta model for development/testing |
| Production | grok-2 | grok-2 | Production model, best quality |
Cost Comparison: All Providers (Per 100 Episodes)¶
⚠️ Note: Grok pricing should be verified from xAI documentation. Estimates based on common pricing patterns.
| Component | OpenAI (gpt-4o-mini) | DeepSeek (chat) | Grok (verify) |
|---|---|---|---|
| Transcription | $0.60 | ❌ N/A | ❌ N/A |
| Speaker Detection | $0.14 | $0.004 | Verify pricing |
| Summarization | $0.41 | $0.012 | Verify pricing |
| Total Text | $0.55 | $0.016 | Verify pricing |
Processing Time Comparison (Single Episode Summary)¶
⚠️ Note: Grok performance metrics should be verified through actual API testing.
| Provider | Time | Tokens/Second |
|---|---|---|
| OpenAI GPT-4o-mini | ~5 seconds | 100 |
| Anthropic Claude | ~5 seconds | 100 |
| DeepSeek | ~3 seconds | 150 |
| Grok | Verify | Verify |
Non-Goals¶
- Transcription support (Grok/xAI focuses on text-based LLMs, no audio models)
- Changing default behavior
- Self-hosted Grok (not available)
Personas¶
- Real-Time Rita: Needs access to real-time information via Grok's X/Twitter integration
- xAI Enthusiast: Prefers xAI's Grok model over other providers
- Batch Processing Brenda: Needs to process many episodes efficiently
User Stories¶
- As Real-Time Rita, I can use Grok to leverage real-time information in summaries.
- As xAI Enthusiast, I can use Grok as my preferred AI provider.
- As Batch Processing Brenda, I can process episodes using Grok's API.
Functional Requirements¶
FR1: Provider Selection¶
- FR1.1: Add "grok" as valid value for speaker_detector_provider
- FR1.2: Add "grok" as valid value for summary_provider
- FR1.3: Attempting transcription with Grok results in a clear error
- FR1.4: Default values maintain current behavior
- FR1.5: Support both Config-based and experiment-based factory modes from the start
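FR1.1–FR1.3 amount to a capability check in the factories before any provider is constructed. A minimal sketch, with an illustrative capability table mirroring this PRD's provider matrix (the names `CapabilityError` and `validate_provider_choice` are hypothetical, not the real factory API):

```python
class CapabilityError(ValueError):
    """Raised when a provider is selected for a capability it lacks."""

# Illustrative capability table; mirrors the PRD's provider matrix.
PROVIDER_CAPABILITIES = {
    "openai": {"transcription", "speaker_detection", "summarization"},
    "grok": {"speaker_detection", "summarization"},  # no audio models at xAI
}

def validate_provider_choice(provider: str, capability: str) -> str:
    """Validate a provider/capability pair before constructing the provider."""
    caps = PROVIDER_CAPABILITIES.get(provider)
    if caps is None:
        raise ValueError(f"Unknown provider: {provider!r}")
    if capability not in caps:
        raise CapabilityError(
            f"Provider {provider!r} does not support {capability!r}"
        )
    return provider
```

Failing fast here, rather than at API-call time, keeps the FR1.3 error message actionable.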
FR2: API Key Management¶
- FR2.1: Support GROK_API_KEY environment variable (like OPENAI_API_KEY)
- FR2.2: Support .env file via python-dotenv for convenient configuration
- FR2.3: Clear error on missing API key
- FR2.4: Support GROK_API_BASE environment variable for E2E testing (like OPENAI_API_BASE)
FR3: Speaker Detection with Grok¶
- FR3.1: Grok provider uses xAI's API for entity extraction
- FR3.2: Maintains same interface as other providers
- FR3.3: Uses Grok-specific prompt templates
FR4: Summarization with Grok¶
- FR4.1: Grok provider uses xAI's API for summarization
- FR4.2: Leverages Grok's context window (size TBD)
- FR4.3: Maintains same interface
FR5: API Compatibility¶
- FR5.1: Use OpenAI SDK with custom base_url if Grok API is OpenAI-compatible (needs verification)
- FR5.2: Alternative: Use xAI SDK if available (needs research)
- FR5.3: Minimize new dependencies
Technical Requirements¶
TR1: Architecture¶
- TR1.1: Follow unified provider pattern (like OpenAI) - single class implementing both protocols
- TR1.2: Create providers/grok/grok_provider.py with unified GrokProvider class
- TR1.3: GrokProvider implements SpeakerDetector and SummarizationProvider protocols
- TR1.4: Update factories to include Grok option with support for both Config-based and experiment-based modes
- TR1.5: Create prompts/grok/ directory with provider-specific prompt templates
- TR1.6: Use OpenAI SDK with custom base_url if API is OpenAI-compatible, or xAI SDK if available
- TR1.7: Follow OpenAI provider architecture exactly for consistency
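TR1.1–TR1.3 can be sketched as below. The protocol method signatures are assumptions (the real SpeakerDetector and SummarizationProvider interfaces are defined elsewhere in the codebase), and the method bodies are placeholders rather than the real prompt plumbing.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class SpeakerDetector(Protocol):
    def detect_speakers(self, transcript: str) -> list: ...

@runtime_checkable
class SummarizationProvider(Protocol):
    def summarize(self, transcript: str) -> str: ...

class GrokProvider:
    """Unified provider implementing both protocols, mirroring OpenAIProvider.

    `client` is any OpenAI-compatible client pointed at the xAI base URL.
    """

    def __init__(self, client, speaker_model="grok-beta", summary_model="grok-beta"):
        self._client = client
        self._speaker_model = speaker_model
        self._summary_model = summary_model

    def detect_speakers(self, transcript: str) -> list:
        # Would render prompts/grok/ner/* templates and call the chat API.
        raise NotImplementedError

    def summarize(self, transcript: str) -> str:
        # Would render prompts/grok/summarization/* templates and call the chat API.
        raise NotImplementedError
```

A single class satisfying both protocols is what makes the "unified provider pattern" work: the speaker-detection and summarization factories can both return the same GrokProvider instance.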
TR2: Dependencies¶
- TR2.1: Prefer reusing the existing openai package if Grok API is OpenAI-compatible
- TR2.2: Alternative: Use xAI SDK if available (needs research)
- TR2.3: Minimize new dependencies
Success Criteria¶
- ✅ Users can select Grok provider for speaker detection and summarization via unified provider
- ✅ Clear error when attempting transcription with Grok
- ✅ API integration works (OpenAI-compatible API at https://api.x.ai/v1)
- ✅ Real-time information access via X/Twitter integration
- ✅ Environment-based model defaults (test vs production)
- ✅ Both Config-based and experiment-based factory modes supported
- ✅ No new SDK dependency (uses OpenAI SDK)
- ✅ E2E tests pass
- ✅ Follows OpenAI provider pattern exactly for consistency
Provider Capability Matrix (Updated)¶
| Capability | Local | OpenAI | Anthropic | Mistral | DeepSeek | Gemini | Grok |
|---|---|---|---|---|---|---|---|
| Transcription | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ |
| Speaker Detection | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Summarization | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Real-time Info | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ (via X/Twitter) |
Future Considerations¶
- Audio transcription support (if xAI adds audio models)
- Tool use / function calling
- Streaming responses for real-time display
- Enhanced real-time information integration