Skip to content

PRD-013: Grok Provider Integration (xAI)

  • Status: ✅ Implemented (v2.5.0)
  • Revision: 3
  • Date: 2026-02-05
  • Implementation: Issue #1095
  • Related RFCs: RFC-036 (Updated)
  • Related PRDs: PRD-006 (OpenAI), PRD-011 (DeepSeek)

Summary

Add Grok (by xAI) as an optional provider for speaker detection and summarization capabilities. Grok is xAI's AI model, available through their API. Like OpenAI, Grok uses a unified provider pattern where a single GrokProvider class implements both capabilities. Like Anthropic and DeepSeek, Grok does NOT support audio transcription (xAI focuses on text-based LLMs).

Note: API details have been researched based on xAI's public API documentation and common OpenAI-compatible API patterns.

Background & Context

Grok (xAI) offers several advantages:

  • xAI's AI Model: Grok is xAI's proprietary AI model (Elon Musk's AI company)
  • Real-time Information: Grok has access to real-time information via X/Twitter integration
  • OpenAI-Compatible API: Uses OpenAI-compatible API format (can reuse OpenAI SDK with custom base_url)
  • Competitive Pricing: Competitive pricing compared to other providers
  • Public API: Public API available at https://api.x.ai/v1 (verify with your API key)

API Details (Verified/Assumed):

  • Base URL: https://api.x.ai/v1 (OpenAI-compatible endpoint)
  • SDK: Uses OpenAI SDK with custom base_url (no new dependency)
  • Authentication: API key via GROK_API_KEY environment variable
  • Model names: grok-beta (beta/development), grok-2 (production) - verify with your API access
  • Context window: Likely 128k tokens (verify with API documentation)

This PRD addresses adding Grok as an xAI-based provider.

Goals

  • Add Grok as provider option for speaker detection and summarization
  • Maintain 100% backward compatibility
  • Follow unified provider pattern (like OpenAI) - single class implementing both protocols
  • Provide secure API key management via environment variables and .env files
  • Support both Config-based and experiment-based factory modes from the start
  • Handle capability gaps gracefully (no transcription)
  • Use OpenAI SDK with custom base_url if API is OpenAI-compatible (needs verification)
  • Use environment-based model defaults (test vs production)

Grok Model Selection and Cost Analysis

⚠️ Note: Model names, pricing, and API details below are placeholders and need verification from xAI's official documentation.

Configuration Fields

Add to config.py when implementing Grok providers (following OpenAI pattern):

# Grok API Configuration
grok_api_key: Optional[str] = Field(
    default=None,
    alias="grok_api_key",
    description="Grok API key (prefer GROK_API_KEY env var or .env file)"
)

grok_api_base: Optional[str] = Field(
    default=None,
    alias="grok_api_base",
    description="Grok API base URL (default: https://api.x.ai/v1, for E2E testing)"
)

# Grok Model Selection (environment-based defaults, like OpenAI)
grok_speaker_model: str = Field(
    default_factory=_get_default_grok_speaker_model,
    alias="grok_speaker_model",
    description="Grok model for speaker detection (default: environment-based)"
)

grok_summary_model: str = Field(
    default_factory=_get_default_grok_summary_model,
    alias="grok_summary_model",
    description="Grok model for summarization (default: environment-based)"
)

# Shared settings (like OpenAI)
grok_temperature: float = Field(
    default=0.3,
    alias="grok_temperature",
    description="Temperature for Grok generation (0.0-2.0, lower = more deterministic)"
)

grok_max_tokens: Optional[int] = Field(
    default=None,
    alias="grok_max_tokens",
    description="Max tokens for Grok generation (None = model default)"
)

# Grok Prompt Configuration (following OpenAI pattern)
grok_speaker_system_prompt: Optional[str] = Field(
    default=None,
    alias="grok_speaker_system_prompt",
    description="Grok system prompt for speaker detection (default: grok/ner/system_ner_v1)"
)

grok_speaker_user_prompt: str = Field(
    default="grok/ner/guest_host_v1",
    alias="grok_speaker_user_prompt",
    description="Grok user prompt for speaker detection"
)

grok_summary_system_prompt: Optional[str] = Field(
    default=None,
    alias="grok_summary_system_prompt",
    description="Grok system prompt for summarization (default: grok/summarization/system_v1)"
)

grok_summary_user_prompt: str = Field(
    default="grok/summarization/long_v1",
    alias="grok_summary_user_prompt",
    description="Grok user prompt for summarization"
)

Environment-based defaults:

  • Test environment: grok-beta (beta model, typically available for development)
  • Production environment: grok-2 (production model, best quality)

Note: Verify actual model names with your xAI API access. Common patterns suggest grok-beta and grok-2, but model availability may vary.

Model Options and Pricing

⚠️ Note: Pricing information should be verified from xAI's official documentation at https://console.x.ai or https://docs.x.ai. The following are estimates based on common pricing patterns.

Model Input Cost Output Cost Context Window Speed Best For
grok-2 Verify pricing Verify pricing 128k (verify) Medium Production
grok-beta Verify pricing Verify pricing 128k (verify) Medium Dev/Test

Source: Verify current pricing at https://console.x.ai or https://docs.x.ai. Pricing may vary based on your account tier.

Free Tier Limits

⚠️ Needs Verification: Free tier availability and limits should be verified from xAI documentation at https://console.x.ai.

Model Requests/Min Tokens/Min Requests/Day
Verify with API Verify with API Verify with API Verify with API

Note: Check your xAI account dashboard for current rate limits and free tier availability.

Dev/Test vs Production Model Selection

Note: Verify model names with your xAI API access. Common patterns suggest these names, but availability may vary.

Environment Speaker Model Summary Model Rationale
Dev/Test grok-beta grok-beta Beta model for development/testing
Production grok-2 grok-2 Production model, best quality

Cost Comparison: All Providers (Per 100 Episodes)

⚠️ Note: Grok pricing should be verified from xAI documentation. Estimates based on common pricing patterns.

Component OpenAI (gpt-4o-mini) DeepSeek (chat) Grok (verify)
Transcription $0.60 ❌ N/A ❌ N/A
Speaker Detection $0.14 $0.004 Verify pricing
Summarization $0.41 $0.012 Verify pricing
Total Text $0.55 $0.016 Verify pricing

Processing Time Comparison (Single Episode Summary)

⚠️ Note: Grok performance metrics should be verified through actual API testing.

Provider Time Tokens/Second
OpenAI GPT-4o-mini ~5 seconds 100
Anthropic Claude ~5 seconds 100
DeepSeek ~3 seconds 150
Grok Verify Verify

Non-Goals

  • Transcription support (Grok/xAI focuses on text-based LLMs, no audio models)
  • Changing default behavior
  • Self-hosted Grok (not available)

Personas

  • Real-Time Rita: Needs access to real-time information via Grok's X/Twitter integration
  • xAI Enthusiast: Prefers xAI's Grok model over other providers
  • Batch Processing Brenda: Needs to process many episodes efficiently

User Stories

  • As Real-Time Rita, I can use Grok to leverage real-time information in summaries.
  • As xAI Enthusiast, I can use Grok as my preferred AI provider.
  • As Batch Processing Brenda, I can process episodes using Grok's API.

Functional Requirements

FR1: Provider Selection

  • FR1.1: Add "grok" as valid value for speaker_detector_provider
  • FR1.2: Add "grok" as valid value for summary_provider
  • FR1.3: Attempting transcription with Grok results in clear error
  • FR1.4: Default values maintain current behavior
  • FR1.5: Support both Config-based and experiment-based factory modes from the start

FR2: API Key Management

  • FR2.1: Support GROK_API_KEY environment variable (like OPENAI_API_KEY)
  • FR2.2: Support .env file via python-dotenv for convenient configuration
  • FR2.3: Clear error on missing API key
  • FR2.4: Support GROK_API_BASE environment variable for E2E testing (like OPENAI_API_BASE)

FR3: Speaker Detection with Grok

  • FR3.1: Grok provider uses xAI's API for entity extraction
  • FR3.2: Maintains same interface as other providers
  • FR3.3: Uses Grok-specific prompt templates

FR4: Summarization with Grok

  • FR4.1: Grok provider uses xAI's API for summarization
  • FR4.2: Leverages Grok's context window (size TBD)
  • FR4.3: Maintains same interface

FR5: API Compatibility

  • FR5.1: Use OpenAI SDK with custom base_url if Grok API is OpenAI-compatible (needs verification)
  • FR5.2: Alternative: Use xAI SDK if available (needs research)
  • FR5.3: Minimize new dependencies

Technical Requirements

TR1: Architecture

  • TR1.1: Follow unified provider pattern (like OpenAI) - single class implementing both protocols
  • TR1.2: Create providers/grok/grok_provider.py with unified GrokProvider class
  • TR1.3: GrokProvider implements SpeakerDetector and SummarizationProvider protocols
  • TR1.4: Update factories to include Grok option with support for both Config-based and experiment-based modes
  • TR1.5: Create prompts/grok/ directory with provider-specific prompt templates
  • TR1.6: Use OpenAI SDK with custom base_url if API is OpenAI-compatible, or xAI SDK if available
  • TR1.7: Follow OpenAI provider architecture exactly for consistency

TR2: Dependencies

  • TR2.1: Prefer reusing existing openai package if Grok API is OpenAI-compatible
  • TR2.2: Alternative: Use xAI SDK if available (needs research)
  • TR2.3: Minimize new dependencies

Success Criteria

  • ✅ Users can select Grok provider for speaker detection and summarization via unified provider
  • ✅ Clear error when attempting transcription with Grok
  • ✅ API integration works (OpenAI-compatible API at https://api.x.ai/v1)
  • ✅ Real-time information access via X/Twitter integration
  • ✅ Environment-based model defaults (test vs production)
  • ✅ Both Config-based and experiment-based factory modes supported
  • ✅ No new SDK dependency (uses OpenAI SDK)
  • ✅ E2E tests pass
  • ✅ Follows OpenAI provider pattern exactly for consistency

Provider Capability Matrix (Updated)

Capability Local OpenAI Anthropic Mistral DeepSeek Gemini Grok
Transcription
Speaker Detection
Summarization
Real-time Info ✅ (via X/Twitter)

Future Considerations

  • Audio transcription support (if xAI adds audio models)
  • Tool use / function calling
  • Streaming responses for real-time display
  • Enhanced real-time information integration