Skip to content

PRD-013: Grok Provider Integration (xAI)

  • Status: Implemented (v2.5.0)
  • Revision: 3
  • Date: 2026-02-05
  • Implementation: Issue #1095
  • Related RFCs:
  • RFC-036 — Grok provider (complete) (Updated)
  • Related PRDs: PRD-006 (OpenAI), PRD-011 (DeepSeek)

Summary

Add Grok (by xAI) as an optional provider for speaker detection and summarization capabilities. Grok is xAI's AI model, available through their API. Like OpenAI, Grok uses a unified provider pattern where a single GrokProvider class implements both capabilities. Like Anthropic and DeepSeek, Grok does NOT support audio transcription (xAI focuses on text-based LLMs).

Note: API details have been researched based on xAI's public API documentation and common OpenAI-compatible API patterns.

Background & Context

Grok (xAI) offers several advantages:

  • xAI's AI Model: Grok is xAI's proprietary AI model (Elon Musk's AI company)
  • Real-time Information: Grok has access to real-time information via X/Twitter integration
  • OpenAI-Compatible API: Uses OpenAI-compatible API format (can reuse OpenAI SDK with custom base_url)
  • Competitive Pricing: Competitive pricing compared to other providers
  • Public API: Public API available at https://api.x.ai/v1 (verify with your API key)

API Details (Verified/Assumed):

  • Base URL: https://api.x.ai/v1 (OpenAI-compatible endpoint)
  • SDK: Uses OpenAI SDK with custom base_url (no new dependency)
  • Authentication: API key via GROK_API_KEY environment variable
  • Model names: grok-beta (beta/development), grok-2 (production) - verify with your API access
  • Context window: Likely 128k tokens (verify with API documentation)

This PRD addresses adding Grok as an xAI-based provider.

Goals

  • Add Grok as provider option for speaker detection and summarization
  • Maintain 100% backward compatibility
  • Follow unified provider pattern (like OpenAI) - single class implementing both protocols
  • Provide secure API key management via environment variables and .env files
  • Support both Config-based and experiment-based factory modes from the start
  • Handle capability gaps gracefully (no transcription)
  • Use OpenAI SDK with custom base_url if API is OpenAI-compatible (needs verification)
  • Use environment-based model defaults (test vs production)

Grok Model Selection and Cost Analysis

** Note:** Model names, pricing, and API details below are placeholders and need verification from xAI's official documentation.

Configuration Fields

Add to config.py when implementing Grok providers (following OpenAI pattern):

# Grok API Configuration
grok_api_key: Optional[str] = Field(
    default=None,
    alias="grok_api_key",
    description="Grok API key (prefer GROK_API_KEY env var or .env file)"
)

grok_api_base: Optional[str] = Field(
    default=None,
    alias="grok_api_base",
    description="Grok API base URL (default: https://api.x.ai/v1, for E2E testing)"
)

# Grok Model Selection (environment-based defaults, like OpenAI)
grok_speaker_model: str = Field(
    default_factory=_get_default_grok_speaker_model,
    alias="grok_speaker_model",
    description="Grok model for speaker detection (default: environment-based)"
)

grok_summary_model: str = Field(
    default_factory=_get_default_grok_summary_model,
    alias="grok_summary_model",
    description="Grok model for summarization (default: environment-based)"
)

# Shared settings (like OpenAI)
grok_temperature: float = Field(
    default=0.3,
    alias="grok_temperature",
    description="Temperature for Grok generation (0.0-2.0, lower = more deterministic)"
)

grok_max_tokens: Optional[int] = Field(
    default=None,
    alias="grok_max_tokens",
    description="Max tokens for Grok generation (None = model default)"
)

# Grok Prompt Configuration (following OpenAI pattern)
grok_speaker_system_prompt: Optional[str] = Field(
    default=None,
    alias="grok_speaker_system_prompt",
    description="Grok system prompt for speaker detection (default: grok/ner/system_ner_v1)"
)

grok_speaker_user_prompt: str = Field(
    default="grok/ner/guest_host_v1",
    alias="grok_speaker_user_prompt",
    description="Grok user prompt for speaker detection"
)

grok_summary_system_prompt: Optional[str] = Field(
    default=None,
    alias="grok_summary_system_prompt",
    description="Grok system prompt for summarization (default: grok/summarization/system_v1)"
)

grok_summary_user_prompt: str = Field(
    default="grok/summarization/long_v1",
    alias="grok_summary_user_prompt",
    description="Grok user prompt for summarization"
)

Environment-based defaults:

  • Test environment: grok-beta (beta model, typically available for development)
  • Production environment: grok-2 (production model, best quality)

Note: Verify actual model names with your xAI API access. Common patterns suggest grok-beta and grok-2, but model availability may vary.

Model Options and Pricing

** Note:** Pricing information should be verified from xAI's official documentation at https://console.x.ai or https://docs.x.ai. The following are estimates based on common pricing patterns.

Model Input Cost Output Cost Context Window Speed Best For
grok-2 Verify pricing Verify pricing 128k (verify) Medium Production
grok-beta Verify pricing Verify pricing 128k (verify) Medium Dev/Test

Source: Verify current pricing at https://console.x.ai or https://docs.x.ai. Pricing may vary based on your account tier.

Free Tier Limits

** Needs Verification:** Free tier availability and limits should be verified from xAI documentation at https://console.x.ai.

Model Requests/Min Tokens/Min Requests/Day
Verify with API Verify with API Verify with API Verify with API

Note: Check your xAI account dashboard for current rate limits and free tier availability.

Dev/Test vs Production Model Selection

Note: Verify model names with your xAI API access. Common patterns suggest these names, but availability may vary.

Environment Speaker Model Summary Model Rationale
Dev/Test grok-beta grok-beta Beta model for development/testing
Production grok-2 grok-2 Production model, best quality

Cost Comparison: All Providers (Per 100 Episodes)

** Note:** Grok pricing should be verified from xAI documentation. Estimates based on common pricing patterns.

Component OpenAI (gpt-4o-mini) DeepSeek (chat) Grok (verify)
Transcription $0.60 No — N/A No — N/A
Speaker Detection $0.14 $0.004 Verify pricing
Summarization $0.41 $0.012 Verify pricing
Total Text $0.55 $0.016 Verify pricing

Processing Time Comparison (Single Episode Summary)

** Note:** Grok performance metrics should be verified through actual API testing.

Provider Time Tokens/Second
OpenAI GPT-4o-mini ~5 seconds 100
Anthropic Claude ~5 seconds 100
DeepSeek ~3 seconds 150
Grok Verify Verify

Non-Goals

  • Transcription support (Grok/xAI focuses on text-based LLMs, no audio models)
  • Changing default behavior
  • Self-hosted Grok (not available)

Personas

  • Real-Time Rita: Needs access to real-time information via Grok's X/Twitter integration
  • xAI Enthusiast: Prefers xAI's Grok model over other providers
  • Batch Processing Brenda: Needs to process many episodes efficiently

User Stories

  • As Real-Time Rita, I can use Grok to leverage real-time information in summaries.
  • As xAI Enthusiast, I can use Grok as my preferred AI provider.
  • As Batch Processing Brenda, I can process episodes using Grok's API.

Functional Requirements

FR1: Provider Selection

  • FR1.1: Add "grok" as valid value for speaker_detector_provider
  • FR1.2: Add "grok" as valid value for summary_provider
  • FR1.3: Attempting transcription with Grok results in clear error
  • FR1.4: Default values maintain current behavior
  • FR1.5: Support both Config-based and experiment-based factory modes from the start

FR2: API Key Management

  • FR2.1: Support GROK_API_KEY environment variable (like OPENAI_API_KEY)
  • FR2.2: Support .env file via python-dotenv for convenient configuration
  • FR2.3: Clear error on missing API key
  • FR2.4: Support GROK_API_BASE environment variable for E2E testing (like OPENAI_API_BASE)

FR3: Speaker Detection with Grok

  • FR3.1: Grok provider uses xAI's API for entity extraction
  • FR3.2: Maintains same interface as other providers
  • FR3.3: Uses Grok-specific prompt templates

FR4: Summarization with Grok

  • FR4.1: Grok provider uses xAI's API for summarization
  • FR4.2: Leverages Grok's context window (size TBD)
  • FR4.3: Maintains same interface

FR5: API Compatibility

  • FR5.1: Use OpenAI SDK with custom base_url if Grok API is OpenAI-compatible (needs verification)
  • FR5.2: Alternative: Use xAI SDK if available (needs research)
  • FR5.3: Minimize new dependencies

Technical Requirements

TR1: Architecture

  • TR1.1: Follow unified provider pattern (like OpenAI) - single class implementing both protocols
  • TR1.2: Create providers/grok/grok_provider.py with unified GrokProvider class
  • TR1.3: GrokProvider implements SpeakerDetector and SummarizationProvider protocols
  • TR1.4: Update factories to include Grok option with support for both Config-based and experiment-based modes
  • TR1.5: Create prompts/grok/ directory with provider-specific prompt templates
  • TR1.6: Use OpenAI SDK with custom base_url if API is OpenAI-compatible, or xAI SDK if available
  • TR1.7: Follow OpenAI provider architecture exactly for consistency

TR2: Dependencies

  • TR2.1: Prefer reusing existing openai package if Grok API is OpenAI-compatible
  • TR2.2: Alternative: Use xAI SDK if available (needs research)
  • TR2.3: Minimize new dependencies

Success Criteria

  • Users can select Grok provider for speaker detection and summarization via unified provider
  • Clear error when attempting transcription with Grok
  • API integration works (OpenAI-compatible API at https://api.x.ai/v1)
  • Real-time information access via X/Twitter integration
  • Environment-based model defaults (test vs production)
  • Both Config-based and experiment-based factory modes supported
  • No new SDK dependency (uses OpenAI SDK)
  • E2E tests pass
  • Follows OpenAI provider pattern exactly for consistency

Provider Capability Matrix (Updated)

Capability Local OpenAI Anthropic Mistral DeepSeek Gemini Grok
Transcription Yes Yes No Yes No Yes No
Speaker Detection Yes Yes Yes Yes Yes Yes Yes
Summarization Yes Yes Yes Yes Yes Yes Yes
Real-time Info No No No No No No Yes — (via X/Twitter)

Future Considerations

  • Audio transcription support (if xAI adds audio models)
  • Tool use / function calling
  • Streaming responses for real-time display
  • Enhanced real-time information integration