Skip to content

feat: add LiteLLM as AI gateway provider#181

Open
RheagalFire wants to merge 1 commit into
szczyglis-dev:masterfrom
RheagalFire:feat/add-litellm-provider
Open

feat: add LiteLLM as AI gateway provider#181
RheagalFire wants to merge 1 commit into
szczyglis-dev:masterfrom
RheagalFire:feat/add-litellm-provider

Conversation

@RheagalFire
Copy link
Copy Markdown

@RheagalFire RheagalFire commented Apr 24, 2026

Summary

  • Adds LiteLLM as a new LLM provider, giving users access to 100+ LLM providers through litellm.completion()
  • Follows the same BaseLLM + LlamaIndex CustomLLM pattern as existing providers (OpenRouter, DeepSeek, etc.)

Prior art

No prior LiteLLM PRs, issues, or discussions in this repo.

Changes

  • src/pygpt_net/provider/llms/litellm.py - new LiteLLMProvider(BaseLLM) + LiteLLMIndex(CustomLLM) that calls litellm.completion() directly with drop_params=True for cross-provider compatibility
  • src/pygpt_net/app.py - registered LiteLLMProvider via launcher.add_llm()
  • pyproject.toml - added litellm>=1.55,<2.0

Usage

  from pygpt_net.provider.llms.litellm import LiteLLMIndex

  # Set provider API key via env var before running
  # export ANTHROPIC_API_KEY=sk-ant-...
                                                                                                                                                                                                                     
  # Create the LLM instance
  llm = LiteLLMIndex(                                                                                                                                                                                                
      model_name="anthropic/claude-sonnet-4-20250514",
      temperature=0.7,
      max_tokens=1024,
  )                                                                                                                                                                                                                  
   
  # Non-streaming completion                                                                                                                                                                                         
  response = llm.complete("What is the capital of France?")
  print(response.text)
  # Output: The capital of France is Paris.

  # Streaming completion
  for chunk in llm.stream_complete("Tell me 3 facts about Paris."):
      print(chunk.delta, end="", flush=True)                                                                                                                                                                         
   
  # Chat with message history                                                                                                                                                                                        
  from llama_index.core.llms import ChatMessage, MessageRole

  messages = [
      ChatMessage(role=MessageRole.SYSTEM, content="You are a helpful travel guide."),
      ChatMessage(role=MessageRole.USER, content="What should I visit in Paris?"),                                                                                                                                   
  ]
  chat_response = llm.chat(messages)                                                                                                                                                                                 
  print(chat_response.message.content)
                                                                                                                                                                                                                     
  # Streaming chat
  for chunk in llm.stream_chat(messages):                                                                                                                                                                            
      if chunk.delta:
          print(chunk.delta, end="", flush=True)

Testing

14 live E2E tests against Azure AI Foundry via litellm.completion(), covering every code path:

 PASS: complete basic              - non-streaming completion returns correct answer + raw response
 PASS: stream_complete             - streaming chunks arrive with delta content
 PASS: chat basic                  - chat with single user message                                                                                                                                                  
 PASS: chat with system            - system role message forwarded correctly
 PASS: multi-turn chat             - conversation context retained across turns                                                                                                                                     
 PASS: stream_chat                 - streaming chat produces output
 PASS: metadata                    - model name and max output reported correctly
 PASS: max_tokens respected        - short max_tokens produces short response                                                                                                                                       
 PASS: temperature override        - temperature=0 works without error
 PASS: long prompt                 - large input handled without truncation                                                                                                                                         
 PASS: nonexistent model raises    - invalid model raises error
 PASS: empty prompt raises         - empty input raises error                                                                                                                                                       
 PASS: raw response keys           - response contains choices + model keys
 PASS: stream accumulates          - text accumulates correctly across chunks                                                                                                                                       

Results: 14 passed, 0 failed

Risk / Compatibility

  • Additive only. Existing providers untouched.
  • litellm is a new dependency. LiteLLMIndex extends CustomLLM from llama-index-core (already a project dependency).
  • drop_params=True silently drops provider-unsupported kwargs, preventing cross-provider errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant