The FLEX Stack is the foundational architecture that powers TextLayer Core, providing a robust framework for building AI-powered applications. This integrated stack combines proven technologies to deliver a complete solution for developing, deploying, and monitoring LLM-powered services.

What is the FLEX Stack?

The FLEX Stack is an acronym that represents the core components of the TextLayer Core architecture:
  • F: Flask - A lightweight Python web framework used for building the API endpoints
  • L: LiteLLM/Langfuse - Tools for LLM integration and observability
  • E: Elasticsearch - Powerful search engine for storing conversation history
  • X: eXternal services - Integration with various external tools and services
Together, these components create a cohesive architecture that simplifies the development of AI-powered applications while providing the observability and flexibility needed for production deployments.

Components

Flask: Web Framework for API Endpoints

Flask serves as the foundation of the TextLayer Core architecture, providing a lightweight yet powerful web framework for building the API endpoints. Key aspects include:
  • Modular Structure: Well-organized application structure following Python best practices
  • RESTful API Design: Clean API design patterns for consistent interfaces
  • Command Pattern: Separation of business logic from request handling for cleaner code
  • Middleware Support: Extensible middleware for authentication, logging, and more
  • Scalability: Designed to scale from simple prototypes to production applications
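
A minimal sketch of how an endpoint can be wired up in this style (the route, command class, and module layout here are illustrative, not the actual TextLayer Core code):

    # Illustrative only -- shows the command pattern, not TextLayer Core's real modules.
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    class GenerateReplyCommand:
        """Business logic lives in the command object, not in the route handler."""
        def __init__(self, prompt: str):
            self.prompt = prompt

        def execute(self) -> dict:
            # A real command would call the LLM layer, search, etc.; stubbed here.
            return {"reply": f"echo: {self.prompt}"}

    @app.route("/v1/messages", methods=["POST"])
    def create_message():
        payload = request.get_json(force=True)
        result = GenerateReplyCommand(prompt=payload["prompt"]).execute()
        return jsonify(result), 201

Keeping the handler thin this way makes the business logic testable without spinning up the Flask app.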

LiteLLM & Langfuse: LLM Integration and Observability

LiteLLM

LiteLLM provides a unified interface for calling 100+ LLM APIs using the OpenAI format, offering:
  • Provider Agnostic: Access to OpenAI, Anthropic, VertexAI, Cohere, and many more providers through a single API
  • Consistent Format: Standardized input/output format across all LLM providers
  • Fallback Logic: Built-in retry and fallback mechanisms across multiple deployments
  • Cost Management: Tools for tracking spend and setting budgets per project
  • Proxy Server: Optional proxy server for centralized access and management
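
To make the provider-agnostic call pattern concrete, here is a minimal LiteLLM sketch (the model names are examples; use whichever providers you have API keys configured for):

    from litellm import completion

    messages = [{"role": "user", "content": "Summarize the FLEX Stack in one sentence."}]

    # The call shape stays the same across providers; only the model string changes.
    openai_response = completion(model="gpt-4o-mini", messages=messages)
    claude_response = completion(model="anthropic/claude-3-5-sonnet-20240620", messages=messages)

    # Responses come back in the OpenAI format regardless of provider.
    print(openai_response.choices[0].message.content)
    print(claude_response.choices[0].message.content)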

Langfuse

Langfuse is an open-source LLM engineering platform for monitoring and improving LLM applications with features like:
  • Tracing: Detailed production traces to debug LLM applications faster
  • Evaluation: Tools for collecting user feedback and running evaluation functions
  • Prompt Management: Version and deploy prompts collaboratively
  • Metrics Tracking: Monitor cost, latency, and quality of LLM interactions
  • Dataset Creation: Derive datasets from production data for testing and fine-tuning
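
As a small illustration, the Langfuse Python SDK (v2 decorator interface) can trace a function with a single decorator; the function below is a stand-in, and the usual LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST environment variables are assumed to be set:

    from langfuse.decorators import observe

    @observe()  # records this call as a trace in Langfuse
    def answer_question(question: str) -> str:
        # An LLM call would normally go here; stubbed out for the sketch.
        return f"Answer to: {question}"

    answer_question("What does the E in FLEX stand for?")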

Elasticsearch: Vector Search and Conversation History

Elasticsearch powers the vector search capabilities in TextLayer Core, enabling:
  • Conversation History: Efficient storage and retrieval of conversation history
  • Vector Search: Fast similarity searches for semantic matching
  • Scalability: Ability to handle large volumes of data with horizontal scaling
  • Real-time Analysis: Immediate indexing and search capabilities
  • Robust Query Language: Powerful query capabilities for complex data retrieval
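
A rough sketch of what storing and searching conversation embeddings can look like with the Elasticsearch 8.x Python client (the index name, field names, and 384-dimension embeddings are illustrative):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # k-NN search requires the embedding field to be mapped as an indexed dense_vector.
    es.indices.create(index="conversations", mappings={
        "properties": {
            "message": {"type": "text"},
            "embedding": {"type": "dense_vector", "dims": 384, "index": True, "similarity": "cosine"},
        },
    })

    # Store a conversation turn with its embedding (a real one would come from an embedding model).
    es.index(index="conversations", document={
        "message": "How do I reset my password?",
        "embedding": [0.01] * 384,
    })
    es.indices.refresh(index="conversations")

    # Retrieve the most similar past messages via approximate k-NN search.
    hits = es.search(
        index="conversations",
        knn={"field": "embedding", "query_vector": [0.01] * 384, "k": 5, "num_candidates": 50},
    )
    for hit in hits["hits"]["hits"]:
        print(hit["_source"]["message"], hit["_score"])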

For local development, see the guide on setting up OpenSearch locally.

X (External Services): Extensible Tooling

The “X” in FLEX represents the extensible nature of TextLayer Core, allowing integration with various external services:
  • Custom Tool Creation: Framework for building tools that leverage external services
  • API Integrations: Pre-built connectors for common third-party services
  • Plugin Architecture: Extensible design for adding new capabilities
  • Tool Registry: Central management of available tools and their configurations
  • Authentication Handling: Secure management of service credentials
For more information on building custom tools, see the How to Build a Tool guide.
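
The guide above covers the actual tool interface; as a purely hypothetical sketch, a tool that wraps an external service typically reduces to a JSON-schema description plus a handler function, along these lines:

    import requests

    # Hypothetical weather-lookup tool; the endpoint and schema are made up for illustration.
    weather_tool = {
        "name": "get_weather",
        "description": "Fetch the current weather for a city from an external API.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }

    def get_weather(city: str) -> dict:
        # Credentials would normally be injected by the stack's authentication handling.
        response = requests.get("https://api.example.com/weather", params={"q": city}, timeout=10)
        response.raise_for_status()
        return response.json()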

Benefits

The FLEX Stack provides several key advantages for AI application development:

Accelerated Development

  • Standardized Architecture: Consistent patterns across all components
  • Reduced Boilerplate: Focus on business logic rather than infrastructure
  • Best Practices: Built on proven architectural patterns

Production Readiness

  • Scalable Design: Handles growing user bases and data volumes
  • Observability: Comprehensive monitoring and debugging tools
  • Reliability: Fallback mechanisms and error handling

Flexibility and Extensibility

  • Provider Agnostic: Not locked into specific LLM providers
  • Customizable: Extend with additional tools and services
  • Framework Integration: Works with existing development frameworks

Cost Optimization

  • Usage Tracking: Monitor LLM costs across projects
  • Budget Controls: Set spending limits to prevent surprises
  • Efficiency Tools: Optimize prompts and model selection for cost-effectiveness
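
As one concrete example of usage tracking, LiteLLM can compute the cost of an individual call from its token usage (the model name is an example):

    from litellm import completion, completion_cost

    response = completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "One-line summary of the FLEX Stack, please."}],
    )

    # completion_cost maps the response's token usage onto the provider's pricing table.
    print(f"Call cost: ${completion_cost(completion_response=response):.6f}")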