The FLEX Stack is the foundational architecture that powers TextLayer Core, providing a robust framework for building AI-powered applications. This integrated stack combines proven technologies to deliver a complete solution for developing, deploying, and monitoring LLM-powered services.

What is the FLEX Stack?

The FLEX Stack is an acronym that represents the core components of the TextLayer Core architecture:
  • F: Flask - A lightweight Python web framework used for building the API endpoints
  • L: LiteLLM/Langfuse - Tools for LLM integration and observability
  • E: Elasticsearch - Powerful search engine for storing conversation history
  • X: eXternal services - Integration with various external tools and services
Together, these components create a cohesive architecture that simplifies the development of AI-powered applications while providing the observability and flexibility needed for production deployments.

Components

Flask: Web Framework for API Endpoints

Flask serves as the foundation of the TextLayer Core architecture, providing a lightweight yet powerful web framework for building the API endpoints. Key aspects include:
  • Modular Structure: Well-organized application structure following Python best practices
  • RESTful API Design: Clean API design patterns for consistent interfaces
  • Command Pattern: Separation of business logic from request handling for cleaner code
  • Middleware Support: Extensible middleware for authentication, logging, and more
  • Scalability: Designed to scale from simple prototypes to production applications
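
A minimal sketch of how an endpoint can be wired up in this style (the route, command class, and module layout here are illustrative, not the actual TextLayer Core code):

    # Illustrative only -- shows the command pattern, not TextLayer Core's real modules.
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    class GenerateReplyCommand:
        """Business logic lives in the command object, not in the route handler."""
        def __init__(self, prompt: str):
            self.prompt = prompt

        def execute(self) -> dict:
            # A real command would call the LLM layer, search, etc.; stubbed here.
            return {"reply": f"echo: {self.prompt}"}

    @app.route("/v1/messages", methods=["POST"])
    def create_message():
        payload = request.get_json(force=True)
        result = GenerateReplyCommand(prompt=payload["prompt"]).execute()
        return jsonify(result), 201

Keeping the handler thin this way makes the business logic testable without spinning up the Flask app.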

LiteLLM & Langfuse: LLM Integration and Observability

LiteLLM

LiteLLM provides a unified interface for calling 100+ LLM APIs using the OpenAI format, offering:
  • Provider Agnostic: Access to OpenAI, Anthropic, VertexAI, Cohere, and many more providers through a single API
  • Consistent Format: Standardized input/output format across all LLM providers
  • Fallback Logic: Built-in retry and fallback mechanisms across multiple deployments
  • Cost Management: Tools for tracking spend and setting budgets per project
  • Proxy Server: Optional proxy server for centralized access and management
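
To make the provider-agnostic call pattern concrete, here is a minimal LiteLLM sketch (the model names are examples; use whichever providers you have API keys configured for):

    from litellm import completion

    messages = [{"role": "user", "content": "Summarize the FLEX Stack in one sentence."}]

    # The call shape stays the same across providers; only the model string changes.
    openai_response = completion(model="gpt-4o-mini", messages=messages)
    claude_response = completion(model="anthropic/claude-3-5-sonnet-20240620", messages=messages)

    # Responses come back in the OpenAI format regardless of provider.
    print(openai_response.choices[0].message.content)
    print(claude_response.choices[0].message.content)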

Langfuse

Langfuse is an open-source LLM engineering platform for monitoring and improving LLM applications with features like:
  • Tracing: Detailed production traces to debug LLM applications faster
  • Evaluation: Tools for collecting user feedback and running evaluation functions
  • Prompt Management: Version and deploy prompts collaboratively
  • Metrics Tracking: Monitor cost, latency, and quality of LLM interactions
  • Dataset Creation: Derive datasets from production data for testing and fine-tuning
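
As a small illustration, the Langfuse Python SDK (v2 decorator interface) can trace a function with a single decorator; the function below is a stand-in, and the usual LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST environment variables are assumed to be set:

    from langfuse.decorators import observe

    @observe()  # records this call as a trace in Langfuse
    def answer_question(question: str) -> str:
        # An LLM call would normally go here; stubbed out for the sketch.
        return f"Answer to: {question}"

    answer_question("What does the E in FLEX stand for?")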

Elasticsearch: Vector Search and Conversation History

Elasticsearch powers the vector search capabilities in TextLayer Core, enabling:
  • Conversation History: Efficient storage and retrieval of conversation history
  • Vector Search: Fast similarity searches for semantic matching
  • Scalability: Ability to handle large volumes of data with horizontal scaling
  • Real-time Analysis: Immediate indexing and search capabilities
  • Robust Query Language: Powerful query capabilities for complex data retrieval
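
A rough sketch of what storing and searching conversation embeddings can look like with the Elasticsearch 8.x Python client (the index name, field names, and 384-dimension embeddings are illustrative):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # k-NN search requires the embedding field to be mapped as an indexed dense_vector.
    es.indices.create(index="conversations", mappings={
        "properties": {
            "message": {"type": "text"},
            "embedding": {"type": "dense_vector", "dims": 384, "index": True, "similarity": "cosine"},
        },
    })

    # Store a conversation turn with its embedding (a real one would come from an embedding model).
    es.index(index="conversations", document={
        "message": "How do I reset my password?",
        "embedding": [0.01] * 384,
    })
    es.indices.refresh(index="conversations")

    # Retrieve the most similar past messages via approximate k-NN search.
    hits = es.search(
        index="conversations",
        knn={"field": "embedding", "query_vector": [0.01] * 384, "k": 5, "num_candidates": 50},
    )
    for hit in hits["hits"]["hits"]:
        print(hit["_source"]["message"], hit["_score"])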

For local development, see the guide on setting up OpenSearch locally.

X (External Services): Extensible Tooling

The “X” in FLEX represents the extensible nature of TextLayer Core, allowing integration with various external services:
  • Custom Tool Creation: Framework for building tools that leverage external services
  • API Integrations: Pre-built connectors for common third-party services
  • Plugin Architecture: Extensible design for adding new capabilities
  • Tool Registry: Central management of available tools and their configurations
  • Authentication Handling: Secure management of service credentials
For more information on building custom tools, see the How to Build a Tool guide.
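
The guide above covers the actual tool interface; as a purely hypothetical sketch, a tool that wraps an external service typically reduces to a JSON-schema description plus a handler function, along these lines:

    import requests

    # Hypothetical weather-lookup tool; the endpoint and schema are made up for illustration.
    weather_tool = {
        "name": "get_weather",
        "description": "Fetch the current weather for a city from an external API.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }

    def get_weather(city: str) -> dict:
        # Credentials would normally be injected by the stack's authentication handling.
        response = requests.get("https://api.example.com/weather", params={"q": city}, timeout=10)
        response.raise_for_status()
        return response.json()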

Benefits

The FLEX Stack provides several key advantages for AI application development:

Accelerated Development

  • Standardized Architecture: Consistent patterns across all components
  • Reduced Boilerplate: Focus on business logic rather than infrastructure
  • Best Practices: Built on proven architectural patterns

Production Readiness

  • Scalable Design: Handles growing user bases and data volumes
  • Observability: Comprehensive monitoring and debugging tools
  • Reliability: Fallback mechanisms and error handling

Flexibility and Extensibility

  • Provider Agnostic: Not locked into specific LLM providers
  • Customizable: Extend with additional tools and services
  • Framework Integration: Works with existing development frameworks

Cost Optimization

  • Usage Tracking: Monitor LLM costs across projects
  • Budget Controls: Set spending limits to prevent surprises
  • Efficiency Tools: Optimize prompts and model selection for cost-effectiveness
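
As one concrete example of usage tracking, LiteLLM can compute the cost of an individual call from its token usage (the model name is an example):

    from litellm import completion, completion_cost

    response = completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "One-line summary of the FLEX Stack, please."}],
    )

    # completion_cost maps the response's token usage onto the provider's pricing table.
    print(f"Call cost: ${completion_cost(completion_response=response):.6f}")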