The FLEX Stack is the foundational architecture that powers TextLayer Core, providing a robust framework for building AI-powered applications. This integrated stack combines proven technologies to deliver a complete solution for developing, deploying, and monitoring LLM-powered services.
What is the FLEX Stack?
The FLEX Stack is an acronym that represents the core components of the TextLayer Core architecture:
- F: Flask - A lightweight Python web framework used for building the API endpoints
- L: LiteLLM/Langfuse - Tools for LLM integration and observability
- E: Elasticsearch - Powerful search engine for conversation storage and vector search
- X: eXternal services - Integration with various external tools and services
Together, these components create a cohesive architecture that simplifies the development of AI-powered applications while providing the observability and flexibility needed for production deployments.
Components
Flask: Web Framework for API Endpoints
Flask serves as the foundation of the TextLayer Core architecture, providing a lightweight yet powerful web framework for building the API endpoints. Key aspects include:
- Modular Structure: Well-organized application structure following Python best practices
- RESTful API Design: Clean API design patterns for consistent interfaces
- Command Pattern: Separation of business logic from request handling for cleaner code
- Middleware Support: Extensible middleware for authentication, logging, and more
- Scalability: Designed to scale from simple prototypes to production applications
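As a rough illustration of the command pattern in a Flask route, consider the sketch below; the endpoint, payload shape, and command class are hypothetical and not taken from the TextLayer Core codebase:

```python
# Minimal sketch: business logic lives in a command object, not the route.
# Route path and command name are illustrative assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)

class AnswerQuestionCommand:
    """Encapsulates business logic so the route handler stays thin."""
    def __init__(self, question: str):
        self.question = question

    def execute(self) -> dict:
        # In a real service this would call the LLM integration layer.
        return {"answer": f"You asked: {self.question}"}

@app.route("/v1/questions", methods=["POST"])
def create_question():
    payload = request.get_json(force=True)
    command = AnswerQuestionCommand(question=payload["question"])
    return jsonify(command.execute())

if __name__ == "__main__":
    app.run(debug=True)
```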
LiteLLM & Langfuse: LLM Integration and Observability
LiteLLM
LiteLLM provides a unified interface for calling 100+ LLM APIs using the OpenAI format, offering:
- Provider Agnostic: Access to OpenAI, Anthropic, VertexAI, Cohere, and many more providers through a single API
- Consistent Format: Standardized input/output format across all LLM providers
- Fallback Logic: Built-in retry and fallback mechanisms across multiple deployments
- Cost Management: Tools for tracking spend and setting budgets per project
- Proxy Server: Optional proxy server for centralized access and management
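A minimal sketch of the provider-agnostic interface might look like the following; the model identifiers are examples, and API keys are assumed to be set in environment variables such as OPENAI_API_KEY and ANTHROPIC_API_KEY:

```python
# Sketch of LiteLLM's unified, OpenAI-format interface across providers.
from litellm import completion

messages = [{"role": "user", "content": "Summarize the FLEX Stack in one sentence."}]

# The same call shape works for different providers:
openai_response = completion(model="gpt-4o-mini", messages=messages)
anthropic_response = completion(
    model="anthropic/claude-3-5-sonnet-20240620", messages=messages
)

# Responses follow the OpenAI format regardless of provider.
print(openai_response.choices[0].message.content)
print(anthropic_response.choices[0].message.content)
```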
Langfuse
Langfuse is an open-source LLM engineering platform for monitoring and improving LLM applications. Its features include:
- Tracing: Detailed production traces to debug LLM applications faster
- Evaluation: Tools for collecting user feedback and running evaluation functions
- Prompt Management: Version and deploy prompts collaboratively
- Metrics Tracking: Monitor cost, latency, and quality of LLM interactions
- Dataset Creation: Derive datasets from production data for testing and fine-tuning
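One common way to combine the two components is LiteLLM's Langfuse callback, sketched below; the metadata key shown is illustrative, and Langfuse credentials (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST for self-hosted deployments) are assumed to be set in the environment:

```python
# Sketch: route LiteLLM call logs to Langfuse via the success callback hook.
import litellm
from litellm import completion

# Every successful completion is logged to Langfuse as a trace.
litellm.success_callback = ["langfuse"]

response = completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
    metadata={"trace_name": "flex-stack-demo"},  # illustrative trace label
)
print(response.choices[0].message.content)
```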
Elasticsearch: Conversation Storage and Search
Elasticsearch powers conversation storage and vector search in TextLayer Core, enabling:
- Conversation History: Efficient storage and retrieval of conversation history
- Vector Search: Fast similarity searches for semantic matching
- Scalability: Ability to handle large volumes of data with horizontal scaling
- Real-time Analysis: Immediate indexing and search capabilities
- Robust Query Language: Powerful query capabilities for complex data retrieval
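As a sketch of what a similarity search might look like with the Elasticsearch 8.x Python client, assuming a hypothetical conversations index with an embedding field:

```python
# Sketch of a kNN vector search. Index name, field names, and vector
# dimensionality are assumptions for illustration.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

query_vector = [0.12, -0.03, 0.88]  # would come from an embedding model

results = es.search(
    index="conversations",
    knn={
        "field": "embedding",
        "query_vector": query_vector,
        "k": 5,
        "num_candidates": 50,
    },
    source=["message", "timestamp"],
)

for hit in results["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["message"])
```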
For local development, see the guide on setting up OpenSearch locally.
eXternal Services: Extensibility and Integration
The “X” in FLEX represents the extensible nature of TextLayer Core, allowing integration with various external services:
- Custom Tool Creation: Framework for building tools that leverage external services
- API Integrations: Pre-built connectors for common third-party services
- Plugin Architecture: Extensible design for adding new capabilities
- Tool Registry: Central management of available tools and their configurations
- Authentication Handling: Secure management of service credentials
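The sketch below shows one plausible shape for a tool registry; it is purely illustrative, and TextLayer Core's actual tool API may differ (see the guide linked below):

```python
# Hypothetical sketch of tool definition and central registration.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    handler: Callable[..., str]

TOOL_REGISTRY: dict[str, Tool] = {}

def register_tool(tool: Tool) -> None:
    """Central registry so the LLM layer can discover available tools."""
    TOOL_REGISTRY[tool.name] = tool

def get_weather(city: str) -> str:
    # A real tool would call an external weather API with stored credentials.
    return f"Weather for {city}: sunny (stubbed)"

register_tool(Tool(
    name="get_weather",
    description="Look up current weather for a city",
    handler=get_weather,
))
```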
For more information on building custom tools, see the How to Build a Tool guide.
Benefits
The FLEX Stack provides several key advantages for AI application development:
Accelerated Development
- Standardized Architecture: Consistent patterns across all components
- Reduced Boilerplate: Focus on business logic rather than infrastructure
- Best Practices: Built on proven architectural patterns
Production Readiness
- Scalable Design: Handles growing user bases and data volumes
- Observability: Comprehensive monitoring and debugging tools
- Reliability: Fallback mechanisms and error handling
Flexibility and Extensibility
- Provider Agnostic: Not locked into specific LLM providers
- Customizable: Extend with additional tools and services
- Framework Integration: Works with existing development frameworks
Cost Optimization
- Usage Tracking: Monitor LLM costs across projects
- Budget Controls: Set spending limits to prevent surprises
- Efficiency Tools: Optimize prompts and model selection for cost-effectiveness
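For example, LiteLLM's completion_cost helper can estimate the spend of an individual call, as in this brief sketch:

```python
# Sketch of per-call cost tracking with LiteLLM.
from litellm import completion, completion_cost

response = completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Ping"}],
)

# completion_cost estimates USD spend from the model's published pricing.
cost = completion_cost(completion_response=response)
print(f"This call cost approximately ${cost:.6f}")
```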