AI Developer Frameworks and the evolving AI infrastructure ecosystem
When I said “don’t buy the AI library hype”, one of the more common responses was “Did you try $NEW_AI_FRAMEWORK instead?”
After thinking about these comments, I realized that my framing of the developer AI framework space as a maturity issue is not quite correct. I see a structural deficiency in this market, and I believe that many (most?) developer AI frameworks suffer from this gap.
Let me explain.
An AI Application (should be) an event-driven asynchronous application
As context, I believe most AI applications should adopt an event-driven, asynchronous architecture. This is because most AI operations (e.g., calculating an embedding, or calling an LLM) have high latency. In a synchronous architecture, a call to the LLM can occupy a thread for tens of seconds (or longer!) while it waits for a response.
In an asynchronous architecture, the application sends a remote procedure call (RPC) to the LLM and moves on to other tasks while waiting for the response. This non-blocking approach ensures that threads are not tied up, allowing the application to handle other requests or workflows simultaneously. Once the LLM responds, an event listener or callback mechanism processes the response, reintegrating it into the workflow.
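To make that concrete, here is a minimal sketch of the non-blocking pattern, assuming Python's asyncio and the OpenAI SDK's async client; the model name and the "other work" task are placeholders I've added for illustration, not something from any particular application.

```python
import asyncio

from openai import AsyncOpenAI  # assumes the openai>=1.x SDK and an API key in the environment

client = AsyncOpenAI()

async def ask_llm(question: str) -> str:
    # The await yields control back to the event loop instead of blocking
    # a thread for the (potentially tens of seconds) round trip.
    response = await client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

async def do_other_work() -> str:
    # Stand-in for any other request or workflow the app handles meanwhile.
    await asyncio.sleep(1)
    return "other work done"

async def main():
    # Both tasks run concurrently; neither ties up a thread while waiting.
    answer, other = await asyncio.gather(
        ask_llm("Summarize event-driven architecture in one sentence."),
        do_other_work(),
    )
    print(other)
    print(answer)

asyncio.run(main())
```

The same idea applies whether the consumer of the response is a web handler, a queue worker, or a batch job.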
The core value of AI developer frameworks: syntactic sugar and standardization
AI developer frameworks such as LangChain, LlamaIndex, and Mirascope generally make it easier to create these RPCs, which can get complicated. For example, a typical RPC could include all of the following:
- A “system” prompt, which sets the overall context for the AI interaction
- A “user” prompt, which contains the actual question at hand
- Historical context, which includes any relevant conversational history
- Question context, which includes any data that might be relevant to the prompt itself (e.g., a PDF document)
But while assembling all of these details can be complicated and time-consuming for a developer, the workflow is fairly straightforward (in most cases, you’re populating a JSON object to send with your RPC). The reality is that these developer frameworks codify common design patterns by providing thin wrappers around a bunch of other lower-level libraries.
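As a rough illustration of that assembly, here is what the payload often looks like without a framework, using the common chat-completions message convention; the system prompt, conversation history, retrieved document excerpt, and model name are all made-up placeholders.

```python
import json

# Placeholder inputs: in a real app these come from your UI, database, and retrieval step.
conversation_history = [
    {"role": "user", "content": "What does our refund policy cover?"},
    {"role": "assistant", "content": "It covers unused items returned within 30 days."},
]
retrieved_context = "Refund policy (PDF excerpt): items must be unused and in original packaging..."
user_question = "Does that include opened software?"

# Assemble the RPC payload: system prompt, history, question context, user prompt.
payload = {
    "model": "gpt-4o-mini",  # placeholder model name
    "messages": [
        {"role": "system", "content": "You are a helpful support assistant."},
        *conversation_history,
        {"role": "user", "content": f"Context:\n{retrieved_context}\n\nQuestion: {user_question}"},
    ],
}

print(json.dumps(payload, indent=2))
```

A framework wraps this assembly behind a tidier interface, but the underlying work is the same.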
Thus, the core value of these libraries is not “developer productivity”. The core value of these libraries is standardization (H/T Erick Friis) through syntactic sugar. For organizations with lots of developers, consistency is key to maintainability, and these libraries provide consistency. Outside of consistency, I don’t believe that these frameworks offer much value.
So what AI frameworks and tools do add value?
While AI developer libraries are not valuable for most developers (standardization is an organizational benefit), I believe there are three categories of libraries & infrastructure that AI developers do use which will have enduring value: event-driven infrastructure, model training, and model inference. The software in these three categories is non-trivial to implement and comprises important parts of an AI application.
Event-driven infrastructure
Virtually all AI applications adopt an event-driven architecture, given the high latency of LLM responses. Asynchronous I/O libraries, message queues, and other infrastructure software created for event-driven architectures are even more relevant today in the age of AI.
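As a small-scale illustration, here is a sketch of the producer/worker pattern using asyncio.Queue; in production this role is typically played by a message broker (Kafka, RabbitMQ, SQS, and the like), and handle_llm_job is a hypothetical stand-in for an LLM call plus whatever the event triggers afterward.

```python
import asyncio

async def handle_llm_job(job: dict) -> None:
    # Stand-in for an LLM call plus any post-processing the event triggers.
    await asyncio.sleep(2)  # simulate a slow model round trip
    print(f"finished job {job['id']}")

async def worker(queue: asyncio.Queue) -> None:
    # Workers pull events off the queue as they become available,
    # so slow LLM calls never block the code that produced the event.
    while True:
        job = await queue.get()
        await handle_llm_job(job)
        queue.task_done()

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    workers = [asyncio.create_task(worker(queue)) for _ in range(3)]

    # The producer just enqueues events and moves on.
    for i in range(5):
        await queue.put({"id": i, "prompt": f"request {i}"})

    await queue.join()  # wait for all jobs to be processed
    for w in workers:
        w.cancel()

asyncio.run(main())
```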
As an entrepreneur, I believe premature scaling is a common mistake that startups make. Prior to building my own small-scale AI application, I would have put much of event-driven infrastructure into the “premature scaling” category. Then, I ran into some of these challenges at very small scale, and that has forced me to reconsider.
I now believe that AI applications are forcing a reordering of technology priorities, and event-driven infrastructure will see a renaissance. The winners of this space will be the organizations that do the best job of creating a great developer experience for AI applications.
Model training
Training pipelines are the backbone of any real-world AI application. Building these pipelines requires significant engineering effort to create high-quality, reproducible models. Training pipelines typically include steps for data processing, fine-tuning, and evaluation. Fortunately, a robust ecosystem of libraries and tools is available to make it easier to build these pipelines. Some examples (a simplified pipeline sketch follows the list):
- Data preprocessing and manipulation: Libraries like Pandas solve for the messy, real-world challenge of efficiently wrangling and cleaning large datasets. Without it, you’d be reinventing functionality for basic tasks like merging, filtering, or aggregating data.
- Fine-tuning models: Libraries like HuggingFace Transformers offer pre-built functionality for adapting large language models to your domain. Implementing this on your own would require deep familiarity with tokenization, optimizer setups, and memory management.
- Experiment tracking and reproducibility: Tools like Weights & Biases solve the hard problem of managing hundreds of experiments with varying hyperparameters, dataset splits, and evaluation results. This is critical for teams working collaboratively on model improvements.
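To ground those pieces, here is a heavily simplified pipeline skeleton that uses Pandas for preprocessing, the Hugging Face Trainer for fine-tuning, and Weights & Biases for tracking. The CSV file, column names, project name, and base model are placeholder assumptions, and a real pipeline would add proper evaluation and far more rigor.

```python
import pandas as pd
import wandb
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# 1. Data preprocessing with Pandas: clean and filter a (hypothetical) labeled dataset.
df = pd.read_csv("support_tickets.csv")  # placeholder file with "text" and "label" columns
df = df.dropna(subset=["text", "label"])
df = df[df["text"].str.len() > 20]  # drop near-empty examples
df["label"] = df["label"].astype("category").cat.codes  # map string labels to integer ids
df = df.reset_index(drop=True)

# 2. Experiment tracking: one Weights & Biases run per training job.
base_model = "distilbert-base-uncased"  # placeholder base model
wandb.init(project="ticket-classifier", config={"base_model": base_model})

# 3. Fine-tuning with Hugging Face Transformers.
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(
    base_model, num_labels=df["label"].nunique()
)

dataset = Dataset.from_pandas(df[["text", "label"]].rename(columns={"label": "labels"}))
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
)
splits = dataset.train_test_split(test_size=0.1)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1, report_to="wandb"),
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
)
trainer.train()
```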
Model inference
Serving a model in production is deceptively complex. While running a single inference locally might seem trivial, scaling that process to handle latency requirements, cost constraints, and diverse use cases exposes deep infrastructure needs. Some example tools for model inference include:
- Inference engines: Tools like vLLM address the core problem of high-latency inference by optimizing batching and GPU utilization (see the sketch after this list). These aren’t just “nice-to-haves”: without them, response times can make your application unusable.
- KV caching: Specialized caches for LLMs such as LMCache can dramatically reduce LLM latency and improve scalability.
- Structured output generation: Generating reliable structured data isn’t trivial. Libraries like Outlines address this by constraining generation to a schema or grammar, saving significant development time and reducing downstream errors.
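For the inference-engine piece, a minimal sketch of offline batch inference with vLLM looks roughly like this; the model name is a placeholder and the exact API surface may differ across vLLM versions.

```python
from vllm import LLM, SamplingParams

# Load the model once; vLLM manages GPU memory and batching internally.
llm = LLM(model="facebook/opt-125m")  # placeholder model, small enough to try locally

params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Summarize the benefits of event-driven architectures.",
    "List three uses of a KV cache in LLM serving.",
]

# generate() batches the prompts together and runs them efficiently on the GPU.
outputs = llm.generate(prompts, params)
for output in outputs:
    print(output.outputs[0].text)
```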
(There’s also a shift taking place where allocating 💰 to inference rather than training can provide a better bang for the buck on accuracy; that deserves its own exploration, but it’s yet another post.)
The evolving AI ecosystem
While I’m optimistic about the AI ecosystem as a whole, I’m pessimistic about the future of AI developer libraries within it. Even the LangChain team is moving beyond their core library, with the introduction of LangSmith (AI observability) and LangGraph (durable workflows). The tools that will thrive are the ones that solve hard problems and make them accessible through a great developer experience. We’re still in the early innings, with lots more to come!
(Thanks to William Bakst and Erick Friis for feedback.)