How to run enterprise GenAI like a production service

Enterprise GenAI (generative AI) deployments succeed when teams run them with the same discipline they apply to other user-facing services. The model sits in the middle of a pipeline that handles identity, policy, retrieval, inference, and logging. Each stage affects quality, latency, cost, and risk. A pilot can hide these dependencies. Production traffic exposes them…

Why most AI agents disappoint in production (and what to fix first)

AI agents look brilliant in a demo because demos are friendly worlds. The data is curated, the tools behave, and nothing important changes while the agent is in mid-thought. Production is the opposite: data arrives late, facts conflict, permissions bite, APIs time out, and the underlying state changes constantly. That gap is why early “agents…

How to add AI to an existing product (without annoying users)

While generative AI has shown promising results in advancing software engineering, its inclusion within end-user applications is a different story. Features labeled as AI continue to pop up across every UI, but they’re not always helpful or useful. Often driven by hype, they can become a distraction, or worse, a productivity killer. “Many fall into…

Small language models: Rethinking enterprise AI architecture

Three key advantages of SLMs:
Division of labor: Modern AI architecture uses routers to send routine tasks to 7B-parameter SLMs, reserving trillion-parameter LLMs only for complex reasoning.
Economic efficiency: For high-volume, repetitive tasks, SLMs can reduce cloud inference costs by up to 90% while providing near-instant latency.
Privacy at the edge: Because SLMs can run…

Google’s Gemma 4 shines on local systems – both big and small

Google’s Gemma 4 comes touted as the latest evolution of Google’s multi-modal model offerings. Gemma 4 offers not only reasoning and tool use but also vision and audio functionality, and it’s available in a range of model sizes that target servers and local devices. What’s striking about Gemma 4 is that even at the higher end…

27 questions to ask when choosing an LLM

Car buyers kick tires. Horse traders inspect the teeth. What should shoppers for large language models (LLMs) do? Here are 27 prescient questions that developers are asking before they adopt a particular model. Model capabilities are diverse, and not every application requires the same support. These questions will help you identify the best models for…
