Today's computing systems will undergo a massive transformation to serve AI workloads efficiently and at scale. Gimlet is an applied research lab dedicated to envisioning the next generation of these systems.
Kernel efficiency drives inference and training performance. Techniques such as kernel fusion can dramatically speed up models, yet writing optimized kernels remains complex and time-consuming (especially for non-CUDA devices). At Gimlet, we're exploring AI agent architectures that automatically generate tuned kernels for diverse hardware. This enables rapid autoporting of AI workloads to new devices and boosts performance in current systems, without code changes.
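As an illustration, here is a minimal sketch of the kind of fused kernel such an agent might emit, written in Triton (an assumed target; the agent architecture itself is not shown). Fusing an elementwise add with a ReLU avoids materializing the intermediate tensor between the two ops:

```python
# Minimal sketch: a fused add + ReLU kernel in Triton (illustrative, not our generator's output).
import torch
import triton
import triton.language as tl

@triton.jit
def fused_add_relu(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    # One pass over memory: add and ReLU happen before the single store.
    tl.store(out_ptr + offs, tl.maximum(x + y, 0.0), mask=mask)

x = torch.randn(1 << 20, device="cuda")
y = torch.randn(1 << 20, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
fused_add_relu[grid](x, y, out, x.numel(), BLOCK=1024)
```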
AI datacenters must meet tight performance and cost targets while handling multi-stage agents whose bottlenecks vary by stage (compute, memory, network, and so on). We are investigating how to partition and schedule these agents across distributed hardware so that end-to-end SLAs are consistently met.
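A minimal sketch of the idea, under an assumed data model (the stage names, pools, and greedy policy below are illustrative, not our scheduler): each stage is tagged with its dominant bottleneck, and placement picks the pool with the most headroom on that resource.

```python
# Illustrative greedy placement of a multi-stage agent pipeline (assumed data model).
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    bottleneck: str      # "compute", "memory", or "network"
    demand: float        # normalized demand on its bottleneck resource

@dataclass
class Pool:
    name: str
    capacity: dict       # remaining headroom per resource, e.g. {"compute": 1.0, ...}

def place(stages, pools):
    """Assign each stage to the pool with the most headroom on its bottleneck."""
    plan = {}
    for stage in stages:
        best = max(pools, key=lambda p: p.capacity.get(stage.bottleneck, 0.0))
        best.capacity[stage.bottleneck] -= stage.demand
        plan[stage.name] = best.name
    return plan

pipeline = [Stage("retrieve", "network", 0.3),
            Stage("prefill", "compute", 0.6),
            Stage("decode", "memory", 0.5)]
pools = [Pool("gpu-pool", {"compute": 1.0, "memory": 0.4, "network": 0.2}),
         Pool("cpu-pool", {"compute": 0.3, "memory": 1.0, "network": 0.8})]
print(place(pipeline, pools))
```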
AI applications should be both cost-efficient and performant for end users. Moving selected workload slices onto a user's device can provide privacy, responsiveness, and TCO benefits. Our research investigates the most effective ways to partition workloads across hybrid edge/cloud systems to improve both user experience and provider costs.
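A toy sketch of the partitioning decision, with hypothetical latency and cost numbers: each workload slice runs on the device when it meets the latency budget, which costs the provider nothing, and otherwise falls back to the cloud.

```python
# Illustrative edge/cloud placement of workload slices (all numbers are hypothetical).
def place_slice(edge_latency_ms, cloud_latency_ms, cloud_cost, latency_budget_ms):
    """Prefer the user's device when it meets the latency budget; otherwise pay for cloud."""
    if edge_latency_ms <= latency_budget_ms:
        return "edge", 0.0
    return "cloud", cloud_cost

slices = {
    "tokenize":    dict(edge_latency_ms=2,   cloud_latency_ms=40,  cloud_cost=0.0001),
    "draft_model": dict(edge_latency_ms=80,  cloud_latency_ms=35,  cloud_cost=0.002),
    "full_model":  dict(edge_latency_ms=900, cloud_latency_ms=120, cloud_cost=0.02),
}
for name, s in slices.items():
    print(name, place_slice(latency_budget_ms=100, **s))
```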
Ideally, AI workloads should be easily runnable on a variety of target systems. Today, running AI workloads on new systems demands significant manual porting. We're building an MLIR-based universal AI compiler that represents and optimizes compute graphs. The compiler performs both general and device-aware optimizations, making use of the specific software and hardware features available on the target system.
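The flavor of the pass pipeline, sketched on a toy compute-graph representation rather than real MLIR dialects (the node kinds and the fusion rule are illustrative assumptions): a general pass removes dead nodes, and a device-aware pass fuses matmul + add into a single op only when the target reports support for it.

```python
# Toy compute-graph passes: one general, one device-aware (illustrative only; not MLIR).
def dead_node_elimination(graph, outputs):
    """General pass: keep only nodes reachable from the graph outputs."""
    live, stack = set(), list(outputs)
    while stack:
        node = stack.pop()
        if node in live:
            continue
        live.add(node)
        stack.extend(graph[node]["inputs"])
    return {n: op for n, op in graph.items() if n in live}

def fuse_matmul_add(graph, target_features):
    """Device-aware pass: rewrite matmul followed by add as a fused op if the target supports it."""
    if "fused_matmul_add" not in target_features:
        return graph
    for name, op in list(graph.items()):
        if op["kind"] == "add":
            a, b = op["inputs"]
            if graph.get(a, {}).get("kind") == "matmul":
                graph[name] = {"kind": "fused_matmul_add",
                               "inputs": graph[a]["inputs"] + [b]}
    return graph

graph = {
    "x":      {"kind": "input",  "inputs": []},
    "w":      {"kind": "input",  "inputs": []},
    "b":      {"kind": "input",  "inputs": []},
    "mm":     {"kind": "matmul", "inputs": ["x", "w"]},
    "out":    {"kind": "add",    "inputs": ["mm", "b"]},
    "unused": {"kind": "relu",   "inputs": ["x"]},
}
graph = dead_node_elimination(graph, outputs=["out"])          # drops "unused"
graph = fuse_matmul_add(graph, target_features={"fused_matmul_add"})
print(graph["out"])   # {'kind': 'fused_matmul_add', 'inputs': ['x', 'w', 'b']}
```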
We're rethinking hardware systems for serving AI workloads, focusing on cost-effective designs with off-the-shelf components. To that end, we are exploring designs that replace traditional motherboards with DPUs, pairing them with accelerators to create lean, headless systems. These headless systems can function within an AI datacenter or as a standalone AI workstation, delivering cost-effective system performance.
Datacenter operators need fast, accurate cost models to allocate diverse AI tasks at scale. We're developing predictive frameworks that capture both workload characteristics and hardware economics in multitenant environments. Representing workloads as task graphs with associated performance and cost weights supports a convex-optimization approach that produces globally optimal allocation plans.
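A minimal sketch of that formulation using cvxpy (task and machine data are illustrative; the real models carry richer performance and cost weights): fractions of each task are assigned to machines to minimize total cost subject to demand and capacity constraints, which is a linear, and hence convex, program.

```python
# Illustrative convex (linear) allocation of tasks to machines with cvxpy.
import cvxpy as cp
import numpy as np

demand = np.array([4.0, 2.0, 6.0])            # work units per task (hypothetical)
capacity = np.array([8.0, 6.0])               # work units per machine (hypothetical)
cost = np.array([[1.0, 3.0],                  # $ per work unit of task i on machine j
                 [2.0, 1.5],
                 [1.2, 2.5]])

x = cp.Variable(cost.shape, nonneg=True)      # work units of task i placed on machine j
objective = cp.Minimize(cp.sum(cp.multiply(cost, x)))
constraints = [cp.sum(x, axis=1) == demand,   # every task fully placed
               cp.sum(x, axis=0) <= capacity] # no machine over capacity
prob = cp.Problem(objective, constraints)
prob.solve()
print("optimal cost:", prob.value)
print("placement:\n", np.round(x.value, 2))
```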