Vara Srinivas

Director, Data Engineering & AI Enablement
Publishing: ai.varasrinivas.com (AI-SDLC) · agenticai.varasrinivas.com (Agentic AI)

Welcome

I'm Srinivas Vara, Director of Data Engineering & AI Enablement at a data company, based in Hyderabad. I also lead AI initiatives and AI literacy across the India Solution Center, driving adoption of AI-assisted engineering at scale.

This site is where I keep what I'm learning and experimenting with. Twenty-plus years of experience across data, B2B eCommerce, healthcare platforms, and network management. Same instincts as ever; new tools.

Courses I'm publishing

Two parallel course series, both anchored on a working UCC Lien Risk Intelligence platform — nothing toy, every module produces a real component.

AI-SDLC course series

22 courses across foundation, build, AI tooling, agentic AI, project management, and governance. The methodology side — how to integrate AI coding assistants into a working SDLC.

AI-SDLC Executive Playbook

11 strategic modules — business case, ROI models, MCP servers, governance frameworks, and a 90-day adoption roadmap. The executive view of AI-SDLC, no engineering background required. Built for PMs and technology leaders.

Agentic AI course

22 modules + 5 capstones. From the LLM mental model through tools, memory, planning, guardrails, observability, and deployment — ending in a full production agent system.

Claude Code Mastery

A focused deep dive on Claude Code — skills, plugins, hooks, slash commands, MCP integration, sub-agents, and headless CI mode. The tooling track for engineers who want to get one coding agent earning its keep at production scale.

What I work on

Strongest right now in data engineering on GCP, AI tooling adoption, and cloud architecture. Long history with Java/Spring, SAP Commerce, and AWS / Azure platforms.

Data Engineering on GCP — Dataproc, BigQuery, Composer, Pub/Sub, GKE

AI Tooling & Adoption — Claude Code, Gemini Code Assist, MCP

Cloud Architecture — GCP / AWS / Azure

Spark / Databricks / Airflow

Java / Spring

Python & NestJS

SAP Commerce / B2B eCommerce

What I build

In the data engineering organisation I lead, we ingest raw public-records feeds and transform them into canonical and best-view datasets that feed the credit-risk and compliance products, including real-time data delivery APIs.

Medallion architecture on GCP

Raw source feeds land in a bronze layer with full lineage and schema-drift handling. They're cleaned, conformed, and de-duplicated into a silver layer, then aggregated into the gold "best-view" datasets that downstream APIs and scoring models depend on. Telemetry runs alongside the pipeline rather than after it.

Public-records pipeline — live flow

Sources (UCC · liens · bk) → Bronze (raw + lineage) → Silver (curated) → Gold (best-view) → APIs (credit risk)
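As a rough illustration of the bronze, silver, and gold stages described above, here is a minimal in-memory sketch in Python. The record fields (filing_id, debtor_name, amount), the file name, and the function names are all hypothetical; the real pipeline runs on Dataproc and BigQuery, not on Python lists.

```python
from collections import defaultdict

# Hypothetical raw records as they might land from a source feed.
RAW_FEED = [
    {"filing_id": "F-1", "debtor_name": " acme corp ", "amount": "1000"},
    {"filing_id": "F-1", "debtor_name": "ACME CORP", "amount": "1000"},  # duplicate
    {"filing_id": "F-2", "debtor_name": "Acme Corp", "amount": "250"},
    {"filing_id": "F-3", "debtor_name": "Globex", "amount": "75"},
]


def to_bronze(records, source_file):
    """Bronze: keep the raw payload untouched and attach lineage."""
    return [
        {"raw": rec, "source_file": source_file, "row_number": i}
        for i, rec in enumerate(records)
    ]


def to_silver(bronze_rows):
    """Silver: clean, conform types, and de-duplicate on filing_id."""
    seen = {}
    for row in bronze_rows:
        raw = row["raw"]
        filing_id = raw["filing_id"]
        if filing_id in seen:
            continue  # first occurrence wins in this sketch
        seen[filing_id] = {
            "filing_id": filing_id,
            "debtor_name": raw["debtor_name"].strip().upper(),
            "amount": int(raw["amount"]),
            "source_file": row["source_file"],
        }
    return list(seen.values())


def to_gold(silver_rows):
    """Gold: one best-view row per debtor with aggregated totals."""
    totals = defaultdict(lambda: {"filing_count": 0, "total_amount": 0})
    for row in silver_rows:
        entry = totals[row["debtor_name"]]
        entry["filing_count"] += 1
        entry["total_amount"] += row["amount"]
    return dict(totals)


bronze = to_bronze(RAW_FEED, source_file="ucc_2024_01.csv")
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Four raw rows become three silver rows once the duplicate F-1 is dropped, and gold holds one aggregated row per debtor. Every silver row still carries its source_file, so lineage survives past the bronze layer.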

Pipeline orchestration

Airflow on Cloud Composer drives the DAGs; Pub/Sub carries the events between stages; Dataproc and BigQuery do the work. GKE hosts the slim service surfaces that need to react in real time.

Composer · Pub/Sub · Dataproc · BigQuery · GKE

Schema drift & CDC

Government and shipping sources change schemas without warning. We detect drift at the bronze layer, capture changes through CDC into silver, and keep contracts stable for downstream gold consumers.

CDC · Drift detection · Contracts
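One way to picture drift detection at the bronze layer is a comparison between an incoming record and the contract the source is expected to honour. The sketch below assumes a hypothetical three-column contract and hypothetical function names; it is not the production implementation.

```python
# Hypothetical expected contract for one bronze source: column -> Python type.
EXPECTED_SCHEMA = {"filing_id": str, "debtor_name": str, "amount": int}


def detect_drift(record, expected=EXPECTED_SCHEMA):
    """Compare one incoming record against the expected schema."""
    incoming = set(record)
    known = set(expected)
    # Columns present in both, but carrying a value of the wrong type.
    type_changes = {
        col: type(record[col]).__name__
        for col in incoming & known
        if not isinstance(record[col], expected[col])
    }
    return {
        "added": sorted(incoming - known),
        "missing": sorted(known - incoming),
        "type_changed": type_changes,
    }


def has_drift(report):
    """True when any of the three drift categories is non-empty."""
    return any(report.values())


report = detect_drift({"filing_id": "F-9", "debtorName": "Acme", "amount": "1000"})
print(report)
```

Here the source renamed debtor_name to debtorName and started sending amount as a string, so the report lists one added column, one missing column, and one type change. A report like this is what lets the silver layer keep its contract stable while the raw shape moves.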

Ingestion observability

Quality checks run as part of ingestion, not after it. Health metrics, source-level SLAs, and auditable data let downstream teams trust what they're consuming, and let us catch regressions before they reach the API.

Quality · SLOs · Lineage · Alerting
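A minimal sketch of what "checks as part of ingestion" can look like: the batch is measured first, and it only reaches the sink if it passes. The required columns, the 5% null-rate threshold, and the function names are assumptions for illustration.

```python
def check_batch(rows, required=("filing_id", "debtor_name"), max_null_rate=0.05):
    """Return quality metrics for a batch plus a pass/fail decision."""
    total = len(rows)
    # A row counts as "null" when any required column is absent or empty.
    null_rows = sum(
        1 for row in rows if any(row.get(col) in (None, "") for col in required)
    )
    ids = [row.get("filing_id") for row in rows if row.get("filing_id")]
    duplicate_ids = len(ids) - len(set(ids))
    null_rate = null_rows / total if total else 0.0
    return {
        "row_count": total,
        "null_rate": null_rate,
        "duplicate_ids": duplicate_ids,
        "passed": total > 0 and null_rate <= max_null_rate and duplicate_ids == 0,
    }


def ingest(rows, sink):
    """Run the checks inline; only write to the sink when they pass."""
    metrics = check_batch(rows)
    if metrics["passed"]:
        sink.extend(rows)
    return metrics
```

Because ingest returns the metrics whether or not the batch passed, the same numbers can feed health dashboards and alerting without a separate audit pass.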

AI-assisted SDLC & Agentic Tooling

AI-SDLC is the practice of integrating AI coding assistants, spec-driven planning, and automated quality gates into every phase of software development. It's a methodology, not a tool — you don't buy it, you adopt it. The difference between “giving developers ChatGPT” and AI-SDLC is governance: spec contracts, proposal reviews, and CI gates that make AI-generated code correct, secure, and consistent.

Learn it in depth

Two ways in — the technical course series for engineers and architects, and the executive playbook for PMs and leadership. Same domain, same methodology, different audience.

AI-SDLC course series — for engineers

22 courses across foundation, build, AI tooling, agentic AI, project management, and governance — the full methodology, working examples included. Anchored on a real UCC Lien Risk Intelligence platform; every course produces an actual component.

Executive Playbook — for leaders

11 strategic modules. Business case, ROI models, MCP servers, governance frameworks (ISO 42001, NIST AI RMF), and a 90-day adoption roadmap. No engineering background required. Built for PMs, managers, and technology leaders running the rollout.

Why now

Software teams using AI coding assistants report ~26% more tasks completed per sprint (GitHub / Microsoft / Accenture field study, 2024), with controlled benchmarks showing up to 55% faster completion on focused coding tasks. Industry estimates project 40% fewer defects and 2× feature throughput — but those gains only materialise with the right process framework. AI-SDLC is that framework. Gartner projects 75% of enterprise engineers will use AI coding assistants by 2028; more recent estimates put it at 90%. The question stopped being whether to adopt — it's whether to adopt with discipline or with chaos.

~26% more tasks per sprint
40% fewer escaped defects
2× feature throughput
~50% faster onboarding

The three pillars

AI-SDLC stands on three interdependent capabilities. Take any one away and the other two collapse into noise: AI agents without specs produce drift; specs without agents are paperwork; agents and specs without MCP work blind.

AI coding agents

Claude Code, Gemini Code Assist, Cursor — pair programmers that never sleep. They generate code, write tests, review PRs, and explain decisions. The developer-productivity pillar.

Spec-driven planning

OpenSpec or equivalent. Before any AI writes code, the team writes a spec. The discipline: no code without a proposal, no proposal without a spec. The quality-and-governance pillar.

MCP servers

Model Context Protocol gives agents secure access to the tools developers would otherwise reach for manually — Jira, GitHub, Slack, databases, cloud consoles. Without MCP, AI works blind. The context-and-integration pillar.

What AI coding assistants can — and cannot — do

AI-SDLC isn't about building agents — it's about engineers using off-the-shelf coding assistants (Claude Code, Gemini Code Assist, Cursor) inside an SDLC that stays governable. Setting expectations is half the battle: the line between “the assistant does it” and “the developer does it” is what makes the whole methodology work.

Coding assistants CAN

  • Write boilerplate and repetitive code instantly
  • Generate unit tests from specifications
  • Explain complex legacy code in plain English
  • Review PRs for common errors and anti-patterns
  • Create API documentation automatically
  • Refactor code to follow new patterns
  • Generate SQL queries from natural language
  • Create Docker configs and CI pipelines

Coding assistants CANNOT (without guardrails)

  • Know your business rules without context
  • Guarantee security compliance independently
  • Ensure consistency across a large codebase
  • Understand implicit team conventions
  • Know which external systems they're allowed to call

This is precisely why MCP servers and OpenSpec exist — they give the assistant the context it needs and the guardrails it must follow.

Where the engineering goes

Spec-driven design with OpenSpec

Specs first, code second. OpenSpec (the @fission-ai/openspec npm package) maintains a spec.md as the source of truth for canonical fields — names, types, business meaning — enforced across every repo. The flow: AI drafts a proposal with /opsx:propose, a human reviews it, /opsx:apply lands the changes, and CI runs /opsx:verify to catch any code that drifts from the spec. Cross-service drift — the API saying debtorName, the database debtor_name, the frontend debtorNameNorm — doesn't survive into integration.

OpenSpec · spec.md · /opsx:verify · canonical fields
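To make the drift example concrete, here is a small Python sketch of the kind of check a CI gate could run against field names. It is not how OpenSpec or /opsx:verify works internally; the canonical field set and the matching heuristic are assumptions made for illustration.

```python
import re

# Hypothetical canonical fields, as a spec.md might list them.
CANONICAL_FIELDS = {"debtorName", "filingNumber", "riskScore"}


def _normalise(name):
    """Lowercase and drop separators so debtor_name and debtorName compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def find_drift(field_names, canonical=CANONICAL_FIELDS):
    """Map each non-canonical field to the canonical name it likely drifted from."""
    by_key = {_normalise(name): name for name in canonical}
    drift = {}
    for name in field_names:
        if name in canonical:
            continue
        key = _normalise(name)
        # Exact match after normalising, else a canonical name used as a prefix.
        match = by_key.get(key) or next(
            (canon for k, canon in by_key.items() if key.startswith(k)), None
        )
        drift[name] = match  # None means "unknown field, not in the spec"
    return drift


print(find_drift(["debtorName", "debtor_name", "debtorNameNorm", "colour"]))
```

The canonical debtorName passes untouched, debtor_name and debtorNameNorm are both traced back to debtorName, and colour is flagged as unknown. A CI job could fail the build whenever the returned mapping is non-empty.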

IDE intelligence + agentic workflows

Gemini Code Assist for in-IDE completions and Agent Mode multi-file refactors; Claude Code for read / write / verify loops over the codebase. Both reason over the same project context — CLAUDE.md, GEMINI.md, the spec, the platform state — using the RCTF prompt framework, so output stays consistent across engineers and across phases.

Gemini Code Assist · Claude Code · RCTF · context files

MCP servers wrapping the platform

The platform's REST surface — searchFilings, calculateRiskScore, getEntityProfile — exposed behind an MCP server so any agent on any client calls it through the same contract. Same tool schemas the IDE assistants and the headless review agents use. Cheaper than re-integrating against every new tool that shows up.

MCP servers · tool schemas · Spring Boot REST · OAuth2
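A tool contract of this kind can be pictured as a name, a description, and a JSON Schema for the arguments, which is the shape MCP uses when a server lists its tools. In the sketch below only the tool name searchFilings comes from the platform; the parameters (debtorName, state, limit) and the validator are hypothetical, and the validator covers only a tiny subset of JSON Schema.

```python
# Tool definition in the shape MCP uses for tool listings:
# a name, a description, and a JSON Schema under "inputSchema".
# The parameters themselves are hypothetical.
SEARCH_FILINGS_TOOL = {
    "name": "searchFilings",
    "description": "Search public-records filings by debtor name.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "debtorName": {"type": "string"},
            "state": {"type": "string"},
            "limit": {"type": "integer"},
        },
        "required": ["debtorName"],
    },
}

_JSON_TYPES = {"string": str, "integer": int}


def validate_arguments(tool, arguments):
    """Tiny subset of JSON Schema validation: required keys and basic types."""
    schema = tool["inputSchema"]
    errors = [
        f"missing required argument: {key}"
        for key in schema["required"]
        if key not in arguments
    ]
    for key, value in arguments.items():
        spec = schema["properties"].get(key)
        if spec is None:
            errors.append(f"unknown argument: {key}")
        elif not isinstance(value, _JSON_TYPES[spec["type"]]):
            errors.append(f"{key} should be {spec['type']}")
    return errors


print(validate_arguments(SEARCH_FILINGS_TOOL, {"limit": "10"}))
```

Because every client sees the same schema, a malformed call is rejected the same way whether it comes from an IDE assistant or a headless review agent.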

Responsible AI & adoption

Bias audits on risk scores, explainability for adverse-action decisions, audit trails that satisfy EU AI Act requirements for credit systems, and adoption metrics that go into the same dashboards as the rest of the pipeline. Compliance is a phase of the loop, not a sign-off at the end.

bias audits · adverse-action trails · EU AI Act · adoption metrics

The risks — drift and vibe coding

Two failure modes are common enough to deserve their own names. Both are governance problems, not model problems.

Spec drift

Ten developers each using AI independently will generate ten slightly different interpretations of the same business rule. The API ends up with debtorName, the database with debtor_name, the frontend with debtorNameNorm — integration breaks at the seam. Spec-first development is how you unify those interpretations before they get to the seam.

Vibe coding

Accepting AI output without reading it, because it looks plausible. This is the most common failure mode in AI-assisted development. It produces security vulnerabilities, subtly wrong business logic, and technical debt that compiles and passes tests but doesn't actually meet the requirements. Prevention takes three things: review standards that require reviewers to understand AI-generated code (not just check that the tests pass), automated spec-compliance checks in CI, and a team culture where “I asked AI and accepted its output” is not an acceptable answer.

Agentic AI

An AI agent isn't a chatbot. It's your code using an LLM as a decision-making brain, calling tools, and looping until the task is done. Not autonomous AI running on its own. Not “ChatGPT with extra steps.” A program where the reasoning lives in the model and the structure lives in the code — tools you define, guardrails you set, stop criteria you write down.

Building production agents is what I spend most of my evenings and weekends on. The course below is the artefact — 22 modules and 5 capstones, anchored on a UCC Lien Risk Intelligence platform.

Agentic AI course

22 modules + 5 capstones. From the LLM mental model through tools, memory, planning, guardrails, observability, and deployment — ending in a full production agent system.

Claude Code Mastery — deep dive

A focused deep dive on a single coding agent. Skills, plugins, hooks, slash commands, MCP integration, sub-agents, headless CI mode — the tooling track for engineers who want to get Claude Code earning its keep at production scale.

Three approaches to one problem

Take any business question — say, “Is Acme Corporation likely to become delinquent on secured loans in the next 12 months?” You can solve it three ways with the same data and the same ML model:

Script

Pickle a RandomForest classifier. You compute the six features by hand, hand them to predict_delinquency(), get back a probability and a label. Fast (milliseconds), reproducible — and totally inert. No data fetch, no explanation, no follow-up.

FastAPI wrapper

Same pickle, wrapped in a REST endpoint. POST {"company_name":"Acme"}, the server runs a hardcoded query, returns rigid JSON. Better — auto-fetches data. Still inflexible: ILIKE 'Acme' misses filings under ACME CORP and ACME CORP DBA ROADRUNNER SUPPLIES.
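The matching problem is easy to reproduce outside a database. The sketch below uses plain Python over hypothetical filings: exact_match imitates what ILIKE does when the pattern has no wildcards (case-insensitive equality), and token_match is one possible hardcoded improvement. Both functions and the sample data are assumptions for illustration.

```python
import re

# Hypothetical filings; only the debtor names matter for this sketch.
FILINGS = [
    {"filing_number": "2024-001", "debtor": "Acme"},
    {"filing_number": "2024-002", "debtor": "ACME CORP"},
    {"filing_number": "2024-003", "debtor": "ACME CORP DBA ROADRUNNER SUPPLIES"},
    {"filing_number": "2024-004", "debtor": "Pacmerica LLC"},
]


def exact_match(filings, name):
    """Roughly what ILIKE without wildcards does: case-insensitive equality."""
    return [f for f in filings if f["debtor"].lower() == name.lower()]


def token_match(filings, name):
    """Match when the query appears as a whole word in the debtor name."""
    pattern = re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE)
    return [f for f in filings if pattern.search(f["debtor"])]


print([f["filing_number"] for f in exact_match(FILINGS, "Acme")])
print([f["filing_number"] for f in token_match(FILINGS, "Acme")])
```

The exact match finds only 2024-001, while the whole-word match also picks up the two ACME CORP filings and still skips Pacmerica LLC. Even so, a rule like this covers only the variations someone thought to encode in advance.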

Claude agent

The same pickle is now one tool among three: search_filings, predict_delinquency, get_filing_details. The agent reasons about name variations on its own, drills into the riskiest filing, calls the ML model, and writes a narrative report citing actual filing numbers.

The infrastructure is identical in all three. The ML model doesn't move. What changes is who decides what to do once a request arrives.
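The "who decides" point can be sketched as a loop in which a decision function picks the next tool and the code executes it. In the Python below, stub_brain is a scripted stand-in for the LLM so the example runs offline. The tool bodies, their return values, and the toy scoring rule are all hypothetical; only the three tool names come from the text above.

```python
def search_filings(name):
    return [{"filing_number": "2024-002", "amount": 900_000},
            {"filing_number": "2024-003", "amount": 150_000}]


def get_filing_details(filing_number):
    return {"filing_number": filing_number, "status": "active"}


def predict_delinquency(filing_count, total_amount):
    # Stand-in for the pickled model: a fixed toy rule, not a real classifier.
    return 0.8 if total_amount > 1_000_000 else 0.2


TOOLS = {
    "search_filings": search_filings,
    "get_filing_details": get_filing_details,
    "predict_delinquency": predict_delinquency,
}


def stub_brain(question, history):
    """Stand-in for the LLM: returns the next action given the steps so far."""
    done = [step["tool"] for step in history]
    if "search_filings" not in done:
        return {"tool": "search_filings", "args": {"name": "Acme"}}
    filings = history[0]["result"]
    if "get_filing_details" not in done:
        riskiest = max(filings, key=lambda f: f["amount"])
        return {"tool": "get_filing_details",
                "args": {"filing_number": riskiest["filing_number"]}}
    if "predict_delinquency" not in done:
        return {"tool": "predict_delinquency",
                "args": {"filing_count": len(filings),
                         "total_amount": sum(f["amount"] for f in filings)}}
    score = history[-1]["result"]
    return {"final": f"Delinquency probability {score:.0%}, "
                     f"driven by filing {history[1]['result']['filing_number']}."}


def run_agent(question, brain=stub_brain, max_steps=5):
    """Loop: ask the brain for an action, run the tool, feed the result back."""
    history = []
    for _ in range(max_steps):
        action = brain(question, history)
        if "final" in action:
            return action["final"], history
        result = TOOLS[action["tool"]](**action["args"])
        history.append({"tool": action["tool"], "result": result})
    raise RuntimeError("stop criterion hit: max_steps exceeded")


answer, steps = run_agent("Is Acme likely to become delinquent?")
print(answer)
```

In a real agent the brain would be a model call and the order of tool calls would not be scripted. The tools, the loop, and the max_steps stop criterion stay in ordinary code either way.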

The three-layer stack

The cleanest way to see what an agent adds is as three layers. Two of them are the same as the FastAPI version:

L1 Infrastructure: FastAPI · Docker · HTTP · auth · rate limits (same as before)
L2 Capabilities: search_filings() · predict_delinquency() · the ML model (same as before)
L3 Intelligence: Claude reasoning · planning · synthesis · explanation (new with agents)

Most APIs in five years will still be FastAPI. Most ML models will still be pickle, ONNX, or TensorFlow files. The change is Layer 3 — the reasoning that decides how to use the tools and models below it. Agents don't replace ML; they put a reasoning layer on top.

The seven building blocks

Every production agent, no matter how simple or complex, is built from the same seven components. Take any one away and the agent is incomplete.

Brain

The LLM. Reads input, reasons, decides what to do next. Without it: static rules and pattern matching.

Tools

APIs, databases, files. The agent's hands. Without them: it can only respond from training data.

Memory

Conversation history, RAG. Without it: the agent forgets between turns and asks “who are you?” every message.

Plan

Decompose tasks, decide execution order. Without it: the agent only handles one-step requests.

Guardrails

Validate inputs, check outputs, escalate when needed. Without them: every wrong call is a production incident.

Eyes

Observability — logs, traces, telemetry. Without them: you can't debug and you can't trust the output.

Home

Where the agent runs — container, function, server. Without it: it's a script on someone's laptop.
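Of the seven blocks, guardrails are perhaps the easiest to show in a few lines. The Python sketch below wraps a prediction call with an input check, an output check, and an escalation path. The thresholds, the 200-character limit, and all names are hypothetical choices for illustration.

```python
class Escalate(Exception):
    """Raised when a call should go to a human instead of proceeding."""


def validate_input(company_name):
    """Input guardrail: reject empty or implausibly long names."""
    name = company_name.strip()
    if not name or len(name) > 200:
        raise ValueError("company_name must be 1-200 characters")
    return name


def check_output(probability, low=0.4, high=0.6):
    """Output guardrail: reject impossible scores, escalate borderline ones."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    if low <= probability <= high:
        raise Escalate(f"borderline score {probability:.2f}; needs human review")
    return probability


def guarded_predict(company_name, predict):
    """Wrap any predict(name) -> float callable with both guardrails."""
    name = validate_input(company_name)
    return check_output(predict(name))
```

A clear-cut score passes straight through, a score of 1.7 is rejected as invalid, and a score of 0.5 raises Escalate so a person makes the call. The wrapper works with any callable, so the same guardrails can sit in front of a script, an endpoint, or an agent tool.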

The five lifecycle stages

Building an agent isn't a single checkpoint — it's a sequence with concrete artefacts at each step.

1. Design: what the agent does, what tools it gets, what its stop criterion is.
2. Build: wire up the brain, tools, and memory; get the loop running end-to-end.
3. Protect: guardrails, input validation, output checks, escalation paths.
4. Observe: traces per run, eval suites, failure-mode dashboards.
5. Deploy: production hosting, rate limits, model routing, on-call.

When NOT to use an agent

Agents are slower and more expensive than scripts. Use them where the path through the problem is open-ended — multi-step reasoning, name variations, judgement calls. Don't use them for:

  • Deterministic problems with known inputs and outputs — a function call is cheaper.
  • Latency-critical paths — an agent's loop adds seconds, not milliseconds.
  • Workflows where every decision can be hardcoded for less effort than it would take to set up the tooling, evals, and traces.

Working examples of all of the above — including the seven-block stack wired into a real system — are in the course. Capstone 1 is a one-tool agent. Capstone 5 is the full production system: planning, memory, guardrails, human oversight, model routing, eval suite, deployment.