ML & AI Enthusiast

Crafting Intelligence Through Data & Code

Transforming complex data into intelligent solutions. I specialize in Machine Learning, Deep Learning, and AI Research to build the future of technology.

Core Expertise

AI & Machine Learning Specializations

Transforming complex data challenges into intelligent solutions through cutting-edge AI technologies and innovative approaches.

Large Language Models

Developing and fine-tuning LLMs for specific domain applications, prompt engineering, and context optimization.

Key Technologies

  • GPT architecture implementation
  • RAG systems development
  • Fine-tuning strategies

Deep Learning

Building and optimizing neural networks for computer vision, NLP, and multimodal applications.

Key Technologies

  • CNN & Transformer architectures
  • Transfer learning techniques
  • Model optimization

MLOps & Deployment

Streamlining ML workflows from experimentation to production with robust CI/CD pipelines.

Key Technologies

  • Model monitoring & maintenance
  • Scalable ML infrastructure
  • Containerization & orchestration

Portfolio Showcase

Featured Projects

Innovative AI solutions that push the boundaries of what's possible with machine learning and data science.

SpiceRoute — Home Chef Marketplace
Marketplace
Community Platform

A platform connecting food lovers with talented home chefs in Berlin, featuring verified chefs and diverse cuisines.

Tech Stack
Next.js (App Router) · Firebase · Stripe · Vercel +1
ResumeAI
AI/ML Web Application
Career Tech

An intelligent web application that helps fresh graduates optimize resumes, analyze job matches, generate cover letters, prepare for interviews, and track applications using AI-driven workflows.

Tech Stack
Next.js (React) · Firebase · Vercel · Gemini
ReelNotes
AI/ML Web Application
Productivity Tool

Turn Instagram Reels into concise, searchable notes using AI — save time and retain knowledge effortlessly.

Tech Stack
Next.js (React) · Firebase · Vercel · Gemini
Knowledge Sharing

Latest Insights & Research

Deep dives into AI research, practical tutorials, and insights from the cutting edge of machine learning.

Multi-Head Latent Attention
LLM
Aug 26, 2025
5 min read
Multi-Head Latent Attention (MHLA) reduces KV Cache size by projecting inputs into a smaller latent space, caching only one compact matrix instead of separate keys and values for each head. Unlike MQA or GQA, it still generates distinct keys and values per head using head-specific projections, preserving performance while cutting memory use by over 50x in models like DeepSeek.
DeepSeek
NLP
Attention
Read Article
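
To make the cache saving in the summary above concrete, here is a rough back-of-the-envelope sketch in plain Python. The head count, head dimension, and latent size are illustrative assumptions rather than any real model's configuration, and real MLA also caches a small decoupled positional key, which is omitted here for simplicity.

```python
# Back-of-the-envelope KV-cache size per token, per layer (element counts).
# All numbers below are illustrative assumptions, not a real model config.

n_heads = 128        # attention heads
head_dim = 128       # dimension per head
latent_dim = 512     # compressed KV latent cached by MLA

# Standard multi-head attention caches full keys AND values for every head.
mha_cache = 2 * n_heads * head_dim   # 32,768 values per token

# MLA caches a single shared latent vector per token; per-head keys and
# values are re-derived from it with head-specific projections at read time.
mla_cache = latent_dim               # 512 values per token

print(f"MHA cache per token: {mha_cache}")
print(f"MLA cache per token: {mla_cache}")
print(f"Reduction: {mha_cache / mla_cache:.0f}x")   # 64x with these numbers
```

With these toy numbers the cache shrinks by 64x; the exact ratio depends on a model's actual head count, head dimension, and latent size.
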
Grouped-Query Attention (GQA)
Transformers
Aug 2, 2025
5 min read
Grouped-Query Attention (GQA) reduces memory usage by sharing keys and values among groups of attention heads, rather than across all heads (as in MQA) or none (as in MHA). It offers a trade-off: smaller KV Cache than MHA and better performance than MQA, balancing efficiency and representational power. Models like Llama 3 use GQA for this reason.
Transformers
LLM
DeepSeek
Read Article
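
For a concrete picture of the grouping described above, here is a minimal NumPy sketch in which several query heads share one cached key/value head. Sizes are toy values, there is no causal mask, and all names and dimensions are illustrative assumptions rather than any specific model's setup.

```python
import numpy as np

# Toy sizes only -- not taken from any specific model.
n_q_heads, n_kv_heads, head_dim, seq_len = 8, 2, 4, 5
group_size = n_q_heads // n_kv_heads   # 4 query heads share each K/V head

rng = np.random.default_rng(0)
q = rng.standard_normal((n_q_heads, seq_len, head_dim))
k = rng.standard_normal((n_kv_heads, seq_len, head_dim))  # only n_kv_heads are cached
v = rng.standard_normal((n_kv_heads, seq_len, head_dim))

out = np.empty_like(q)
for h in range(n_q_heads):
    g = h // group_size                          # which shared K/V head this query head uses
    scores = q[h] @ k[g].T / np.sqrt(head_dim)   # (seq_len, seq_len) attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    out[h] = weights @ v[g]

# KV cache per token: MHA stores 2 * n_q_heads * head_dim values,
# GQA stores 2 * n_kv_heads * head_dim -- 4x smaller with these toy sizes.
print(out.shape)   # (8, 5, 4): one output per query head
```

Every query head still computes its own attention pattern, but only n_kv_heads key/value heads are cached, which is the efficiency-versus-expressiveness trade-off the post describes.
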
Let's Build Together

Ready to Transform Your Data Into Intelligence?

Let's collaborate on innovative AI solutions that drive real impact. From concept to deployment, I'll help you harness the power of machine learning for your unique challenges.
