Sagar Dhiman
GenAI Engineer | Senior Data Scientist

+91-8437003676 | sagardhiman033@gmail.com
GenAI + LLM | ML | Graph-Augmented RAG | AWS + Azure | Data Pipelines

Senior Data Scientist with 7+ years of experience building scalable ML, NLP, and GenAI/LLM systems, with end-to-end ownership from data pipelines to production deployment. Strong in large-scale information retrieval, cloud architecture, and technical leadership, translating complex infrastructure into impactful AI solutions.

Focused on shipping AI products, not demos.

LLM-Assisted Engineering: designed, coded, and shipped by me end-to-end.

Technical Expertise

AI, ML & LLM Systems

LLMs, Graph-Augmented RAG, NLP pipelines, semantic search, recommendation systems, anomaly detection.

Data Engineering & Cloud

Data pipelines, ELT/ETL, clickstream/time-series processing, AWS, Azure, Snowflake, Redshift.

Programming & Platforms

Python, SQL, PySpark, Docker, Airflow, Neo4j, Elasticsearch, MongoDB, PostgreSQL.

Project Highlights

  • Graph-Augmented RAG: Haven Safety engagement
  • Sensor Analytics: Semantic search + forecasting
  • Data Engineering Pipelines: AWS S3 + Glue + Snowflake

Work Experience

Haven Safety (Backed by Andrew Ng's FUND.ai) | Aug 2025 - Present (via Tatras Data)
  • Client Engagement: Haven Safety (FUND.ai)
  • AI Leadership: Lead AI & Data Engineer since Aug 2025 for a safety intelligence copilot.
  • Graph-Based AI Reasoning: Graph-augmented LLM systems for incident analysis.
  • LLM-Driven Pipelines: Extraction and graph enrichment for explainable intelligence.
Tatras Data | Senior Data Scientist (May 2018 - Present)
  • Chatbot Development: Designed and deployed multi-RAG chatbots using graph-based and semantic similarity retrieval for complex, multi-source enterprise datasets.
  • Sensor Data Analytics: Developed semantic fingerprinting methods for sensor historical data, enabling advanced semantic search and time series forecasting.
  • Project Leadership: Led a team to scrape and analyze global cuisine data. Developed ML models to classify cuisine types and standardize item names across vendors.
Tatras Data | Associate Data Scientist
  • Built and optimized ELT pipelines with AWS S3, Glue, and Snowflake, delivering significant cost savings.
  • Implemented transformer-based retrieval and abstractive/extractive summarization.
  • Developed recommendation systems with Word2Vec, LDA, and Knowledge Graphs.
  • Automated ETL for MongoDB/PostgreSQL; deployed Flask/FastAPI APIs via Docker on AWS.
Tatras Data | Junior Data Scientist
  • Engineered clickstream pipelines with Druid and Kafka for real-time analytics.
  • Built text recommendation systems with TF-IDF, LDA, and Word2Vec integrated with Elasticsearch.
  • Developed keyphrase extraction solutions using unsupervised and custom NER approaches.
Tatras Data | Trainee Data Scientist
  • Designed CNN models for pneumonia detection and YOLO-based license plate recognition.
  • Built duplicate bug detection pipelines with TF-IDF and topic modeling (MAP/Recall optimization).
  • Automated high-volume price comparison scraping workflows.
Mentor | Sabudh Foundation (Jan 2019 - Present)
  • Mentor in NLP, recommenders, unsupervised learning, and AI healthcare projects.
  • Guide end-to-end student pipelines and conduct monthly assessments.

Education

B.Tech, Computer Science and Engineering (2014 - 2018)
  • Guru Nanak Dev Engineering College, Ludhiana
12th CBSE (2013 - 2014)
  • Kendriya Vidyalaya

Certifications

  • AWS Certified Cloud Practitioner
  • AWS SageMaker Practical - Udemy
  • Dataiku ML Practitioner, Core Designer, Advanced Designer, MLOps Practitioner
  • Spark and Python for Big Data (PySpark)

Leadership & Responsibilities

  • Teaching Assistant - Indian School of Business (ISB), Mohali
  • Project Lead for multiple enterprise AI initiatives
  • Regular mentor for junior data scientists and interns