01 — Introduction

About Me

Gavrish B
2025 VIT Chennai

Hi, I'm Gavrish B — a Data Analyst & AI/ML Engineer graduating from VIT Chennai with a B.Tech in AI & ML (CGPA 7.92).

I build end-to-end financial data pipelines, multi-source reconciliation tools, and machine learning models that solve real business problems. My work spans asset management data workflows, NLP-powered sales analytics, and computer vision systems — all with a focus on production-quality, well-tested code.

I am currently seeking Data Analyst roles in financial services and asset management. My internship experience at Feynn Labs, Acmegrade, and SystemTron spans ML and data engineering.

Python, SQL · MySQL, Power BI, ETL Pipelines, Reconciliation, TensorFlow, Azure AI, NLP, Pandas · NumPy, Tableau

02 — Technical Stack

Skills & Tools

🐍
Python
Pandas, NumPy, SQLAlchemy, yfinance
🗄️
SQL · MySQL
Window Functions, CTEs, Views, Joins
⚙️
ETL Pipelines
Extraction, Transform, SLA Logging, Scheduling
🔍
Data Reconciliation
Multi-source, Break Detection, Root Cause, UAT
Data Quality & UAT
Pytest, 22 Test Cases, Sign-off Reports
📊
Excel · openpyxl
VBA, Pivot Tables, Automated Reports
🧠
TensorFlow · Keras
CNN, RNN/LSTM, Transfer Learning, Fine-tuning
🔥
PyTorch
Autograd, Custom Datasets, DataLoader, torchvision
⚗️
Scikit-learn
XGBoost, Random Forest, SVM, Pipelines
💬
NLP · Transformers
BERT, HuggingFace, Tokenization, Sentiment
Generative AI · LLMs
Prompt Engineering, RAG, LangChain, OpenAI API
👁️
Computer Vision
YOLOv5/v8, OpenCV, Object Detection, Segmentation
☁️
Azure AI · Vertex AI
Azure AI-900, AutoML, Vertex AI Pipelines
📐
MLOps · Experiment Tracking
MLflow, Weights & Biases, Model Registry
🔢
Feature Engineering · EDA
Pandas, Seaborn, Plotly, Correlation Analysis
📈
Power BI
DAX, 5-Page Dashboards, Slicers
📉
Tableau
Dashboards, Story Points
🏦
Financial Analytics
Returns, Volatility, RSI, Drawdown
🔢
Statistical Analysis
Hypothesis Testing, Correlation
🔗
Data Modelling
Star Schema, Dim/Fact Tables
📌
Benchmark Data
Index Lifecycle, Bloomberg, Static Data
🐍
Python
OOP, Decorators, Generators, Async
FastAPI · Flask
REST APIs, Pydantic, Swagger, Middleware
🐳
Docker · Containerisation
Dockerfile, docker-compose, Image Layers
🔀
Git · Version Control
Git Flow, PR Reviews, GitHub Actions, CI/CD
🧪
Testing · TDD
Pytest, Unit Tests, Integration Tests, UAT
☁️
Cloud Platforms
Azure, Google Cloud, AWS Basics
🗄️
Databases
MySQL, PostgreSQL, SQLite, SQLAlchemy ORM
🌐
Web · Scripting
HTML · CSS · JS, BeautifulSoup, Requests, Selenium
📋
Dev Practices
Clean Code, SOLID, Code Review, Documentation

Certifications

03 — Work

Featured Projects

01
ML Platform — AutoML + Model Registry
Production-grade MLOps platform built from scratch. TPE-based AutoML engine across 6 algorithm families — 50 trials in under 3 minutes. MLflow-inspired Model Registry with semantic versioning, champion/challenger gates, SHA256 integrity checks, and PSI/KL/KS drift detection. 36/36 tests passing.
Python, Scikit-learn, MLflow, Pytest, Bayesian Optimisation, Feature Store
ML · AI
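The PSI drift check mentioned above can be illustrated with a minimal, dependency-free sketch. This is not the platform's actual code: the function name `population_stability_index`, the equal-width binning, the bin count, and the `eps` floor for empty bins are all assumptions made for illustration.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference sample and a new sample of one numeric feature.

    Hypothetical helper: bins are equal-width over the reference range.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all reference values are equal

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to the edge bins
            counts[idx] += 1
        # Floor each share at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = list(range(100))
shifted = [v + 30 for v in reference]
psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
```

Identical samples give a PSI of zero, and the score grows as the new sample's distribution moves away from the reference. The alert threshold is a policy choice and is not stated in the project description.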
02
RAG-Powered Research Assistant
Hybrid retrieval pipeline combining dense FAISS vector search with sparse BM25, fused via Reciprocal Rank Fusion for improved recall. FastAPI service with streaming, batch indexing, and source-grounded generation with citation extraction. Achieves 0.70+ confidence and sub-2s response time.
Python, FAISS, BM25, FastAPI, HuggingFace, LangChain
ML · AI
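Reciprocal Rank Fusion, as used above to merge the dense and sparse result lists, can be sketched in a few lines. The function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration, not details taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) in every list it appears in;
    the scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a dense (vector) and a sparse (BM25) retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Because only ranks are used, the dense and sparse scores never need to be put on a common scale, which is the usual reason for choosing this fusion method.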
03
Financial Index ETL Pipeline
End-to-end ETL pipeline ingesting 2 years of OHLCV market data across 7 global indices (~5,000+ daily records). 5 automated data quality checks, SLA breach logging, upsert logic, 6 analytical SQL views powering Power BI dashboards, and a 4-sheet Excel reconciliation report per run.
Python, MySQL, SQLAlchemy, Pandas, Yahoo Finance API, Power BI
Finance
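The upsert step above can be sketched as follows. The project targets MySQL, where the equivalent statement is `INSERT ... ON DUPLICATE KEY UPDATE`; this sketch uses Python's built-in `sqlite3` and SQLite's `ON CONFLICT` clause so that it runs without a database server. The table name, columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE index_prices (
        symbol     TEXT    NOT NULL,
        trade_date TEXT    NOT NULL,
        close      REAL    NOT NULL,
        volume     INTEGER NOT NULL,
        PRIMARY KEY (symbol, trade_date)
    )
    """
)

# Insert new rows; overwrite close and volume when (symbol, trade_date) already exists.
UPSERT_SQL = """
    INSERT INTO index_prices (symbol, trade_date, close, volume)
    VALUES (?, ?, ?, ?)
    ON CONFLICT(symbol, trade_date) DO UPDATE SET
        close  = excluded.close,
        volume = excluded.volume
"""


def load_rows(connection, rows):
    """Upsert a batch of rows inside a single transaction."""
    with connection:
        connection.executemany(UPSERT_SQL, rows)


# The first run loads one day; the re-run corrects it and adds the next day.
load_rows(conn, [("IDX1", "2024-01-02", 100.0, 1000)])
load_rows(conn, [("IDX1", "2024-01-02", 101.5, 1200),
                 ("IDX1", "2024-01-03", 102.0, 900)])

rows = conn.execute(
    "SELECT symbol, trade_date, close, volume FROM index_prices ORDER BY trade_date"
).fetchall()
```

Keying on `(symbol, trade_date)` makes re-runs idempotent: loading the same day twice updates the existing row instead of duplicating it.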
04
Multi-Source Reconciliation & Root Cause Analysis Tool
Data reconciliation engine cross-validating two vendor feeds across 15 financial assets via outer join logic. Categorises 6 break types (including PRICE_BREAK, MISSING, STALE, and VOLUME), generates automated investigation playbooks, and delivers a 5-sheet colour-coded Excel audit report. 10 Pytest tests, 100% pass rate.
Python, Pandas, SQL, Pytest, openpyxl, Root Cause Analysis
Finance
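The outer join logic described above can be illustrated with a small, dependency-free sketch. The project itself lists Pandas, where `DataFrame.merge(how="outer", indicator=True)` would play the same role. The function name, the price tolerance, and the sample feeds are assumptions, and only two of the break types are shown.

```python
def reconcile(feed_a, feed_b, price_tolerance=0.01):
    """Compare two {asset: price} feeds and return a list of (asset, break_type)."""
    breaks = []
    # The union of keys behaves like an outer join: no asset from either feed is dropped.
    for asset in sorted(set(feed_a) | set(feed_b)):
        price_a, price_b = feed_a.get(asset), feed_b.get(asset)
        if price_a is None or price_b is None:
            breaks.append((asset, "MISSING"))
        elif abs(price_a - price_b) > price_tolerance:
            breaks.append((asset, "PRICE_BREAK"))
    return breaks


# Hypothetical vendor snapshots for the same valuation date.
vendor_a = {"AAA": 100.00, "BBB": 50.25, "CCC": 10.00}
vendor_b = {"AAA": 100.00, "BBB": 50.75, "DDD": 7.50}
breaks = reconcile(vendor_a, vendor_b)
```

An inner join would silently discard CCC and DDD, which is why an outer join is the natural choice for break detection.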
05
UAT Test Framework for Financial Data Pipelines
22-test automated UAT framework gating staging → production promotion across 6 quality dimensions: completeness, accuracy, timeliness, format, integrity, and regression. 6 defect injection scenarios simulate production failures. Outputs a 4-sheet Excel sign-off report with APPROVED/BLOCKED decision badges.
Python, Pytest, Pandas, openpyxl, Data Quality, UAT
Finance
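The promotion gate described above can be sketched as a single decision function. This is an illustrative assumption about how such a gate might be structured, not the framework's actual code: `CheckResult` and `promotion_decision` are hypothetical names, and the rule that any single failure blocks promotion is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one UAT test, tagged with the quality dimension it covers."""
    dimension: str
    passed: bool


def promotion_decision(results):
    """Return ("APPROVED" or "BLOCKED", sorted list of failing dimensions)."""
    failed = sorted({r.dimension for r in results if not r.passed})
    return ("BLOCKED" if failed else "APPROVED", failed)


clean_run = [CheckResult("completeness", True), CheckResult("accuracy", True)]
bad_run = [
    CheckResult("completeness", True),
    CheckResult("timeliness", False),
    CheckResult("accuracy", False),
    CheckResult("accuracy", False),  # several failed tests in one dimension count once
]
clean_decision = promotion_decision(clean_run)
bad_decision = promotion_decision(bad_run)
```

Returning the failing dimensions alongside the badge gives the sign-off report something concrete to list next to a BLOCKED decision.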
06
Resume Strength Analyser
NLP-powered tool that scores resumes against job descriptions using keyword extraction, semantic similarity, and ATS compatibility checks. Provides section-by-section feedback with actionable improvement suggestions and a ranked score out of 100.
Python, NLP, BERT, Scikit-learn, Streamlit, spaCy
ML · AI
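The scoring idea above can be approximated with a short sketch. Note the substitution: the project lists BERT for semantic similarity, whereas this example uses plain bag-of-words cosine similarity so that it runs with the standard library only. The function names and the tokenisation rule are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple alphanumeric tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def keyword_coverage(resume, job_description):
    """Fraction of distinct job-description tokens that also appear in the resume."""
    wanted = set(tokenize(job_description))
    found = wanted & set(tokenize(resume))
    return len(found) / len(wanted) if wanted else 0.0


coverage = keyword_coverage("python pandas sql", "python sql tableau excel")
```

A combined score out of 100 could weight these two signals, but the weighting used by the actual tool is not stated in the description.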
07
Yield Prediction with Sentinel Imagery
Hybrid yield prediction model integrating Sentinel-1 SAR and Sentinel-2 optical satellite data to analyse soil moisture and NDVI vegetation indices for precision agriculture. Spiking Neural Network (SNN) built on neuromorphic computing principles for efficient spatio-temporal learning, achieving 80% prediction accuracy.
Python, SNN, Sentinel-1/2, NDVI, NumPy, Remote Sensing
ML · AI
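The NDVI index referenced above is defined as (NIR - Red) / (NIR + Red); for Sentinel-2 the near-infrared and red reflectances are usually taken from bands B08 and B04. A minimal per-pixel sketch follows. The function name and the choice to return 0.0 when both bands are zero are assumptions for illustration.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index for one pixel.

    nir and red are surface reflectance values; the result lies in [-1, 1].
    """
    denominator = nir + red
    if denominator == 0:
        return 0.0  # assumed convention for pixels with no signal in either band
    return (nir - red) / denominator


# Dense vegetation reflects strongly in NIR and weakly in red.
vegetated = ndvi(nir=0.5, red=0.1)
# Bare soil reflects similarly in both bands.
bare_soil = ndvi(nir=0.3, red=0.3)
```

Higher values indicate denser, healthier vegetation, which is why the index is a common input feature for yield models.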
08
Cat vs Dog Classifier
Binary image classification system comparing neural network vs traditional ML pipelines. Neural network achieved 83% validation accuracy with 40% faster inference. Built on 5,000+ annotated images with a custom data augmentation pipeline improving generalisation by 12%.
Python, TensorFlow, CNN, OpenCV, Scikit-learn, Data Augmentation
Computer Vision
09
Asset Class Analytics Dashboard
5-page Power BI dashboard tracking 15 instruments across Equity, Bond, Commodity, Crypto, and FX asset classes. Features RSI signals, drawdown analysis, SMA overlays, base-100 normalised performance comparison, and 7 optimised SQL views for sub-10s query response.
Power BI, DAX, MySQL, Python, Financial Analytics
Finance
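Three of the measures named above (base-100 normalisation, drawdown, and SMA) can be illustrated in plain Python. The dashboard itself presumably computes them in SQL views or DAX, so this is only a sketch of the arithmetic; the function names and the sample prices are assumptions.

```python
def rebase_to_100(prices):
    """Scale a price series so that its first value equals 100."""
    base = prices[0]
    return [100.0 * p / base for p in prices]


def max_drawdown(prices):
    """Largest peak-to-trough decline, returned as a negative fraction."""
    peak = prices[0]
    worst = 0.0
    for p in prices:
        peak = max(peak, p)                 # running high-water mark
        worst = min(worst, p / peak - 1.0)  # drop relative to that mark
    return worst


def simple_moving_average(prices, window):
    """Mean of each consecutive `window`-length slice of the series."""
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


prices = [50.0, 55.0, 44.0, 60.0]
rebased = rebase_to_100(prices)
drawdown = max_drawdown(prices)
sma_2 = simple_moving_average(prices, window=2)
```

Rebasing every instrument to 100 on the same start date is what makes instruments with very different price levels comparable on one chart.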
--
NLP-Powered Sales Conversion Analyser
Sentiment analysis on sales call transcripts using BERT fine-tuning. Achieved 20% uplift in conversion rate prediction accuracy for Feynn Labs. Automated report generation reduced analyst time by 25%.
Python, BERT, NLP, TensorFlow, Scikit-learn
ML · AI
--
Pedestrian Detection System
Real-time pedestrian detection using YOLOv5 and OpenCV. Deployed at SystemTron, with 90% accuracy from a Random Forest ensemble and 83% accuracy from a neural network. Used for smart city safety monitoring applications.
Python, YOLOv5, OpenCV, Computer Vision
ML · AI

04 — Career

Work Experience

Jun 2023 – Aug 2023
Machine Learning Intern
Feynn Labs · Remote
  • Built NLP pipeline using BERT fine-tuning on sales call transcripts, achieving 20% uplift in conversion rate prediction accuracy.
  • Automated weekly analytics reporting, cutting manual analyst processing time by 25% through Python scripting and scheduled job execution.
  • Designed and implemented sentiment analysis module classifying customer intent across 3 categories with 84% accuracy on held-out test set.
  • Collaborated with 4-person cross-functional team to deploy model to staging environment, writing documentation and runbooks for handoff.
Python, BERT, NLP, TensorFlow, Sentiment Analysis, Pandas
Sep 2023 – Oct 2023
ML & Data Engineering Intern
SystemTron · On-site
  • Developed real-time pedestrian detection system using YOLOv5 and OpenCV achieving 90% accuracy via Random Forest ensemble.
  • Built and compared neural network vs traditional ML pipeline — neural network achieved 83% accuracy on validation set with 40% faster inference.
  • Processed and annotated 5,000+ image dataset, implementing data augmentation pipeline that improved model generalisation by 12%.
  • Delivered live demo to technical stakeholders and wrote detailed model card documentation for production handoff.
Python, YOLOv5, OpenCV, Random Forest, Computer Vision, scikit-learn
Jul 2022 – Sep 2022
Machine Learning Intern
Acmegrade · Remote
  • Implemented supervised learning models for agricultural yield prediction using multi-source time series data from weather APIs and crop sensors.
  • Performed end-to-end data wrangling across 3 disparate data sources — applied quality checks, documented SOPs, and delivered clean feature-engineered dataset.
  • Ran exploratory data analysis and produced visualisation report that identified 3 key correlates of crop yield variance, presented to team lead.
  • Wrote modular, well-commented Python code following team coding standards — all code reviewed and merged without rework.
Python, Scikit-learn, Pandas, EDA, Time Series, Data Quality

05 — Let's Talk

Open to Opportunities

Looking for Data Analyst, ML Engineer, or BI Analyst roles in financial services, asset management, or tech. Based in India — open to Bengaluru, Chennai, Hyderabad, Pune and remote.