
Introduction

  • This page lists all my GitHub repositories (private and public).
  • These repositories cover my projects, consulting engagements, courses, proofs of concept (POCs), technology explorations, open-source contributions, and more.
  • The projects relate to Project Management, AI/ML, LLM, NLP, Cloud Computing, Software Architectures and Solutions, and Big Data (Hadoop, Hive, Spark).
  • The purpose of this listing is twofold: first, to help others see what is possible and what I have explored; second, to remind myself what I have already explored, what worked, and what didn't.

Tech Skills

  • LLM Expertise: Prompt Engineering, Finetuning & Deployment (ChatGPT, GPT-4, Bard, LLaMA, LaMDA, PaLM).

  • ML Model Development: Feature Engineering, Tuning, Evaluation, Cross-Validation, Classical ML, NLP metrics, Regression/Classification/Clustering, Ensemble Trees, Decision Tree, Random Forest, SVM.

  • AutoML: Automated ML (PyCaret, TPOT).

  • MLOps/DevOps:

  • Deep Learning / NLP & Embedding: Huggingface, RNN, LSTM, GRU, Transformers, BERT, FastText, NLTK, SpaCy, Word Embedding, Keras, PyTorch, TensorFlow, OpenAI, Embedding Transfer, CV model evaluation, CNN, YOLO

  • Big Data & Cloud: Hadoop, Spark, PySpark, Kafka, NoSQL (Cassandra, MongoDB)

  • Cloud Platforms: AWS, GCP, Azure, AWS SageMaker, Azure AutoML, Vertex AI

  • ML Frameworks: TensorFlow, TensorFlow Lite/LiteRT, TensorFlow.js, PyTorch

  • Data Visualization: Power BI, Tableau, Plotly, Seaborn, Matplotlib

  • Mobile/Web App Dev: Flask, Gradio, Streamlit, Android Studio, Flutter

  • Programming Languages: Python, R, Package Managers (pip, conda, npm), Dart

  • Markup Language: Markdown, LaTeX, HTML/CSS

  • Statistics: Descriptive/Inferential Statistics, Prescriptive Statistics in AI.

AI/ML - Industries - Developed/ Created/ Expanded work

Projects in this section are listed according to Industry/Business Domain.

Agri

Airlines

Flightdelay-Analysis-Bigdata

Apache Hive is a data warehouse and SQL-like query engine built on top of Hadoop. Hadoop provides the Hadoop Distributed File System (HDFS) for distributed storage, and Hive runs queries over that data as distributed jobs. Hive can handle tables with billions of records, so we can run SQL queries without worrying whether an aggregation or filter will ever complete. Hive also supports insert, update, and delete operations, although updates and deletes require transactional (ACID) tables.

In this project, a folder “\server\airlines” on the server holds hundreds of files containing daily airline flight information such as Origin, Dest, Distance, DepTime, CRSDepTime, ArrTime, CRSArrTime, UniqueCarrier, FlightNum, TailNum, ActualElapsedTime, CRSElapsedTime, AirTime, ArrDelay, DepDelay, TaxiIn, TaxiOut, Cancelled, CancellationCode, Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay, LateAircraftDelay. The airlines wanted to analyze the data of the last 20 years.

Activities (pipeline) in the project:

  • Creating a Hive table (for storage) from the external files
  • Creating the partitioned table schema
  • Partitioning the Hive table by year and loading the data into the partitioned table
  • Running SQL queries on the partitioned table
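The pipeline steps above can be sketched as a small Python script that generates the HiveQL statements. This is a minimal sketch, not the project's actual code: the table names (`flights_raw`, `flights_by_year`), the HDFS directory `/data/airlines`, the `Year` column used for partitioning, the comma-delimited file format, and the example year 2008 are all assumptions made for illustration. Only a subset of the columns listed above is included to keep it short.

```python
# Hypothetical names and paths; adjust them to the real schema and cluster.
RAW_TABLE = "flights_raw"
PART_TABLE = "flights_by_year"
DATA_DIR = "/data/airlines"  # assumed HDFS location of the raw files

# Subset of the flight columns, plus an assumed Year column for partitioning.
COLUMNS = [
    ("Year", "INT"),
    ("Origin", "STRING"),
    ("Dest", "STRING"),
    ("Distance", "INT"),
    ("UniqueCarrier", "STRING"),
    ("ArrDelay", "INT"),
    ("DepDelay", "INT"),
    ("Cancelled", "INT"),
]
PARTITION_COL = "Year"


def column_ddl(columns):
    """Render (name, type) pairs as a HiveQL column list."""
    return ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)


def build_pipeline():
    """Return the HiveQL statements for the four pipeline steps, in order."""
    # The partition column is declared in PARTITIONED BY, not in the column list.
    data_cols = [(n, t) for n, t in COLUMNS if n != PARTITION_COL]
    # For a dynamic-partition insert the partition column must come last.
    select_list = ", ".join(n for n, _ in data_cols) + f", {PARTITION_COL}"
    return [
        # Step 1: external table over the raw files (assumed comma-delimited text).
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {RAW_TABLE} (\n  {column_ddl(COLUMNS)}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"STORED AS TEXTFILE\nLOCATION '{DATA_DIR}'",
        # Step 2: partitioned table schema, stored as ORC.
        f"CREATE TABLE IF NOT EXISTS {PART_TABLE} (\n  {column_ddl(data_cols)}\n)\n"
        f"PARTITIONED BY ({PARTITION_COL} INT)\nSTORED AS ORC",
        # Step 3: enable dynamic partitioning and load one partition per year.
        "SET hive.exec.dynamic.partition=true",
        "SET hive.exec.dynamic.partition.mode=nonstrict",
        f"INSERT OVERWRITE TABLE {PART_TABLE} PARTITION ({PARTITION_COL})\n"
        f"SELECT {select_list} FROM {RAW_TABLE}",
        # Step 4: example query; filtering on the partition column prunes partitions.
        f"SELECT {PARTITION_COL}, UniqueCarrier, AVG(ArrDelay) AS avg_arr_delay\n"
        f"FROM {PART_TABLE}\n"
        f"WHERE {PARTITION_COL} = 2008 AND Cancelled = 0\n"
        f"GROUP BY {PARTITION_COL}, UniqueCarrier",
    ]


if __name__ == "__main__":
    for statement in build_pipeline():
        print(statement + ";\n")
```

The printed statements could be saved to a file and run with a Hive client. Partitioning by year means a query that filters on the year reads only that year's files instead of scanning all 20 years.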

Tech Stack: Hadoop/HDFS, Hive, SQL, HiveQL, ORC (Optimized Row Columnar) or Parquet, Python, Matplotlib/Seaborn.
My article on Hive
Github Code

BFSI

Credit-Fraud-Detection

DoeJones-Prediction-with-News

Loan-Approval

eCommerce

Amazon-Sentiment Analysis

Bigdata-AmazonReviews

Economics

Economy-Analysis

Prosperity-Clustering

Education

Electronics

Hand-Gesture-Recognition

Energy

UK-Energy-Consumption

Entertaintment

Movies-Recommendations

Health

Breast-Cancer-Prediction

Chest-XRay

Covid-worldwide-EDA

India-Covid-Graphs

Malaria-Detection_dep

pnemonia_prediction

Hospitality

Restaurant_Sales_Order_Forcasting

Infra

AirQuality-Prediction

House-Price-Prediction_dep

House-Price-Prediction_Docker

House-Prices-KCH

Surprise-House-Pricing

Law+Justice

Media+Pub

Fakenews-Detection

Olympic-QA-System-with-GPT

HBQAS

NewsClassification-20Groups

Podcast-Transcription

SDSHL

SpamFilter

Toxic-Comment

Twitter-Sentiment-Analysis

YELP-Review-Prediction

Misc

Restaurants

FoodDemand-Forcast

Tools-and-Food-Gradient-Identifcation

Retail

Online-Retail-Customer-Clustering

Sales

CarPrice

CarSales

Lead-Conversion

Sanskrit

Telecom

Telcom-Churn

Travel+Logistic

Bigdata-pySpark-NYC-Parking

Driver-Availablity-Prediction

Uber-Cancellation

Vehicle-Classification

Vehicle-Tracking

README.md

AI/ML - Technology Stack - Developed/ Created/ Expanded work

Projects in this section are listed according to Technology/Tech Product/POC

0-Experiments

Analytics

bokeh

pandas

plotly

PyGWalker

tableau

Audio

Speech Recognition

Bigdata

Bigdata-HiveScoop

Bigdata-mySQL

Cloud

AWS-Amazon-Bedrock-for-Serverless-LLM

AWS-SageMaker

GCP

CV

Flower-Prediction

ImageAugmentation

ImageProcessing

MNIST-Experiments

Object-Detection-InBrowser

DE

DataCleaning

Datacollection

PyWebScrapping

Machine Learning Frameworks

mobilenet-v2

Using a deep learning model on mobile. GitHub README

TensorFlow Lite for Regression

Github Code for App and Model

GAN

MusicGeneration

IOT

BOLTIOT

LLM

finetune-bloom-7b

finetune-llama2

finetune-llama3-8b

huggingface

Langchain

neo2.7b

openAI

openai-quickstart-python

quantization

RAG

GroqCloud

Misc

ML

Classification

Clustering

DataImbalance

ML-Retraining

Regression

ROC

MLOps

ML-Pipelines

naptune

NLP

embedding

LSTM.ipynb

NLP

NLP-Concepts

NLP-Hindi-Bible

NLP-Plugin20Event

NLP-rasa

NLP-SanskritTrans

Python-Automation

R-Projects

RL

Stats

Tech-Products

Hive

mongodb

PowerBI

TensorFlow-ImageRecognition

Timeseries

TS-multivariate

TS-Smoothing

Transfer-Learning

Utils-JypterNB

Readme.md

AI/ML - Forked

100Days-ML

automl

chroma

diffusers

evalml

gcp-python-docs-samples

GFPGAN

google-gemini-cookbook

intel-scikit-learn-intelex

langchain

langgraph

Learning-Pandas-Second-Edition

LeetCode-js

microsoft-generative-ai-for-beginners

ML+DL-Code-for-my-YouTube-Channel-Rohan

packages

stanford_alpaca

tensorboard

tensorflow

tensorflow-examples

tfjs-examples

Visualization

viz-github-repo

README-forked-repo.md

Management

Management-Main

  • 11-PMO
    1. 00-General
    2. 01-Chemfab-PMO
    3. 02-ISCON-PMO
    4. 03-Tagros-PMO
    5. 04-FFI-PMO
    6. 05-BFL-PMO
    7. 06-TEAM-PMO
    8. 07-SignitySolutions
  • 12-Projects-PM
    1. 01-Vikram-Solar-PMF
    2. 02-FFI-Agile-Consulting
    3. 03-AllSysServices-PMI-ACP
    4. 04-Astrowix-PMI-ACP
    5. 05-Colossal-Hibu-PM
    6. 06-TGroup-PMP
    7. 07-BirlaSoft-SageTech-PMI-ACP
    8. 08-Sagetech-Project-Estimation
    9. 09-VGL-PM-2days
    10. 10-CompetenceCurve-FTM
    11. 11-Sanofi-PM
    12. 12-Konsberg-Scrum-Agile
    13. 13-HRLEHR-Dubai
    14. 14-SEO-PMLOGY
    15. A01-ContractManagement
    16. P01-PRINCE2
  • 12-Project-NGO
    1. S01-Rajiv-Malhotra
    2. S02-YFS
    3. S03-HSP
    4. S04-RKM-Ashram
    5. S05-RKM-Kankhal
    6. S06-RKM-Trivendram
  • 14-Process-Courses
    1. Process-CMMI
    2. Process-ISMS
    3. Process-ISO
    4. Process-SixSigma
    5. Process-ZED
  • 52-Work-PMI-Chapters
    1. 2012 LIMC Application Information
    2. CPC_Presentation_Foundations.ppt
    3. ITnTelecom-Webinars
    4. LIM-Brazil-2011
    5. OPM3-Package
    6. PMBoK-Hindi
    7. PMI-International
    8. PMI-Leadership-Component
    9. PMI-NC
    10. PMI-Team-India
    11. PMIEF
    12. PMIMC-BestPractices
    13. Regional-BP-Task-Force
    14. Work-PMICC
    15. Work-PMIMC

Management-PM-Courses

  1. PM-Agile
  2. PM-Customized
  3. PM-EPM
  4. PM-EVM-MSP
  5. PM-Microsoft-Project
  6. PM-Misc-Training
  7. PM-PMP-v5
  8. PM-PMP-v6
  9. PM-PRINCE2
  10. PM-RMP
  11. PM-SharePoint
  12. PM-SoftwareSizeEstimation

Management-PMO

  1. PM-Templates
  2. PMO

Management-PMIPrep

Training-Feedbacks

Web+Mobile App Development - POC Work

  1. Android
  2. Falcon_Android
  3. ImageRecognition
  4. Java
  5. nodejs
  6. react

AI/ML Datasets

There is no dearth of datasets, but during training sessions my learners and I often struggle to get hold of the ones we need. Either they have been removed or renamed, or internet availability and access restrictions waste a lot of time. To avoid that, I have created this GitHub repo of datasets. They are meant for classical machine learning, not for deep learning or LLMs, unless specifically mentioned.

  1. 50_Startups - 50_Startups.csv
  2. Abalone
  3. Accidental Drug Related Deaths in Connecticut, US
  4. airline-pass-stats.csv
  5. airline-passengers.csv
  6. AirQuality
  7. Amazon Product Reviews
  8. amazon_alexa.tsv
  9. application_train.csv
  10. Autism Screening Adult
  11. Auto MPG
  12. Banknote Authentication
  13. Beijing PM2.5
  14. Bike Sharing
  15. Birmingham Parking Dataset
  16. Blog_Article_Popularity
  17. Blood Transfusion Service Center
  18. Breast Cancer Wisconsin
  19. Car Evaluation
  20. CarPrice.csv
  21. CarPrice_DescribeData.csv
  22. Census Income
  23. childweight_SCA01.csv
  24. Concrete Compressive Strength
  25. Coronavirus
  26. Daily Demand Forecasting Orders
  27. daily-min-temperatures.csv
  28. Default of Credit Card Clients
  29. Dow Jones Index
  30. Echocardiogram
  31. EEG Eye State Dataset
  32. EEG Steady State Evoked Potential Dataset
  33. Energy Efficiency
  34. EU Population Poverty Status Dataset
  35. Fakenames
  36. FB.csv
  37. Fertility
  38. FIFA-Worldcup - World Cup.csv
  39. financial_crime_aylien_news_data.tar.gz
  40. fine_food_reviews_with_embeddings_1k.csv
  41. Flights
  42. Frequent_Names
  43. Glass Identification
  44. Heart Disease
  45. HelpInternational-Country-data.csv
  46. Hepatitis
  47. Hepatitis C Virus (HCV) Classification Dataset
  48. Immigrants
  49. Individual Household Electric Power Consumption
  50. Interstate-94 (I-94) Traffic Volume Dataset
  51. Istanbul Stock Exchange
  52. Liver Disorders
  53. Movie-Rating.zip
  54. Occupancy Detection
  55. OCR-Samples
  56. olympics_qa.csv
  57. olympics_search.jsonl
  58. olympics_sections.csv
  59. Online News Popularity
  60. Online_Retail
  61. pima_indian_diabetes.csv
  62. POIClassification.csv
  63. Population
  64. Portugal 2019 Election Dataset
  65. Qualitative Bankruptcy
  66. random-ocr-images
  67. Real Estate Valuation
  68. Risk Factors for Cervical Cancer
  69. spotfy-2000.zip
  70. Startup_Investment
  71. Suicide
  72. Telecom_Churn
  73. Travel Reviews
  74. Unemployment
  75. US Tuberculosis Dataset
  76. User Knowledge Modeling
  77. Wholesale Customers
  78. Wireless Indoor Localization
  79. README.md
