ml

Learning Resource

: Course Name :

Machine Learning A-Z™: AI, Python & R + ChatGPT Bonus [2023]

Statistics Foundation 1 Basic

Statistics Foundation 1 Probability

Statistics Foundation 2

Statistics Foundation 3

Python for Data Science essential training Part 1 & Part 2

NLP with Python for Machine Learning essential training

Machine Learning and AI Foundation : Classification Modelling

Data Analysis and Data Classification (CID At bottom)

Azure ML Development : 1. Basic concepts 2. Learning ML Studio 3. Deploying and Managing Models

: Total 40 :

ML-20
NLP/DL- 10/10 OR 15/5 OR 5/15

: Topics :

ML - pandas, NumPy, scikit-learn, Matplotlib, Seaborn
NLP - all basic libraries and their functions
DL - TBD

: Youtube Channel names :

CampusX
Krish Naik
(CodeWithHarry, Corey Schafer, codebasics, Kimberly Fessel, edureka)

: Handson 202205 Di :

Data Analysis and Visualizations

: Handson 202305 Di :

Recall, F-score, accuracy, precision, and one modelling question (ML models)
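A minimal sketch of the four metrics using scikit-learn. The labels are invented for illustration (1 = positive class) and are not from the actual hands-on.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical binary labels, made up for this example (1 = positive class).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]

# Here TP = 2, TN = 3, FP = 1, FN = 2.
accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / all samples
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(accuracy, precision, recall, f1)
```

Precision asks "of what I predicted positive, how much was right?", while recall asks "of what was really positive, how much did I catch?".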

: Handson 202211 In :

Data Analysis, Visualization and Apriori algorithm
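For revision, here is a rough pure-Python sketch of the frequent-itemset part of Apriori. The function name `apriori`, the baskets and the 0.5 support threshold are all made up for illustration. In the exam a library implementation may be expected instead.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset that meets min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(baskets, min_support=0.5)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The key idea is the prune step: an itemset can only be frequent if all of its subsets are frequent, so most candidates never need their support counted.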

: Handson 202305 In :

Data Analysis, regression, Classification, Clustering, PCA algorithms

Regression, classification, clustering, dimensionality reduction, Random Forest, KMeans, PCA

The dataset used in the hands-on:

https://www.kaggle.com/datasets/elikplim/car-evaluation-data-set/code
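A quick sketch touching PCA, KMeans and Random Forest in one go. The Kaggle CSV is not reproduced in these notes, so scikit-learn's bundled iris data is used as a stand-in. The split size, cluster count and random seeds are arbitrary choices for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data: iris, not the car evaluation dataset from the hands-on.
X, y = load_iris(return_X_y=True)

# Dimensionality reduction: project the 4 features onto 2 principal components.
X_2d = PCA(n_components=2).fit_transform(X)

# Clustering: group the projected points into 3 clusters (the labels y are not used).
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)

# Classification: train a Random Forest and score it on held-out data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
test_accuracy = forest.score(X_test, y_test)

print(X_2d.shape, sorted(set(cluster_labels)), test_accuracy)
```

For the real car evaluation data the features are categorical, so they would need encoding before any of these steps.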

: Handson 202308 In :

11 questions in total, all related to one dataset

The first question was to get the count of each target variable category, i.e. how many values belong to class A, class B, etc.

Then count the total number of null values
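Both counts are one-liners in pandas. The DataFrame below is hypothetical (column names `size`, `weight` and `target` are invented), since the actual exam dataset is not recorded in these notes.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a few missing values.
df = pd.DataFrame({
    "size": [10.0, 14.0, np.nan, 20.0, 11.0, 16.0],
    "weight": [1.2, np.nan, 3.1, 4.0, 1.0, np.nan],
    "target": ["A", "B", "A", "C", "A", "B"],
})

# How many rows fall into each target class.
class_counts = df["target"].value_counts()

# Null values per column, then summed over the whole frame.
nulls_per_column = df.isnull().sum()
total_nulls = int(nulls_per_column.sum())

print(class_counts)
print(nulls_per_column)
print(total_nulls)
```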

The second question was a scatter plot
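A bare-bones scatter plot with Matplotlib, assuming two numeric columns. The column names and values are invented. The `Agg` backend line is only there so the script also runs on a machine without a display. In a notebook it can be dropped.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, works without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical numeric columns.
df = pd.DataFrame({
    "size": [10, 14, 18, 20, 11, 16],
    "weight": [1.2, 2.0, 3.1, 4.0, 1.0, 2.7],
})

fig, ax = plt.subplots()
ax.scatter(df["size"], df["weight"])
ax.set_xlabel("size")
ax.set_ylabel("weight")
ax.set_title("weight vs size")
# In a notebook, plt.show() would display the figure here.
```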

The third was to create a new column from an existing column and assign values based on conditions, e.g. 'small' for values < 12, 'medium' for values from 12 to 15, and 'large' for the rest. Basically data discretization.
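One way to do this kind of binning is `np.select`. The sketch below reads the medium band as 12 to 15 inclusive, which is an assumption about what the question meant. The column names are also invented.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column to discretize.
df = pd.DataFrame({"size": [5, 12, 13, 15, 16, 30]})

# Conditions are checked in order; rows matching none of them get the default.
conditions = [
    df["size"] < 12,
    df["size"].between(12, 15),  # inclusive on both ends by default
]
choices = ["small", "medium"]
df["size_group"] = np.select(conditions, choices, default="large")

print(df)
```

`pd.cut` with explicit bin edges is an alternative when the bands are contiguous ranges like these.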

The 4th question was removing null values, label encoding, scaling the data, and storing it in X and y
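A sketch of that preprocessing chain on made-up data (column names `size`, `colour` and `target` are hypothetical). `LabelEncoder` is used on a feature here because the question asked for label encoding. For features in real projects `OrdinalEncoder` or `OneHotEncoder` is usually the better fit.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical data with missing values and a categorical feature.
df = pd.DataFrame({
    "size": [10.0, 14.0, np.nan, 20.0, 11.0, 16.0],
    "colour": ["red", "blue", "red", None, "green", "blue"],
    "target": ["A", "B", "A", "C", "A", "B"],
})

# 1. Remove every row that contains a null value.
clean = df.dropna().copy()

# 2. Label-encode the categorical feature and the target.
clean["colour"] = LabelEncoder().fit_transform(clean["colour"])
y = LabelEncoder().fit_transform(clean["target"])

# 3. Scale the features to zero mean and unit variance, and store them in X.
X = StandardScaler().fit_transform(clean[["size", "colour"]])

print(X)
print(y)
```

Strictly speaking the scaler should be fitted on the training split only. It is fitted on everything here just to mirror the order of the exam questions.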

5th question was splitting into test and train sets
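The split itself is a single call to `train_test_split`. The arrays, the 25% test size and the seed below are placeholders chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and a balanced binary target.
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

# stratify=y keeps the class proportions similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```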

6th was Logistic Regression, 7th was Random Forest, 8th was GradientBoostingClassifier

In these questions we had to fit the model and create a classification report and a confusion matrix
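The same fit / report / matrix pattern repeats for all three models, so a loop keeps it short. Synthetic data from `make_classification` stands in for the exam dataset, and the hyperparameters are library defaults apart from `max_iter` and the seeds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the exam dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

reports, matrices = {}, {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    reports[name] = classification_report(y_test, y_pred)
    matrices[name] = confusion_matrix(y_test, y_pred)
    print(name)
    print(matrices[name])
    print(reports[name])
```

Comparing the printed reports side by side is also how the "choose the best model" question can be answered.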

The 9th was choosing the best of the three models. This was an MCQ, so we had to select one option.

10th was retraining the gradient boosting classifier model

: MCQ 202305:

This time it was mostly based on machine learning and its core concepts. The MCQs were detailed, and NumPy and Matplotlib questions carried less weight.
Topics ranged from :
NumPy, data visualization using both Seaborn and Matplotlib, ML concepts, NLP, regularisation (both lasso and ridge, along with their penalties)
Stemming in detail from NLP
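On the regularisation point: lasso adds an L1 penalty (sum of absolute coefficients) and ridge adds an L2 penalty (sum of squared coefficients). A small sketch on synthetic data, where only two of six features matter, shows the practical difference. The data, `alpha=0.1` and the seed are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# Only the first two features influence y; the other four are irrelevant.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: alpha * sum(|w|)
ridge = Ridge(alpha=0.1).fit(X, y)  # L2 penalty: alpha * sum(w ** 2)

# Count coefficients that were driven exactly to zero.
lasso_zeros = int(np.sum(lasso.coef_ == 0))
ridge_zeros = int(np.sum(ridge.coef_ == 0))

print("lasso:", np.round(lasso.coef_, 3), "zeros:", lasso_zeros)
print("ridge:", np.round(ridge.coef_, 3), "zeros:", ridge_zeros)
```

Lasso can set coefficients exactly to zero, so it doubles as feature selection. Ridge only shrinks them towards zero.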

: MCQ 202308:

Topics ranged from :
NLTK tokenize
sklearn impute
SciPy linalg
NumPy arr argument
Matplotlib
pandas DataFrame concat, buffer
pandas Series
pandas sparse Series
Evaluation metrics
Rasa chatbot

: hands on:

The questions were based on the confusion matrix but were situation-based: you had to analyse which confusion-matrix measure the scenario was describing

Once you had identified it, you had to compute it on the given dataset

6 questions were based on the confusion matrix and its measures (from sklearn.metrics import confusion_matrix)
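A sketch of how to go from a situation to a number: unpack the four cells of the matrix once, then pick the ratio the scenario describes. The labels are invented, and the three example "situations" in the comments are typical phrasings, not the actual exam wording.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical binary labels (1 = positive class).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

# For binary labels sorted as [0, 1], ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# "Of all actual positives, how many did we catch?"   -> recall (sensitivity)
recall = tp / (tp + fn)
# "Of everything flagged positive, how much was right?" -> precision
precision = tp / (tp + fp)
# "Of all actual negatives, how many were cleared?"   -> specificity
specificity = tn / (tn + fp)

print(tn, fp, fn, tp)
print(recall, precision, specificity)
```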

For the last one we had to build a machine learning model, including preprocessing of the data (train-test split, regression model creation)

It was scored on the weighted F1 score
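A minimal end-to-end sketch scored with weighted F1. Since F1 is a classification metric, "regression model" is read here as logistic regression, which is an assumption. scikit-learn's bundled wine data stands in for the exam dataset.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in multi-class data (3 wine classes), not the exam dataset.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The pipeline fits the scaler on the training split only, then the classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Weighted F1: per-class F1 averaged with weights proportional to class support.
weighted_f1 = f1_score(y_test, model.predict(X_test), average="weighted")
print(weighted_f1)
```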

: Observation :

Usually they mostly ask NumPy and Matplotlib, but this time the focus was on machine learning (ML)

Suggestion: prepare both ML and NumPy & data visualization (Matplotlib)

: How to prepare :

Before moving on to ML models, practise NumPy and pandas and learn data manipulation

Then move on to model building, as it requires preprocessing, feature engineering and feature scaling

For practising model building, Kaggle is best

https://www.geeksforgeeks.org/matplotlib-tutorial/

https://www.geeksforgeeks.org/statistics/

https://www.geeksforgeeks.org/pandas-tutorial/

https://www.geeksforgeeks.org/numpy-tutorial/

Confusion Matrix:

https://www.analyticsvidhya.com/blog/2021/07/metrics-to-evaluate-your-classification-model-to-take-the-right-decisions/

https://towardsdatascience.com/performance-metrics-confusion-matrix-precision-recall-and-f1-score-a8fe076a2262

https://medium.com/hugo-ferreiras-blog/confusion-matrix-and-other-metrics-in-machine-learning-894688cb1c0a

https://towardsdatascience.com/understanding-data-science-classification-metrics-in-scikit-learn-in-python-3bc336865019

Data Analysis:

NumPy and pandas

Data Visualization:

Matplotlib & Seaborn

Machine Learning:

scikit-learn, Keras, (TensorFlow, PyTorch)

Data science ebook

https://jakevdp.github.io/PythonDataScienceHandbook/index.html

Syllabus

Data Scraping:

NumPy and pandas

Data Cleaning:

Matplotlib & Seaborn

Data Visualization:

Matplotlib & Seaborn

Model Building :

scikit-learn & SciPy