In this project we used specific versions of the libraries. Save the following libraries in a text file named `requirements.txt`. Install all of these libraries with the command `pip install -r requirements.txt`.
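From the directory containing `requirements.txt`, the installation command is:

```bash
pip install -r requirements.txt
```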
In this project, we will walk through an end-to-end machine learning task using the Iris dataset. This comprehensive exercise will cover all stages of a machine learning pipeline, from data exploration to model deployment.
The Iris dataset is a classic dataset in machine learning, widely used for benchmarking classification algorithms. It consists of measurements from 150 iris flowers, with four features- Sepal Length, Sepal Width, Petal Length, and Petal Width. Each sample is labeled with one of three species- Iris-setosa, Iris-versicolor, and Iris-virginica.
Our objective is to build a classification model that can accurately predict the species of an iris flower based on its measurements. We will explore the dataset, perform necessary preprocessing, and select an appropriate classification algorithm to achieve this goal.
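A minimal sketch of loading the data into a DataFrame, assuming the CSV version of the dataset whose columns are `sepal.length`, `sepal.width`, `petal.length`, `petal.width`, and `variety` (the file name `iris.csv` is an assumption):

```python
import pandas as pd

# Load the Iris dataset; the file name is an assumption for this sketch.
iris_df = pd.read_csv("iris.csv")
```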
Viewing the Beginning of the Dataset: The code `iris_df.head()` displays the first five rows of the `iris_df` DataFrame, providing a quick overview of the dataset's structure and its initial entries. The first five samples are shown below as a table.
| | sepal.length | sepal.width | petal.length | petal.width | variety |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Setosa |
Checking for Duplicates: The code `iris_df.duplicated().sum()` counts the number of duplicate rows in the `iris_df` DataFrame, helping identify any redundancy in the dataset that may need to be addressed.
The output `np.int64(1)` indicates that the dataset contains one duplicate row.
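A short sketch of the check together with one common way to address the redundancy (dropping the duplicate is an assumption, not necessarily what the original analysis does):

```python
# Count exact duplicate rows in the DataFrame.
print(iris_df.duplicated().sum())  # -> 1 duplicate row in this dataset

# Drop the duplicate and reset the index so row labels stay contiguous.
iris_df = iris_df.drop_duplicates().reset_index(drop=True)
```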
In this section we perform various graphical analyses of the features across the target classes.
Next, the features are analysed in more detail: the presence of outliers and the normality of each feature's distribution are checked before building the ML model.
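A minimal sketch of the kinds of plots used for this step (the exact figures in the project may differ; `seaborn` and `matplotlib` are assumed to be installed):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Pairwise feature relationships, coloured by class, to see how well the
# classes separate on each pair of features.
sns.pairplot(iris_df, hue="variety")
plt.show()

# Box plots per feature to inspect spread and spot potential outliers.
iris_df.plot(kind="box", subplots=True, layout=(2, 2), figsize=(8, 6))
plt.show()

# Histograms to eyeball how close each feature is to a normal distribution.
iris_df.hist(figsize=(8, 6))
plt.show()
```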
Based on our data exploration, we will select a suitable classification model. The Exploratory Data Analysis (EDA) showed that `petal.length` and `petal.width` are the most influential features for determining the variety of the Iris flower. Several classification algorithms can be used on the Iris dataset; in this discussion we will consider Logistic Regression, K-Nearest Neighbors (KNN), Support Vector Machine, Decision Tree, and Random Forest.
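A compact sketch of how such a comparison could be set up (the train/test split ratio, `random_state`, and model hyperparameters are assumptions):

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Split features and target, then hold out a test set for evaluation.
X = iris_df.drop(columns=["variety"])
y = iris_df["variety"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Candidate classifiers from the list above, mostly with default settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=200),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}

# Fit each model and report its accuracy on the held-out test set.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.3f}")
```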
After training the models, we will evaluate their performance using metrics such as accuracy and the classification report. This will help us understand how well each model performs and whether any improvements are needed. In this comparison, the `RandomForestClassifier` model is the winner, so we will select and save this model for deployment.
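Continuing from the sketch above, the evaluation and the save step might look as follows (the file name `rf_model.sav` matches the output shown below; using `joblib` is an assumption consistent with that output):

```python
from sklearn.metrics import accuracy_score, classification_report
import joblib

# Evaluate the winning model on the held-out test set.
rf_model = models["Random Forest"]
y_pred = rf_model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Persist the trained model to disk for deployment.
joblib.dump(rf_model, "rf_model.sav")
```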
['rf_model.sav']
Finally, we will deploy our trained model using Streamlit, an open-source framework that allows us to create interactive web applications for real-time predictions. This will enable users to input flower measurements and receive predictions on the species.
To deploy the Random Forest Classifier model using Streamlit, we'll need to set up several components for the complete workflow. Here's a step-by-step guide to create the necessary files:
1. Prepare the Environment: In this step, install the `streamlit` library using the following command.
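```bash
pip install streamlit
```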
2. Create the source code to load the trained model, `model.py`: This `.py` file is used to load the trained model and make it available for predictions.
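A minimal sketch of what `model.py` might contain, assuming the model was saved as `rf_model.sav` with `joblib`:

```python
# model.py -- load the trained Random Forest model from disk.
import joblib

MODEL_PATH = "rf_model.sav"

def load_model(path=MODEL_PATH):
    """Load and return the classifier that was saved with joblib."""
    return joblib.load(path)
```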
3. Create `prediction.py`: This `.py` file handles the prediction logic, using the loaded model to make predictions from the input data. A sketch of the source code for this step is given below.
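A possible `prediction.py`, assuming `model.py` exposes the `load_model()` helper sketched above and that the model expects the four measurements in the original column order:

```python
# prediction.py -- turn raw measurements into a species prediction.
import pandas as pd

from model import load_model

# Load the classifier once at import time so each prediction is fast.
_model = load_model()

# Column names must match those used during training.
FEATURE_NAMES = ["sepal.length", "sepal.width", "petal.length", "petal.width"]

def predict_species(sepal_length, sepal_width, petal_length, petal_width):
    """Return the predicted Iris variety for one set of measurements."""
    features = pd.DataFrame(
        [[sepal_length, sepal_width, petal_length, petal_width]],
        columns=FEATURE_NAMES,
    )
    return _model.predict(features)[0]
```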
4. Create `app.py`: This is the main Streamlit application file. It provides the user interface for entering measurements and displaying predictions.
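A minimal `app.py` along these lines (widget labels, ranges, and default values are assumptions):

```python
# app.py -- Streamlit user interface for the Iris classifier.
import streamlit as st

from prediction import predict_species

st.title("Iris Species Prediction")
st.write("Enter the flower measurements and get the predicted species.")

# Numeric inputs for the four features; defaults are illustrative.
sepal_length = st.number_input("Sepal length (cm)", min_value=0.0, value=5.1)
sepal_width = st.number_input("Sepal width (cm)", min_value=0.0, value=3.5)
petal_length = st.number_input("Petal length (cm)", min_value=0.0, value=1.4)
petal_width = st.number_input("Petal width (cm)", min_value=0.0, value=0.2)

if st.button("Predict"):
    species = predict_species(sepal_length, sepal_width, petal_length, petal_width)
    st.success(f"Predicted variety: {species}")
```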
5. Run the Streamlit App: Navigate to the directory containing your files and run the Streamlit app using the following command:
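```bash
streamlit run app.py
```

Streamlit starts a local development server and prints a local URL that you can open in the browser to interact with the app.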