N Days With Machine Learning (Part 1)


In this blog I will write about my journey to learn and master the machine learning field. It will take me N days, and it will be divided into 6 parts: Data Preprocessing, Classification, Regression, Clustering, Artificial Neural Networks, and Reinforcement Learning.

What is Machine Learning?

Machine learning is the field of study that gives computers the ability to learn from data without being explicitly programmed.

Let's start with data preprocessing.

The evolution of artificial intelligence and machine learning is tied to the availability of data, which is the critical ingredient that lets us develop machine learning models with high accuracy.


Our mission is to give the machines access to the data and let them learn by themselves.

But even when we have good data, we need to check that it is in a useful scale and format, and that meaningful features are included.

That is what we call "data preprocessing".


Before we start coding, you need to install these necessary Python libraries:

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object…
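A tiny example of that multidimensional array object in practice (the values here are just for illustration):

import numpy as np

# Create a 2-D array and look at its shape and mean
a = np.array([[1.0, 2.0], [3.0, 4.0]])
print(a.shape)   # (2, 2)
print(a.mean())  # 2.5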


pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
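A minimal sketch of the kind of data structure pandas provides (the column names and values here are just made up for the example):

import pandas as pd

# Build a small DataFrame and print a quick summary
df = pd.DataFrame({'Country': ['France', 'Spain'], 'Age': [44, 27]})
print(df.head())
print(df.describe())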


Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.
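A minimal sketch of a plot, using arbitrary x/y values just for illustration:

import matplotlib.pyplot as plt

# Plot a simple line chart and save it to a file
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.xlabel('x')
plt.ylabel('x squared')
plt.savefig('squares.png')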


scikit-learn is a Python module for machine learning.


Install these packages by running the following commands (Linux users only):

sudo apt-get update
sudo apt-get -y install python3-pip
sudo apt-get install python3-matplotlib
sudo pip3 install numpy
sudo pip3 install pandas
sudo pip3 install scipy
sudo pip3 install -U scikit-learn

Now let’s start coding


The first step is to import the libraries:

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Import the Dataset

# Importing the dataset
dataset = pd.read_csv('Data.csv')
[Figure: a preview of Data.csv]

Choose which columns we will work with:

# X: every column except the last one (the independent variables)
X = dataset.iloc[:, :-1].values
# Y: column 3 (the dependent variable)
Y = dataset.iloc[:, 3].values
print(X)
print(Y)

Taking Care of Missing Data

Data can have missing values; in our example, ['Germany' 40.0 nan] has the value nan for the Germany customer, so we need to deal with it by using:

class sklearn.preprocessing.Imputer(missing_values='NaN', strategy='mean', axis=0, verbose=0, copy=True)

a Python class to complete the missing values; you can read more about it in the scikit-learn documentation.

from sklearn.preprocessing import Imputer

# Replace missing values in columns 1 and 2 (the numeric columns) with the column mean
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
print(X)
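Note that Imputer was deprecated and later removed in newer scikit-learn releases; if the import above fails, the replacement is SimpleImputer from sklearn.impute. A minimal sketch of the same step, applied to the same X:

import numpy as np
from sklearn.impute import SimpleImputer

# SimpleImputer replaces NaN entries with the column mean, just like Imputer did
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])
print(X)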

Encoding Categorical Data

class sklearn.preprocessing.OneHotEncoder(n_values='auto', categorical_features='all', dtype=<class 'numpy.float64'>, sparse=True, handle_unknown='error')

a Python class to encode categorical integer features using a one-hot (aka one-of-K) scheme; you can read more about it in the scikit-learn documentation.

from sklearn import preprocessing
from sklearn.preprocessing import OneHotEncoder

# Label-encode the country column (column 0), then one-hot encode it into dummy columns
le = preprocessing.LabelEncoder()
enc = OneHotEncoder(categorical_features=[0])
X[:, 0] = le.fit_transform(X[:, 0])
X = enc.fit_transform(X).toarray()

# Label-encode the dependent variable
Y = le.fit_transform(Y)
print(X)
print(Y)
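The categorical_features parameter of OneHotEncoder was likewise removed in newer scikit-learn releases; the usual replacement is to wrap OneHotEncoder in a ColumnTransformer. A minimal sketch of the same encoding step, applied to the original (still un-encoded) X and Y:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

# One-hot encode column 0 (the country) and pass the remaining columns through unchanged
ct = ColumnTransformer([('country', OneHotEncoder(), [0])], remainder='passthrough')
X = ct.fit_transform(X)

# Label-encode the dependent variable
Y = LabelEncoder().fit_transform(Y)
print(X)
print(Y)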

Splitting the Data into Training Set and Testing Set

from sklearn.model_selection import train_test_split

# Keep 80% of the rows for training and 20% for testing
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.2, random_state=0)
print(X_Train)
print(Y_Train)
print('**************testing data**********')
print(X_Test)

We need to split the data into “Training set” and “Testing set”

“Training data sets are sets on which you train your machine i.e algorithm to form relationships between variables”.

“Testing data set helps you to validate that the training has happened efficiently, in terms of accuracy, precision, and so on”.
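If you want to confirm the 80/20 split, a quick check of the array shapes (using the variables created above) looks like this:

# The training arrays should hold roughly 80% of the rows, the testing arrays the rest
print(X_Train.shape, X_Test.shape)
print(Y_Train.shape, Y_Test.shape)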


Feature Scaling

Feature scaling is a general trick applied to optimization problems. In our case it brings all the features into a comparable range, so that no single feature dominates the others; as a result, many algorithms converge faster and distance-based methods behave better.

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_Train = scaler.fit_transform(X_Train)
X_Test = scaler.transform(X_Test)
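A quick way to see what StandardScaler did, since each scaled feature should now be centred around 0 with (roughly) unit variance:

# Column-wise mean and standard deviation after standardization
print(X_Train.mean(axis=0))
print(X_Train.std(axis=0))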

I just finished my first step, "Data Preprocessing". I hope you understand this step before we start developing our machine learning models, and remember:

Develop a passion for learning. If you do, you will never cease to grow.


You can find the full source code here:

Follow me on Twitter for code updates.

Thanks for your feedback!
