N Days_With_Machine_Learning (Part 1)

Rebai Ahmed
5 min read · Jan 8, 2018


In this blog, I will write about my journey to learn and master the machine learning field. It will take me N days, and it will be divided into 6 parts (Data Preprocessing, Classification, Regression, Clustering, Artificial Neural Networks, Reinforcement Learning).

What is Machine Learning?

“Machine Learning is an application of Artificial Intelligence and is revolutionizing the way companies do business”

Let’s start with Data Preprocessing.

The evolution of Artificial Intelligence and Machine Learning is closely tied to the availability of data, which is the critical ingredient that lets us develop machine learning models with high accuracy.

Our mission is to give the machines access to the data and let them learn by themselves.

But even when we have good data, we need to check that it is in a useful scale and format, and that meaningful features are included.

That’s what we call “Data Preprocessing”.

Before we start coding, you need to install these necessary Python libraries:

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object…

pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.

scikit-learn is a Python module for machine learning.

You can install these packages by running the following commands (only for Linux users):

sudo apt-get update
sudo apt-get -y install python3-pip
sudo apt-get install python3-matplotlib
sudo pip3 install numpy
sudo pip3 install pandas
sudo pip3 install scipy
sudo pip3 install -U scikit-learn
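
If you are not on Linux (macOS or Windows, for example), the pip installs alone should work on most platforms, assuming Python 3 and pip3 are already set up:

pip3 install numpy pandas scipy matplotlib scikit-learn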

Now let’s start coding

The first step is to import the libraries:

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Import the Dataset

# Importing the dataset
dataset = pd.read_csv('Data.csv')
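
Before selecting columns, it can help to take a quick look at what pandas loaded; here is a minimal sketch using standard pandas calls on the same dataset variable:

# Preview the first rows, the column types and the non-null counts
print(dataset.head())
dataset.info()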

Choose which columns we will work with: the feature matrix X and the target vector Y.

# X: every column except the last one (the features)
X = dataset.iloc[:, :-1].values
# Y: the last column (the target), at index 3
Y = dataset.iloc[:, 3].values
print(X)
print(Y)

Taking Care of Missing Data

Data can have missing values. In our example, the row [‘Germany’ 40.0 nan] has a nan value for the German customer, so we need to deal with it by using:

class sklearn.preprocessing.Imputer(missing_values='NaN', strategy='mean', axis=0, verbose=0, copy=True)

This is the Python class used to fill in missing values; you can read more about it in the scikit-learn documentation.

from sklearn.preprocessing import Imputer
# Replace each NaN in the numeric columns (1 and 2) with the mean of its column
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
print(X)
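
Note that Imputer was removed in scikit-learn 0.22. If you are on a recent release, an equivalent sketch (assuming the same X as above) uses SimpleImputer from sklearn.impute:

from sklearn.impute import SimpleImputer
# Replace each NaN in the numeric columns with the mean of its column
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])
print(X)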

Encoding Categorical Data

class sklearn.preprocessing.OneHotEncoder(n_values='auto', categorical_features='all', dtype=<class 'numpy.float64'>, sparse=True, handle_unknown='error')

This is the Python class used to encode categorical integer features using a one-hot (aka one-of-K) scheme; you can read more about it in the scikit-learn documentation.

from sklearn import preprocessing
from sklearn.preprocessing import OneHotEncoder
# Turn the country names in column 0 into integer codes, then one-hot encode them
le = preprocessing.LabelEncoder()
enc = OneHotEncoder(categorical_features=[0])
X[:, 0] = le.fit_transform(X[:, 0])
X = enc.fit_transform(X).toarray()
# Encode the target labels as integers as well
Y = le.fit_transform(Y)
print(X)
print(Y)
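
Similarly, the categorical_features argument of OneHotEncoder was removed in newer scikit-learn versions; the current way to target a single column is ColumnTransformer. A minimal sketch, assuming the same X and Y as above and a recent scikit-learn:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# One-hot encode the country column (index 0) and pass the other columns through unchanged
ct = ColumnTransformer([('country', OneHotEncoder(), [0])], remainder='passthrough')
X = ct.fit_transform(X)  # may come back as a SciPy sparse matrix depending on the data
# Encode the target labels as integers
Y = LabelEncoder().fit_transform(Y)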

Splitting the Data into Training Set and Testing Set

from sklearn.model_selection import train_test_split
# Hold out 20% of the rows as a test set; random_state fixes the shuffle for reproducibility
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.2, random_state=0)
print(X_Train)
print(Y_Train)
print('**************testing data**********')
print(X_Test)

We need to split the data into a “Training set” and a “Testing set”.

“Training data sets are the sets on which you train your machine, i.e. the algorithm, to form relationships between variables.”

“The testing data set helps you validate that the training has happened efficiently, in terms of accuracy, precision and so on.”

Feature Scaling

Feature scaling is a general trick applied to optimization problems. In our case it puts all the features on a comparable scale (here, zero mean and unit variance), which speeds up the computation because gradient-based algorithms need fewer iterations to converge.

from sklearn.preprocessing import StandardScaler
# Fit the scaler on the training set only, then apply the same transformation to the test set
scaler = StandardScaler()
X_Train = scaler.fit_transform(X_Train)
X_Test = scaler.transform(X_Test)
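
As a quick sanity check, after standardization each feature in the training set should have (approximately) zero mean and unit standard deviation:

# Verify that the scaled training features are centered and have unit variance
print(X_Train.mean(axis=0))
print(X_Train.std(axis=0))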

I just finished the first step, “Data Preprocessing”. I hope you understood this step before we start developing our machine learning models, and remember:

Develop a passion for learning. If you do, you will never cease to grow.

You can find the full source code here:

Follow me on Twitter for code updates.

Thanks for your feedback!
