N Days With Machine Learning (Part 1)
--
In this blog I will write about my journey to learn and master the field of machine learning. It will take me N days, and it will be divided into six parts: Data Preprocessing, Classification, Regression, Clustering, Artificial Neural Networks, and Reinforcement Learning.
What is Machine Learning?
“Machine Learning is an application of Artificial Intelligence and is revolutionizing the way companies do business”
Let's start with Data Preprocessing.
The evolution of artificial intelligence and machine learning is tied to the availability of data, which is the critical ingredient that lets us develop machine learning models with high accuracy.
Our mission is to give machines access to the data and let them learn from it by themselves.
But even when we have good data, we need to check that it is in a useful scale and format, and that meaningful features are included.
That is what we call "data preprocessing".
Before we start coding, you need to install these necessary Python libraries.
scikit-learn is a Python module for machine learning.
Install these packages by running the following commands (Linux users only):
sudo apt-get update
sudo apt-get -y install python-pip
sudo apt-get install python3-matplotlib
sudo pip3 install numpy
sudo pip3 install pandas
sudo pip3 install scipy
sudo pip3 install -U scikit-learn
Now let's start coding.
The first step is to import the libraries:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Import the Dataset
# Importing the dataset
dataset = pd.read_csv('Data.csv')
Choose which columns we will work with: X takes all columns except the last one, and Y takes the last column (index 3).
X = dataset.iloc[:,:-1].values
Y = dataset.iloc[:,3].values
print(X)
print(Y)
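To make the slicing concrete, here is a small self-contained sketch. The column names and values are made up for illustration (they mirror the ['Germany' 40.0 nan] row mentioned below), not the actual contents of Data.csv:
import pandas as pd

# Hypothetical rows in the style of Data.csv (Country, Age, Salary, Purchased)
demo = pd.DataFrame({
    'Country': ['France', 'Germany', 'Spain'],
    'Age': [44.0, 40.0, 27.0],
    'Salary': [72000.0, None, 48000.0],
    'Purchased': ['No', 'Yes', 'Yes'],
})

X_demo = demo.iloc[:, :-1].values  # all columns except the last -> features
Y_demo = demo.iloc[:, 3].values    # last column -> target
print(X_demo)
print(Y_demo)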
Taking Care of Missing Data
Data can have missing values; in our example, ['Germany' 40.0 nan] has the value nan for the Germany customer, so we need to deal with it by using:
class sklearn.preprocessing.Imputer(missing_values='NaN', strategy='mean', axis=0, verbose=0, copy=True)
This is a Python class for completing missing values; you can read more about it in the scikit-learn documentation.
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
print(X)
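Note: in newer scikit-learn releases (0.22 and later) the Imputer class above has been removed. If the import fails for you, a roughly equivalent sketch using SimpleImputer (available since scikit-learn 0.20) looks like this:
import numpy as np
from sklearn.impute import SimpleImputer

# SimpleImputer replaces the older Imputer; missing values are np.nan here
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])
print(X)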
Encoding Categorical Data
class sklearn.preprocessing.OneHotEncoder(n_values='auto', categorical_features='all', dtype=<class 'numpy.float64'>, sparse=True, handle_unknown='error')
This is a Python class for encoding categorical integer features using a one-hot (aka one-of-K) scheme; you can read more about it in the scikit-learn documentation.
from sklearn import preprocessing
from sklearn.preprocessing import OneHotEncoder

le = preprocessing.LabelEncoder()
enc = OneHotEncoder(categorical_features=[0])

X[:, 0] = le.fit_transform(X[:, 0])
X = enc.fit_transform(X).toarray()
Y = le.fit_transform(Y)
print(X)
print(Y)
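As with Imputer, the categorical_features argument of OneHotEncoder has been removed in newer scikit-learn versions. This is not the code used above, just a rough equivalent sketch for scikit-learn 0.20+, where OneHotEncoder handles string categories directly and ColumnTransformer selects the column to encode:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

# One-hot encode column 0 (the country names) and keep the other columns as-is
ct = ColumnTransformer([('country', OneHotEncoder(), [0])], remainder='passthrough')
X = ct.fit_transform(X)

# LabelEncoder turns the categorical target into integers
le = LabelEncoder()
Y = le.fit_transform(Y)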
Splitting the Data into Training Set and Testing Set
from sklearn.model_selection import train_test_split
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.2, random_state=0)

print(X_Train)
print(Y_Train)
print('**************testing data**********')
print(X_Test)
We need to split the data into a "training set" and a "testing set".
The training set is the data on which you train your machine, i.e. the algorithm, so it can learn the relationships between variables.
The testing set helps you validate that the training worked well in terms of accuracy, precision, and so on.
Feature Scaling
Feature scaling is a general trick applied to optimization problems. In our case it puts all the values on a comparable range; as a result, it speeds up the computation because fewer calculations are required.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_Train = scaler.fit_transform(X_Train)
X_Test = scaler.transform(X_Test)
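As a quick illustration of what StandardScaler does (my own sanity check, not part of the original snippet): it standardizes each column using the training set's mean and standard deviation, i.e. z = (x - mean) / std.
import numpy as np
from sklearn.preprocessing import StandardScaler

col = np.array([1.0, 2.0, 3.0, 4.0])      # a made-up column of values
manual = (col - col.mean()) / col.std()    # z = (x - mean) / std
scaled = StandardScaler().fit_transform(col.reshape(-1, 1)).ravel()
print(np.allclose(manual, scaled))         # True: same result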
I just finished my first step, "Data Preprocessing". I hope you understand this step before we start developing our machine learning models, and remember:
Develop a passion for learning. If you do, you will never cease to grow.
You can find the full source code here:
Follow me on Twitter for code updates.
Thanks for your feedback!