Understanding and Implementing an Artificial Neural Network (ANN) from Scratch

Artificial Neural Networks (ANNs) have gained significant popularity due to their ability to model complex relationships in data, making them a powerful tool in machine learning and artificial intelligence. In this comprehensive guide, we will delve into the implementation of an ANN using Python's TensorFlow library, breaking down each step in detail.

Introduction

In this tutorial, we will build an Artificial Neural Network (ANN) to predict whether a bank customer will leave the bank (churn) based on various features such as credit score, geography, gender, age, tenure, balance, number of products, credit card status, active membership, and estimated salary. The dataset used for this task is the "Churn_Modelling.csv" dataset.


Step 1: Importing Libraries and Dataset

First things first, we need data! We'll start by loading our dataset and preparing it for our neural network.

Importing Libraries

We'll be using some powerful libraries like Pandas, NumPy, and TensorFlow to handle our data and build our neural network.

Loading the Dataset

Our dataset contains information about bank customers, such as their credit score, age, gender, and more. We'll load this data and separate it into input features (X) and the target variable we want to predict (y).

# Importing the libraries
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split

# Importing the dataset
dataset = pd.read_csv("Churn_Modelling.csv")
x = dataset.iloc[:, 3:-1].values
y = dataset.iloc[:, -1:].values

Step 2: Data Preprocessing

Data preprocessing is a crucial step in building machine learning models. In this step, we handle categorical variables, split the dataset into training and test sets, and perform feature scaling.

Encoding Categorical Data

We start by encoding categorical data using Label Encoding for the "Gender" column and One-Hot Encoding for the "Geography" column.

# Encoding categorical data
# Label Encoding the "Gender" column
lb = LabelEncoder()
x[:, 2] = lb.fit_transform(x[:, 2])

# One Hot Encoding the "Geography" column
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
x = np.array(ct.fit_transform(x))

Let's break down each line of code:

  1. lb = LabelEncoder(): This line initializes a LabelEncoder object called lb. LabelEncoder is a utility class in scikit-learn used for encoding categorical labels with integers.

  2. x[:, 2] = lb.fit_transform(x[:, 2]): This line applies the fit_transform() method of the LabelEncoder object lb to the third column (x[:, 2]) of the NumPy array x. The method fits the encoder to the unique values in that column and then transforms them into numerical labels. In our dataset, this third column of x is the 'Gender' column.

  3. ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough'): This line creates a ColumnTransformer object called ct. ColumnTransformer is used to apply transformers to columns of an array or pandas DataFrame. In this case, it specifies a transformation pipeline for the columns of the input data. Here, it consists of a single transformation named 'encoder', which is an instance of OneHotEncoder, applied to the second column ([1]) of the input data (x). The parameter remainder='passthrough' indicates that the columns not specified should be passed through without any transformation.

  4. x = np.array(ct.fit_transform(x)): This line applies the transformation specified by the ColumnTransformer object ct to the entire input data array x. It first fits the transformer to x and then transforms it. The result is assigned back to x, effectively replacing the original data with the transformed data. This transformation essentially performs one-hot encoding on the specified column (in this case, the second column or index 1) while preserving the other columns in their original form due to the remainder='passthrough' parameter.
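
To see what these two encoders actually produce, here is a small, self-contained sketch on made-up values (the toy array below is illustrative and not taken from Churn_Modelling.csv):

# A minimal, self-contained sketch of the two encoding steps on made-up data
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer

# Columns: [CreditScore, Geography, Gender] -- illustrative values only
toy = np.array([[600, 'France', 'Female'],
                [700, 'Spain', 'Male'],
                [650, 'Germany', 'Female']], dtype=object)

# Label Encoding the "Gender" column (index 2): e.g. Female -> 0, Male -> 1
lb = LabelEncoder()
toy[:, 2] = lb.fit_transform(toy[:, 2])

# One Hot Encoding the "Geography" column (index 1); other columns pass through
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
print(np.array(ct.fit_transform(toy)))
# France -> [1, 0, 0], Germany -> [0, 1, 0], Spain -> [0, 0, 1] (alphabetical category order)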

Splitting the Dataset

Next, we split the dataset into training and test sets using the train_test_split function from scikit-learn.

# Splitting the dataset into the Training set and Test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

Explanation:

  1. train_test_split(x, y, test_size=0.2, random_state=0): This function is from the sklearn.model_selection module and is used to split arrays or matrices x and y into random train and test subsets. Here, x represents the input features, and y represents the target variable. The test_size parameter specifies the proportion of the dataset to include in the test split (in this case, 20%). The random_state parameter sets the random seed for reproducibility.

  2. x_train, x_test, y_train, y_test: These variables store the resulting train-test split. x_train and y_train will contain the training data (input features and target variable), while x_test and y_test will contain the corresponding test data.

Now, let's delve into the logic behind splitting the dataset:

  • Purpose: The primary reason for splitting the dataset into training and testing sets is to evaluate the performance of the machine learning model on unseen data. The model is trained on the training set and then evaluated on the test set to assess its generalization capability.

  • Training Set: The training set (denoted by x_train and y_train) is used to train the machine learning model. The model learns patterns and relationships in the data from this subset.

  • Test Set: The test set (denoted by x_test and y_test) is kept separate from the training set and is used to evaluate the performance of the trained model. The model's predictive accuracy is assessed using this unseen data, providing an estimate of its performance on new, unseen data in the real world.

  • Randomization: The random_state parameter ensures that the data is split randomly but consistently. Setting a specific random seed (random_state=0) ensures reproducibility, meaning that the same train-test split will be obtained each time the code is executed.

By splitting the dataset into training and testing sets, we can build and evaluate machine learning models effectively, leading to better decision-making and model performance assessment.
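
As a side note, churn datasets are often imbalanced (far more customers stay than leave). If you want both splits to preserve the class ratio, train_test_split accepts an optional stratify argument. A minimal sketch of that variation, reusing the x and y arrays and the import from above (this is not part of the original code):

# Optional variation: stratify the split so train and test keep the same churn ratio
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0, stratify=y)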

Feature Scaling

Feature scaling helps neural networks converge faster and keeps any single feature from dominating training because of its scale. We standardize the features using the StandardScaler from scikit-learn, fitting it on the training set and reusing it for the test set.

# Feature Scaling
sc = StandardScaler()
x_train = sc.fit_transform(x_train)  # fit the scaler on the training data and scale it
x_test = sc.transform(x_test)  # reuse the training-set mean and standard deviation

Explanation:

  1. sc = StandardScaler(): In machine learning, feature scaling is a preprocessing technique used to standardize the range of independent variables or features in the data. StandardScaler is a method from the sklearn.preprocessing module that scales each feature's values to have a mean of 0 and a standard deviation of 1.

  2. sc.fit_transform(x_train): The fit_transform method of the StandardScaler object sc is called on the training set x_train. This method computes the mean and standard deviation of each feature in x_train and then scales the features based on these statistics. Essentially, it learns the parameters (mean and standard deviation) from the training data and applies the transformation simultaneously.

  3. x_train = sc.fit_transform(x_train): The scaled features are then assigned back to the variable x_train, overwriting the original training data with the scaled values.

  4. x_test = sc.transform(x_test): The test set x_test is scaled with transform() rather than fit_transform(), so it reuses the mean and standard deviation learned from the training data. This keeps the scaling consistent between the training and test sets and avoids leaking information from the test set into the preprocessing.

Now, let's delve into the logic behind feature scaling:

  • Purpose: Feature scaling is essential for many machine learning algorithms, particularly those based on distance metrics or gradient descent optimization. It ensures that all features contribute equally to the model fitting process by putting them on the same scale. This prevents features with larger magnitudes from dominating the learning process.

  • Standardization: StandardScaler standardizes the features by subtracting the mean and dividing by the standard deviation. This centers the data around 0 and scales it to have a standard deviation of 1. It's particularly useful when the features have varying scales or follow different distributions.

  • Consistency: It's crucial to fit the scaler only on the training data to avoid data leakage. By fitting the scaler on the training set and merely transforming the test set, we ensure that the scaling parameters are learned solely from the training data and applied consistently to both sets.

By applying feature scaling, we prepare our data for modeling, ensuring that our machine learning algorithm can effectively learn from the features and make accurate predictions.
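
As a quick sanity check, here is a minimal sketch on made-up numbers showing that StandardScaler does exactly "subtract the column mean, divide by the column standard deviation":

# A minimal sketch, on made-up numbers, of what StandardScaler computes
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[1.0, 200.0],
                 [2.0, 400.0],
                 [3.0, 600.0]])

sc_demo = StandardScaler()
scaled = sc_demo.fit_transform(data)

# Manual standardization: (x - mean) / std, column by column
manual = (data - data.mean(axis=0)) / data.std(axis=0)
print(np.allclose(scaled, manual))  # True -- each column now has mean 0 and std 1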


Step 3: Building the ANN

Now, we'll build the Artificial Neural Network (ANN) using TensorFlow.

Initializing the ANN

We initialize the ANN as a sequence of layers using tf.keras.models.Sequential().

# Initializing the ANN
ann = tf.keras.models.Sequential()

Adding Layers

We add layers to the ANN using the add method. Here, we add two hidden layers with ReLU activation functions and an output layer with a sigmoid activation function.

# Adding the input layer and the first hidden layer
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

# Adding the second hidden layer
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

# Adding the output layer
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

Step 4: Training the ANN

In this step, we compile and train the ANN on the training set.

Compiling the ANN

We compile the ANN using the compile method, specifying the optimizer, loss function, and metrics.

# Compiling the ANN
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Before breaking down the compile arguments, let's quickly recap the model object we are compiling:

  1. tf.keras.models.Sequential(): This line creates a new neural network model using TensorFlow's Keras API.

  2. ann: This variable holds the reference to the newly created neural network model.

Now, let's delve into the logic behind this initialization:

  • Sequential Model: The Sequential class in Keras represents a linear stack of layers. It's the simplest type of neural network model, where layers are added sequentially, one on top of the other.

  • Creating a New Model: By calling Sequential(), we instantiate a new neural network model that's initially empty. We haven't added any layers or configured any settings yet.

  • Logical Flow: In an artificial neural network, layers are the building blocks that process the input data and transform it into meaningful outputs. By initializing a sequential model, we set the stage to add layers sequentially, defining the architecture of our neural network.

  • Flexibility: TensorFlow's Keras API offers flexibility in creating neural network architectures. The sequential model is suitable for simple feedforward networks where each layer has connections only to the next layer in the sequence.

  • Customization: While Sequential is the simplest model type, Keras also allows for more complex architectures with multiple inputs, multiple outputs, shared layers, etc. For such cases, you'd use Keras's functional API instead of Sequential.
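
For comparison, here is a minimal sketch of the same three-layer architecture written with the functional API. This is an equivalent alternative, not part of the tutorial's code, and the input dimension of 12 is assumed from the preprocessing above (10 original features, with geography expanded into 3 one-hot columns):

# A minimal sketch of the same architecture using the Keras functional API
import tensorflow as tf

inputs = tf.keras.Input(shape=(12,))  # 12 features after encoding (assumed from the preprocessing above)
h1 = tf.keras.layers.Dense(6, activation='relu')(inputs)
h2 = tf.keras.layers.Dense(6, activation='relu')(h1)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(h2)

functional_ann = tf.keras.Model(inputs=inputs, outputs=outputs)
functional_ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])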

Significance of Specifying Optimizer, Loss Function, and Metrics in Model Compilation

  1. Optimizer ('adam'):

    • Significance: Adam is an efficient optimization algorithm that adapts learning rates for each parameter during training, leading to faster convergence and improved performance.

    • Optimization Process: Adam optimizes the model's weights by updating them based on gradients computed from training data, adjusting learning rates dynamically to handle different features and data patterns effectively.

  2. Loss Function ('binary_crossentropy'):

    • Role: Binary crossentropy measures the difference between predicted probabilities and actual labels for binary classification tasks. It penalizes model predictions that deviate from the true labels, guiding the model to minimize prediction errors.

    • Model Performance: Minimizing binary crossentropy during training improves the model's ability to accurately classify instances, making it a critical component for evaluating model performance.

  3. Metrics ('accuracy'):

    • Importance: Accuracy is a key metric for evaluating classification models as it quantifies the percentage of correctly predicted instances among all instances in the dataset.

    • Model Evaluation: By using accuracy as a metric, we can assess how well the model generalizes to unseen data, providing insights into its classification capabilities and overall effectiveness.
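
To make the loss function less abstract, here is a minimal sketch with made-up labels and probabilities showing that binary crossentropy is simply the average of -[y*log(p) + (1-y)*log(1-p)]:

# A minimal sketch, with made-up labels and probabilities, of binary crossentropy
import numpy as np
import tensorflow as tf

y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_prob = np.array([0.9, 0.2, 0.6, 0.4])  # predicted probabilities of the positive class

# Manual computation: -[y*log(p) + (1-y)*log(1-p)], averaged over the samples
manual = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
keras_value = tf.keras.losses.BinaryCrossentropy()(y_true, y_prob).numpy()
print(manual, keras_value)  # the two values agree (up to Keras's numerical clipping)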

Training the ANN

We train the ANN on the training set using the fit method, specifying the batch size and number of epochs.

# Training the ANN on the Training set
ann.fit(x_train, y_train, batch_size=32, epochs=100)

Explanation:

  1. ann.fit(): This method is used to train the neural network model. It takes the input features (x_train), corresponding target labels (y_train), and other optional parameters to configure the training process.

  2. x_train: This parameter represents the input features of the training set. It contains the independent variables (or features) on which the model will be trained.

  3. y_train: This parameter represents the target labels of the training set. It contains the dependent variable (or target) that the model will learn to predict based on the input features.

  4. batch_size=32: This parameter specifies the number of samples per gradient update. In neural network training, the dataset is divided into smaller batches, and the model's weights are updated after processing each batch. Here, we've set the batch size to 32, meaning the model will update its weights after processing 32 samples.

  5. epochs=100: This parameter defines the number of epochs, which refers to the number of times the entire training dataset is passed forward and backward through the neural network. Within each epoch, the model performs a forward pass (computing predictions) and a backward pass (updating weights) for every batch. Here, we've set the number of epochs to 100, meaning the model will iterate over the entire training dataset 100 times during the training process.

Now, let's delve into the logic behind training the neural network:

  • Learning Process: Training a neural network involves optimizing its weights and biases to minimize the difference between predicted outputs and actual targets. This process is known as optimization or learning.

  • Forward and Backward Passes: During each epoch, the training data is fed forward through the network to compute predictions. Then, the error between predictions and actual targets is calculated using a loss function. This error is then propagated backward through the network using backpropagation, and the model's weights are updated to minimize the loss.

  • Batch Training: Training with batches allows for more efficient optimization, as the model's weights are updated more frequently. It also helps in dealing with memory constraints, especially for large datasets.

  • Multiple Epochs: Training for multiple epochs allows the model to learn complex patterns and relationships in the data by repeatedly adjusting its weights. However, training for too many epochs can lead to overfitting, where the model memorizes the training data instead of generalizing well to unseen data.
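
If overfitting becomes a concern, one optional variation (not part of the original code) is to hold out part of the training data for validation and stop training once the validation loss stops improving. A minimal sketch reusing the ann, x_train, and y_train objects from above:

# Optional variation: monitor a validation split and stop training early
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',         # watch the loss on the held-out validation data
    patience=10,                # stop after 10 epochs without improvement
    restore_best_weights=True)  # roll back to the best weights seen

ann.fit(x_train, y_train,
        validation_split=0.1,   # hold out 10% of the training data for validation
        batch_size=32,
        epochs=100,
        callbacks=[early_stop])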


Step 5: Making Predictions and Evaluating the Model

Finally, we make predictions using the trained model and evaluate its performance.

Predicting Results

We use the trained model to predict whether a new customer will leave the bank or not based on their features.

# Predicting the result of a single observation
new_customer_data = [[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]]
prediction = ann.predict(sc.transform(new_customer_data))
print(prediction)

Explanation:

  1. new_customer_data: This variable represents the data of a new customer for which we want to make a prediction. It's formatted as a list of lists, where each inner list contains the features of the new customer.

  2. [[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]]: This specific value is an example of the features of a new customer, listed in the same order as the preprocessed training data: the first three values are the one-hot encoded geography, followed by credit score, gender (label-encoded), age, tenure, balance, number of products, credit card status, active membership, and estimated salary.

  3. sc.transform(new_customer_data): This line applies feature scaling to the new customer data using the StandardScaler object sc that we previously created and fitted to the training data. Feature scaling ensures that the new customer data is on the same scale as the training data, which is necessary for making accurate predictions.

  4. ann.predict(): This method is used to make predictions using the trained neural network model (ann). It takes the scaled features of the new customer data as input and returns the predicted outcome.

  5. print(prediction): This line prints the predicted outcome for the new customer data. The prediction is the model's estimated probability that the customer will leave the bank.

Now, let's delve into the logic behind predicting the result of a single observation:

  • New Customer Data: We provide the features of a new customer for whom we want to make a prediction. These features are typically preprocessed and formatted in the same way as the training data to ensure compatibility with the model.

  • Feature Scaling: Before making predictions, it's crucial to scale the features of the new customer data using the same scaling parameters learned from the training data. This ensures that the new data is on the same scale as the data the model was trained on, facilitating accurate predictions.

  • Prediction: Once the new customer data is scaled, we pass it through the trained neural network model (ann) to obtain a prediction. The model processes the input features through its layers and produces an output, which in this case represents the likelihood of the customer leaving the bank.

  • Interpretation: The predicted outcome provides valuable insights for decision-making. For instance, if the predicted likelihood of a customer leaving the bank is high, the bank may take proactive measures to retain the customer, such as offering personalized incentives or improving customer service.
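
Because the sigmoid output is a probability, one simple way to turn the prediction above into a yes/no decision (assuming the usual 0.5 cut-off, the same threshold used in the next subsection) is:

# Convert the predicted probability into a churn / no-churn decision (0.5 cut-off)
will_leave = prediction[0][0] > 0.5
print("Customer likely to leave the bank" if will_leave else "Customer likely to stay")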

Evaluating Performance

We evaluate the model's performance on the test set, starting with the confusion matrix and then computing metrics such as accuracy, precision, recall, and F1-score (see the sketch at the end of this section).

# Predicting the Test set results
y_pred = ann.predict(x_test)
y_pred = (y_pred > 0.5)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

Let's break down the prediction and thresholding lines:

  1. y_pred = ann.predict(x_test): This line uses the trained neural network model (ann) to predict the outcomes for the test set (x_test). The predict method takes the test set as input and returns the predicted outcomes, which are stored in the variable y_pred.

  2. y_pred = (y_pred > 0.5): After obtaining the predicted outcomes (y_pred), this line applies a threshold of 0.5 to convert the predicted probabilities into binary outcomes. If the predicted probability is greater than 0.5, the outcome is considered as 1 (positive outcome), indicating that the customer is likely to leave the bank. Otherwise, it's considered as 0 (negative outcome), indicating that the customer is likely to stay.

Next, let's discuss the confusion matrix lines:

  1. from sklearn.metrics import confusion_matrix: This line imports the confusion_matrix function from the sklearn.metrics module. The confusion matrix is a performance evaluation metric used to assess the accuracy of a classification model.

  2. cm = confusion_matrix(y_test, y_pred): This line calculates the confusion matrix based on the actual target labels (y_test) and the predicted labels (y_pred). The confusion matrix provides a tabular summary of the model's predictions, showing the counts of true positive, false positive, true negative, and false negative predictions.

  3. print(cm): Finally, this line prints the confusion matrix to the console, allowing us to visualize the model's performance in terms of correct and incorrect predictions.

Now, let's delve into the logic behind predicting the test set results and constructing the confusion matrix:

  • Prediction: The test set predictions (y_pred) are obtained using the trained neural network model (ann). These predictions represent the model's estimated outcomes for each instance in the test set.

  • Thresholding: By applying a threshold of 0.5, we convert the predicted probabilities into binary outcomes, making it easier to interpret the model's predictions. This thresholding step allows us to classify each prediction as either positive (1) or negative (0) based on the likelihood of the customer leaving the bank.

  • Confusion Matrix: The confusion matrix provides a detailed breakdown of the model's performance, showing the counts of true positive, false positive, true negative, and false negative predictions. It allows us to assess the accuracy, precision, recall, and other performance metrics of the classification model.

By analyzing the confusion matrix, we gain valuable insights into the strengths and weaknesses of our classification model, helping us make informed decisions and improve model performance if necessary.
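
Since accuracy, precision, recall, and F1-score were mentioned above, here is a short sketch computing them from the same y_test and y_pred arrays with scikit-learn (the exact numbers will vary from run to run):

# Computing the usual classification metrics from the test-set predictions
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))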


Conclusion

In this detailed tutorial, we learned how to build an Artificial Neural Network (ANN) using Python's TensorFlow library for predicting customer churn in a bank. We covered each step of the process, from data preprocessing to model evaluation, providing insights and explanations along the way. With the knowledge gained from this tutorial, you can now apply ANNs to solve various real-world problems and continue exploring the fascinating field of deep learning.


Dear Readers, As we conclude this blog post, I want to extend my sincere gratitude to each of you. Your support, engagement, and feedback mean the world to me. Thank you for being a part of this community and for your continued enthusiasm in exploring Data science topics. I look forward to sharing more valuable content with you in the future. Here's to our ongoing journey of learning and discovery together!
