The Neural Network Club: Deep Learning
As we come to the end of this post series, it is time to talk about deep learning. A quick reminder of what came before: first we looked at artificial intelligence, and then at machine learning. In this post, we are going to cover the definition, architectures, and hyperparameters of deep learning.

If you have already looked into that topic, you have probably come across the term neural network. Understanding what it is and how it works is the most important step, so let's start there. (Apologies for the long warm-up; the real content starts here.)
Neural Network
Basically (yes, basically), a neural network tries to mimic the way neurons in the human brain connect and fire. Given the definition of AI as enabling machines to think and act like a human, that should not surprise us. Because it is artificial, we call it an artificial neural network (ANN): a mathematical model that loosely simulates how the brain processes information. Like many learning algorithms, it can learn to map inputs to outputs, and of course it requires extensive training on large datasets to do so effectively. The chart below shows its working mechanism, and the code sketch after the list shows how the layers fit together in practice. There are three kinds of layers:

- Input Layer: The entry point for external data; this is where the (labeled or unlabeled) dataset is fed into the network.
- Hidden Layers: These layers refine the information passed in from the input layer, layer by layer, capturing increasingly complex patterns. In handwritten digit recognition, for example, each 28x28 input image is broken into 28x28 = 784 values, one per pixel. Before going into technical detail, it is enough to know that each value lies between 0 and 1, and the weighted connections between layers determine which patterns push the network toward the desired output.
- Output Layer: As you can see from the chart, this layer delivers the final processed result, which in this example is the digit 9.
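
To make the three layers concrete, here is a minimal sketch in plain NumPy of a forward pass through a small network for digit recognition. The layer sizes and the random (untrained) weights are illustrative assumptions on my part, not values from any real model.

```python
# A minimal sketch (plain NumPy, illustrative sizes) of how data flows
# through the three kinds of layers described above.
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Input layer: a 28x28 image flattened into 784 values in [0, 1].
x = rng.random((784,))

# Hidden layers: weights and biases are random here; training would learn them.
W1, b1 = rng.standard_normal((128, 784)) * 0.01, np.zeros(128)
W2, b2 = rng.standard_normal((64, 128)) * 0.01, np.zeros(64)
h1 = relu(W1 @ x + b1)
h2 = relu(W2 @ h1 + b2)

# Output layer: ten scores, one per digit; the largest is the prediction.
W3, b3 = rng.standard_normal((10, 64)) * 0.01, np.zeros(10)
probs = softmax(W3 @ h2 + b3)
print("predicted digit:", probs.argmax())
```

With trained weights instead of random ones, the same forward pass is exactly what produces the "9" in the chart above.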

What is Deep Learning?
Deep learning is a subset of machine learning, and it is called "deep" because of the many layers in its neural networks. I highly recommend watching this video to better understand neural networks. We cannot ignore its advances over the past decade, driven by the growth in computational power and data availability. If you want to learn more, check out my first post on the evolution of AI.
Architectures of Deep Learning
There are many different architectures, but I will only mention a few: convolutional neural networks, recurrent neural networks, long short-term memory networks, and generative adversarial networks.
Convolutional Neural Networks
A convolutional neural network (CNN) is a type of neural network designed for grid-like data such as images. The architecture was initially designed for tasks like handwritten character recognition; today, CNNs are essential for image recognition and classification.

A CNN uses convolutional layers to process the input. A convolutional layer slides a small filter over the input to detect patterns in it. Imagine filtering the input with a 3x3 matrix, sliding it one step at a time across the image and computing a weighted sum at each position. I know it sounds complicated, but I hope the visualization above, and the small sketch below, help you out.
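
Here is a rough sketch of that sliding-filter idea in plain NumPy. The 6x6 input and the particular 3x3 filter are made-up examples just to show the mechanics.

```python
# A 3x3 filter slides over the input one step at a time,
# producing one output value per position.
import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]   # region currently under the filter
            out[i, j] = np.sum(patch * kernel)  # weighted sum = one output value
    return out

image = np.random.rand(6, 6)            # toy 6x6 "image"
edge_filter = np.array([[-1, 0, 1],     # a simple 3x3 filter that responds to vertical edges
                        [-1, 0, 1],
                        [-1, 0, 1]])
print(convolve2d(image, edge_filter).shape)  # (4, 4) feature map
```

In a real CNN the filter values are not hand-picked like this; they are learned during training, and many filters run in parallel to produce many feature maps.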
Recurrent Neural Networks
RNNs are designed to handle sequential data such as natural language. An RNN processes the input one element at a time while maintaining a hidden state that summarizes the elements it has already seen. Basically, it has connections that form directed cycles, allowing it to keep a memory of previous inputs.

Each neuron has a connection back to itself and forward to the next step, and that is what lets the network retain information about past inputs. As a real-world example, Google Translate has used RNN-based models to translate text from one language to another, although this only helps up to a point: RNNs face the vanishing gradient problem when processing long sequences. At that point, LSTM saves our lives.
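
To show what "maintaining a hidden state" means, here is a bare-bones NumPy sketch of the recurrence. The sizes and the random weights are illustrative assumptions; a real RNN learns these weights from data.

```python
# Each step mixes the current input with the hidden state
# carried over from the previous step.
import numpy as np

rng = np.random.default_rng(1)
input_size, hidden_size = 8, 16      # illustrative sizes

W_x = rng.standard_normal((hidden_size, input_size)) * 0.1
W_h = rng.standard_normal((hidden_size, hidden_size)) * 0.1
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)                # hidden state: the network's "memory"
sequence = rng.random((5, input_size))   # a toy sequence of 5 elements

for x_t in sequence:
    # h_t = tanh(W_x x_t + W_h h_{t-1} + b)
    h = np.tanh(W_x @ x_t + W_h @ h + b)

print(h.shape)  # the final hidden state summarizes the whole sequence
```

Because the same weights are reused at every step, gradients flowing back through many steps can shrink toward zero, which is exactly the vanishing gradient problem mentioned above.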
Long Short-Term Memory
Even though its architecture has a unique structure (a forget gate, an input gate, and an output gate), there is not much difference between RNNs and LSTMs in terms of purpose. LSTM was developed to address the limitations of RNNs, and it is widely used in text generation, sentiment analysis, and text translation.

How do you think suggested text appears in your text box? Say we are about to write "Hello, how are you?" to someone: the LSTM processes each character of the string while remembering the previous characters and predicting the next character, or even the next word.
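
Here is a sketch of a single LSTM step in NumPy, just to show where the forget, input, and output gates sit. As before, the sizes and random weights are illustrative; in practice they are learned from data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
input_size, hidden_size = 8, 16
concat = input_size + hidden_size

# One weight matrix and bias per gate, plus one for the candidate cell state.
W_f, b_f = rng.standard_normal((hidden_size, concat)) * 0.1, np.zeros(hidden_size)
W_i, b_i = rng.standard_normal((hidden_size, concat)) * 0.1, np.zeros(hidden_size)
W_c, b_c = rng.standard_normal((hidden_size, concat)) * 0.1, np.zeros(hidden_size)
W_o, b_o = rng.standard_normal((hidden_size, concat)) * 0.1, np.zeros(hidden_size)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W_f @ z + b_f)        # forget gate: what to drop from memory
    i = sigmoid(W_i @ z + b_i)        # input gate: what new information to store
    c_tilde = np.tanh(W_c @ z + b_c)  # candidate values for the cell state
    c = f * c_prev + i * c_tilde      # updated long-term memory
    o = sigmoid(W_o @ z + b_o)        # output gate: what to expose as output
    h = o * np.tanh(c)
    return h, c

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.random((5, input_size)):   # e.g. five characters of "Hello"
    h, c = lstm_step(x_t, h, c)
```

The separate cell state `c`, controlled by the gates, is what lets the LSTM carry information across much longer sequences than a plain RNN.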
Generative Adversarial Networks
GANs are models for generating data that closely resemble original datasets. There are two main components: the generator and the discriminator. As their names suggest, the generator creates new data from random inputs, while the discriminator classifies data as real or fake.

GANs have diverse applications, such as creating realistic images, enhancing virtual reality, and generating images from text. Deepfake videos, AI character creation, and photo editing are real-life applications across different fields.
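
Here is a minimal sketch of the two components using TensorFlow/Keras, assuming it is installed; the layer sizes are illustrative, and the full adversarial training loop is only summarized in the comments.

```python
import tensorflow as tf

latent_dim = 100  # size of the random noise fed to the generator

# Generator: turns random noise into a fake 28x28 image.
generator = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(28 * 28, activation="sigmoid"),
    tf.keras.layers.Reshape((28, 28)),
])

# Discriminator: looks at an image and scores how likely it is to be real.
discriminator = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# During training the two play a game: the discriminator learns to tell real
# images from generated ones, and the generator learns to fool it.
noise = tf.random.normal((1, latent_dim))
fake_image = generator(noise)
realness_score = discriminator(fake_image)
```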
Hyperparameters: The Fine-Tuning Knobs

- Batch Size: The number of samples the model works through before updating its parameters. Smaller batches give noisier but more frequent updates, which can sometimes help generalization, while larger batches give more stable gradient estimates and can speed up training on modern hardware.

- Number of Epochs: An epoch is one complete pass through the entire training dataset; this hyperparameter sets how many such passes the model makes. The model iterates over the dataset multiple times to improve its performance, and the balance matters: too few epochs may lead to underfitting, while too many may lead to overfitting.
- Weights and Biases: These are the two main parameters of a neural network, learned during the training phase (strictly speaking, they are learned parameters rather than hyperparameters, but they are worth covering here). They determine the strength and direction of the connections between neurons: weights set how important each connection is, while biases allow the activation function to be shifted left or right.

- Optimizers: Algorithms that update the weights and biases to minimize the loss function. The choice of optimizer matters because it affects the convergence rate and the overall performance of the model. Stochastic Gradient Descent (SGD), Adaptive Moment Estimation (Adam), and RMSProp are some common examples.
- Loss Function: It measures how well the model's predictions match the actual data; it is the quantity the optimizer tries to minimize. Common loss functions include Mean Squared Error (MSE), Cross-Entropy Loss, and Binary Cross-Entropy. The sketch after this list shows where these knobs appear together in code.
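
Here is a short sketch of where these knobs typically appear in a Keras setup, assuming TensorFlow/Keras is available; the model, the placeholder data names, and the specific values are illustrative, not recommendations.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# The optimizer and the loss function are chosen at compile time.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # Adam = Adaptive Moment Estimation
    loss="sparse_categorical_crossentropy",                    # a cross-entropy loss
    metrics=["accuracy"],
)

# The batch size and the number of epochs are chosen at training time.
# x_train / y_train are placeholders for your actual training data.
# model.fit(x_train, y_train, batch_size=32, epochs=10)
```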
Conclusion
Deep learning is a powerful subset of AI that offers unprecedented capabilities in data processing and pattern recognition. By understanding its architectures and hyperparameters, we can appreciate how these sophisticated models are trained to perform complex tasks.