{"id":25074,"date":"2025-12-08T09:59:51","date_gmt":"2025-12-08T09:59:51","guid":{"rendered":"https:\/\/gtracademy.org\/?p=25074"},"modified":"2025-12-08T09:59:51","modified_gmt":"2025-12-08T09:59:51","slug":"gradient-descent-a-beginner-friendly-guide","status":"publish","type":"post","link":"https:\/\/gtracademy.org\/staging\/gradient-descent-a-beginner-friendly-guide\/","title":{"rendered":"Gradient Descent A Beginner-Friendly Guide to How Models Learn\u00a0Best for 2025"},"content":{"rendered":"<p><span data-contrast=\"auto\">Most modern <a href=\"https:\/\/gtracademy.org\/data-science-ai-course-online-with-ml-dl-nlp\/\"><strong>ML models<\/strong><\/a>, from simple regressions to deep neural networks, learn using one core idea: gradient descent. It\u2019s an optimization method that gradually adjusts model parameters to minimize the error, much like walking downhill until you reach the bottom of a valley.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><strong><span style=\"font-size: 18pt;\">Connect With Us:<a href=\"https:\/\/api.whatsapp.com\/send\/?phone=919650518049&amp;text=Hi%2C%20I%20want%20to%20know%20more%20about%20GTR%20academy%20courses\" target=\"_blank\" rel=\"noopener\"><span style=\"color: #339966;\"> WhatsApp<\/span><\/a><\/span><\/strong><\/p>\n<p><img fetchpriority=\"high\" decoding=\"async\" class=\"alignnone size-full wp-image-25087\" src=\"https:\/\/gtracademy.org\/wp-content\/uploads\/2025\/12\/cc8c1b8c-f11e-4a4a-88b2-3a5352a94f7e.webp\" alt=\"Gradient Descent A Beginner-Friendly Guide\" width=\"800\" height=\"450\" srcset=\"https:\/\/gtracademy.org\/staging\/wp-content\/uploads\/2025\/12\/cc8c1b8c-f11e-4a4a-88b2-3a5352a94f7e.webp 800w, https:\/\/gtracademy.org\/staging\/wp-content\/uploads\/2025\/12\/cc8c1b8c-f11e-4a4a-88b2-3a5352a94f7e-300x169.webp 300w, https:\/\/gtracademy.org\/staging\/wp-content\/uploads\/2025\/12\/cc8c1b8c-f11e-4a4a-88b2-3a5352a94f7e-768x432.webp 
768w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/p>\n<h2><b><span data-contrast=\"auto\">Rolling a ball down the hill \u2013 An analogy<\/span><\/b><span data-ccp-props=\"{}\">\u00a0<\/span><\/h2>\n<p><span data-contrast=\"auto\">Imagine you are standing on a foggy mountain and want to reach the bottom of the valley. You can only see the ground right around your feet. You can feel which direction slopes downward and take a small step that way, then repeat the process. This is exactly what gradient descent does: at every step, it calculates the \u201cslope\u201d of the loss function and then moves the parameters of the model in the direction that reduces the loss.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">In this analogy:<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"8\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:1080,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"1\" data-aria-level=\"1\"><span data-contrast=\"auto\">The landscape = the loss function (the error of the model).<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"8\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:1080,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"2\" data-aria-level=\"1\"><span 
data-contrast=\"auto\">Your position = the current model parameters (the weights).<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"8\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:1080,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"3\" data-aria-level=\"1\"><span data-contrast=\"auto\">The slope under your feet = the gradient.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"8\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:1080,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"4\" data-aria-level=\"1\"><span data-contrast=\"auto\">Each step downhill = one gradient descent update.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<p><span data-contrast=\"auto\">By repeating these small steps, you move closer and closer to the minimum of the loss, which improves the model\u2019s predictions.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Core update rule (without heavy math)<\/span><\/b><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The goal of gradient descent is to minimize the loss function\u202f<\/span><span data-mathml=\"&lt;math xmlns=&quot;http:\/\/www.w3.org\/1998\/Math\/MathML&quot; display=&quot;block&quot;&gt;&lt;mi&gt;J&lt;\/mi&gt;&lt;mo fence=&quot;false&quot;&gt;(&lt;\/mo&gt;&lt;mi&gt;&amp;#x1D703;&lt;\/mi&gt;&lt;mo fence=&quot;false&quot;&gt;)&lt;\/mo&gt;&lt;\/math&gt;\">J(\u03b8)<\/span><span data-contrast=\"auto\">, where\u202f<\/span><span data-mathml=\"&lt;math xmlns=&quot;http:\/\/www.w3.org\/1998\/Math\/MathML&quot; display=&quot;block&quot;&gt;&lt;mi&gt;&amp;#x1D703;&lt;\/mi&gt;&lt;\/math&gt;\">\u03b8<\/span><span data-contrast=\"auto\">\u202fdenotes the model parameters. Gradient descent updates the parameters using:<\/span><span data-ccp-props=\"{}\"> \u00a0<\/span><span data-mathml=\"&lt;math xmlns=&quot;http:\/\/www.w3.org\/1998\/Math\/MathML&quot; display=&quot;block&quot;&gt;&lt;msub&gt;&lt;mi&gt;&amp;#x1D73D;&lt;\/mi&gt;&lt;mrow&gt;&lt;mtext&gt;ne&lt;\/mtext&gt;&lt;mtext&gt;w&lt;\/mtext&gt;&lt;\/mrow&gt;&lt;\/msub&gt;&lt;mo&gt;=&lt;\/mo&gt;&lt;msub&gt;&lt;mi&gt;&amp;#x1D73D;&lt;\/mi&gt;&lt;mrow&gt;&lt;mtext&gt;ol&lt;\/mtext&gt;&lt;mtext&gt;d&lt;\/mtext&gt;&lt;\/mrow&gt;&lt;\/msub&gt;&lt;mo&gt;&amp;#x2212;&lt;\/mo&gt;&lt;mi&gt;&amp;#x1D73C;&lt;\/mi&gt;&lt;mo&gt;&amp;#x22C5;&lt;\/mo&gt;&lt;mi&gt;&amp;#x1D6C1;&lt;\/mi&gt;&lt;mi&gt;&amp;#x1D471;&lt;\/mi&gt;&lt;mo fence=&quot;false&quot;&gt;(&lt;\/mo&gt;&lt;msub&gt;&lt;mi&gt;&amp;#x1D73D;&lt;\/mi&gt;&lt;mrow&gt;&lt;mtext&gt;ol&lt;\/mtext&gt;&lt;mtext&gt;d&lt;\/mtext&gt;&lt;\/mrow&gt;&lt;\/msub&gt;&lt;mo fence=&quot;false&quot;&gt;)&lt;\/mo&gt;&lt;\/math&gt;\">\u03b8<sub>new<\/sub> = \u03b8<sub>old<\/sub> \u2212 \u03b7 \u22c5 \u2207J(\u03b8<sub>old<\/sub>)<\/span><\/p>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"2\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" data-aria-posinset=\"1\" data-aria-level=\"1\"><span data-mathml=\"&lt;math xmlns=&quot;http:\/\/www.w3.org\/1998\/Math\/MathML&quot; display=&quot;block&quot;&gt;&lt;mi&gt;&amp;#x1D6C1;&lt;\/mi&gt;&lt;mi&gt;&amp;#x1D471;&lt;\/mi&gt;&lt;mo fence=&quot;false&quot;&gt;(&lt;\/mo&gt;&lt;mi&gt;&amp;#x1D73D;&lt;\/mi&gt;&lt;mo fence=&quot;false&quot;&gt;)&lt;\/mo&gt;&lt;\/math&gt;\">\u2207J(\u03b8)<\/span><span data-contrast=\"auto\">\u202fis the gradient, i.e. the direction of steepest increase of the loss.<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"2\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" data-aria-posinset=\"2\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">\u03b7 (eta)<\/span><\/b><span data-contrast=\"auto\"> is the learning rate, which controls how big each step is.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<p><span data-contrast=\"auto\">Because the gradient points uphill, subtracting it moves us downhill, reducing the loss over time.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Different types of gradient descent<\/span><\/b><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">In practice, there are three common variants of gradient descent, which differ in how much data they use per step.<\/span><span 
data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Batch Gradient Descent<\/span><\/b><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"3\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" data-aria-posinset=\"1\" data-aria-level=\"2\"><span data-contrast=\"auto\">Uses the complete training dataset to compute the gradient for each update.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"3\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" data-aria-posinset=\"2\" data-aria-level=\"2\"><b><span data-contrast=\"auto\">Pros:<\/span><\/b><span data-contrast=\"auto\"> Smooth convergence, stable path.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"3\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" data-aria-posinset=\"3\" data-aria-level=\"2\"><b><span data-contrast=\"auto\">Cons:<\/span><\/b><span data-contrast=\"auto\"> Slow and memory-heavy for large datasets.<\/span><span 
data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<p><b><span data-contrast=\"auto\">Stochastic Gradient Descent (SGD)<\/span><\/b><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"3\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" data-aria-posinset=\"1\" data-aria-level=\"2\"><span data-contrast=\"auto\">Uses one training example at a time to update parameters.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"3\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" data-aria-posinset=\"2\" data-aria-level=\"2\"><b><span data-contrast=\"auto\">Pros:<\/span><\/b><span data-contrast=\"auto\"> Updates very fast and can escape shallow local minima.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"3\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" data-aria-posinset=\"3\" data-aria-level=\"2\"><b><span data-contrast=\"auto\">Cons:<\/span><\/b><span data-contrast=\"auto\"> Path is noisy and loss curve zigzags instead of smoothly 
decreasing.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<p><b><span data-contrast=\"auto\">Mini-batch Gradient Descent<\/span><\/b><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"3\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" data-aria-posinset=\"1\" data-aria-level=\"2\"><span data-contrast=\"auto\">Uses small batches (e.g., 24, 32, 64, or 128 data points) for each update.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"3\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" data-aria-posinset=\"2\" data-aria-level=\"2\"><b><span data-contrast=\"auto\">Pros:<\/span><\/b><span data-contrast=\"auto\"> Balances speed and stability, and is usually the default choice in deep learning.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<p><span data-contrast=\"auto\">The right variant depends on the size of the dataset and the hardware, but mini-batch is the most commonly used in real-world training.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"auto\">The learning rate (LR)<\/span><\/b><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The learning rate\u202f<\/span><span 
data-mathml=\"&lt;math xmlns=&quot;http:\/\/www.w3.org\/1998\/Math\/MathML&quot; display=&quot;block&quot;&gt;&lt;mi&gt;&amp;#x1D73C;&lt;\/mi&gt;&lt;\/math&gt;\">\u03b7<\/span><span data-contrast=\"auto\">\u202fis one of the most important hyperparameters in gradient descent.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"4\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" data-aria-posinset=\"1\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Too high LR:<\/span><\/b><span data-contrast=\"auto\"> The algorithm takes huge steps, overshoots the minimum, and may diverge (the loss explodes instead of decreasing).<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"4\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" data-aria-posinset=\"2\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Too low LR:<\/span><\/b><span data-contrast=\"auto\"> The algorithm takes tiny steps, converges very slowly, and may stall in flat regions.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<p><span data-contrast=\"auto\">A practical approach is to start with a moderate learning rate (e.g., 0.01 or 0.001) and monitor the loss curve over the training epochs. 
If the loss jumps or increases significantly, reduce the learning rate; if it decreases very slowly, consider increasing the learning rate slightly or using a learning rate schedule.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Adaptive optimizers (for example Adam, RMSprop, and Adagrad) automatically adjust the effective learning rate for each parameter, which often improves convergence in deep networks.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Simple real-world examples<\/span><\/b><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<ul>\n<li><span data-contrast=\"auto\">A few scenarios make gradient descent more tangible:<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<p><b><span data-contrast=\"auto\">Linear regression for house prices<\/span><\/b><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<ul>\n<li><span data-contrast=\"auto\">Inputs: Size, number of rooms, location features.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<li><span data-contrast=\"auto\">Output: Price.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<li><span data-contrast=\"auto\">Gradient descent adjusts the weight on each feature so that the predicted prices get closer to the actual prices, minimizing the mean squared error (MSE).<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<p><b><span data-contrast=\"auto\">Logistic regression for spam detection<\/span><\/b><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<ul>\n<li><span data-contrast=\"auto\">Inputs: Email features (e.g., word frequencies, presence of certain terms).<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<li><span data-contrast=\"auto\">Output: Spam or not spam.<\/span><span 
data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<li><span data-contrast=\"auto\">Gradient descent optimizes the parameters so that the model classifies emails correctly by minimizing a classification loss such as cross-entropy.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<p><b><span data-contrast=\"auto\">Convolutional neural network for image classification<\/span><\/b><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<ul>\n<li><span data-contrast=\"auto\">Inputs: Pixel values.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<li><span data-contrast=\"auto\">Output: Class label (e.g., cat vs dog).<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<li><span data-contrast=\"auto\">Backpropagation computes the gradients layer by layer, and gradient descent updates millions of weights to reduce the overall loss (error).<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<li><span data-contrast=\"auto\">Even though the models differ, the optimization engine underneath is still gradient descent (or a variant of it).<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<h3><b><span data-contrast=\"auto\">Common pitfalls of gradient descent and how to avoid them<\/span><\/b><span data-ccp-props=\"{}\">\u00a0<\/span><\/h3>\n<p><span data-contrast=\"auto\">When starting with gradient descent, watch out for these issues:<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<h5><span data-contrast=\"auto\">Poor scaling of features<\/span><span 
data-ccp-props=\"{}\">\u00a0<\/span><\/h5>\n<ul>\n<li><span data-contrast=\"auto\">When features are measured on very different scales (e.g., age vs. income), the loss surface can become skewed, which slows convergence.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<li><span data-contrast=\"auto\">Fix: Apply normalization or standardization to the features.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<h5><span data-contrast=\"auto\">Getting stuck or oscillating<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/h5>\n<ul>\n<li><span data-contrast=\"auto\">The loss does not decrease smoothly, or it gets stuck on plateaus.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<li><span data-contrast=\"auto\">Fix: Try a different learning rate, mini-batches, momentum, or an adaptive optimizer like Adam.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<h5><span data-contrast=\"auto\">Overfitting<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/h5>\n<ul>\n<li><span data-contrast=\"auto\">Even with good optimization, the model may overfit the training data.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<li><span data-contrast=\"auto\">Fix: Use regularization (L2, dropout), early stopping, or more data; gradient descent will then optimize a regularized loss that generalizes better.<\/span><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Most modern ML models from simple regressions to deep neural networks learn using the core idea i.e. Gradient Descent. 
It\u2019s an optimization method that adjusts model parameters gradually to minimize the error, very similar to like walking downhill till you reach the lowest point of a hill.\u200b\u00a0 Connect With Us: WhatsApp Rolling ball down the&#8230;<\/p>\n","protected":false},"author":5,"featured_media":25087,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"default","_kad_post_title":"default","_kad_post_layout":"default","_kad_post_sidebar_id":"","_kad_post_content_style":"default","_kad_post_vertical_padding":"default","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","footnotes":""},"categories":[1427],"tags":[2519,2520,2521,2522],"class_list":["post-25074","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science","tag-gradient-descent","tag-gradient-descent-tutorial","tag-machine-learning-basics","tag-optimization-algorithm"],"_links":{"self":[{"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/posts\/25074","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/comments?post=25074"}],"version-history":[{"count":0,"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/posts\/25074\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/media\/25087"}],"wp:attachment":[{"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/media?parent=25074"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/ca
tegories?post=25074"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/tags?post=25074"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}