{"id":25541,"date":"2025-12-14T07:41:08","date_gmt":"2025-12-14T07:41:08","guid":{"rendered":"https:\/\/gtracademy.org\/?p=25541"},"modified":"2025-12-18T15:56:48","modified_gmt":"2025-12-18T15:56:48","slug":"regularization-l1-l2-and-why-your-models-overfit","status":"publish","type":"post","link":"https:\/\/gtracademy.org\/staging\/regularization-l1-l2-and-why-your-models-overfit\/","title":{"rendered":"Regularization (L1\/L2) and Why Your Models Overfit"},"content":{"rendered":"<p><img fetchpriority=\"high\" decoding=\"async\" class=\"alignnone wp-image-25542\" src=\"https:\/\/gtracademy.org\/wp-content\/uploads\/2025\/12\/GTR_Regularization_L1L2__logo-300x173.png\" alt=\"\" width=\"725\" height=\"418\" srcset=\"https:\/\/gtracademy.org\/staging\/wp-content\/uploads\/2025\/12\/GTR_Regularization_L1L2__logo-300x173.png 300w, https:\/\/gtracademy.org\/staging\/wp-content\/uploads\/2025\/12\/GTR_Regularization_L1L2__logo.png 511w\" sizes=\"(max-width: 725px) 100vw, 725px\" \/><\/p>\n<p>Machine learning models can perform very well on the data they were trained on, yet often fall short on real-world data. The usual cause of this gap is overfitting: the model learns noise and idiosyncrasies rather than general patterns. Regularization (L1\/L2) looks modest, but it is a very effective way to keep models under control and improve generalization.<\/p>\n<p>What is overfitting in practice?<\/p>\n<p>Overfitting occurs when a model is so flexible relative to the amount and quality of the data that it starts to memorize training examples instead of learning robust rules. 
Typical signs of overfitting are a very low training error paired with a much higher validation\/test error, unstable predictions, and sensitivity to small changes in the data.<\/p>\n<p>For instance, a complex model for predicting customer churn might track every tiny fluctuation in past behavior perfectly, yet fail badly as soon as customer behavior shifts even slightly.<\/p>\n<p>How regularization helps<\/p>\n<p>Regularization adds a penalty on large weights to the loss function, which encourages simpler models that do not depend heavily on any single feature. The idea is: \u201cIf a slightly less perfect fit on the training data leads to a more robust model on new data, the less perfect fit should be preferred.\u201d<\/p>\n<p><strong>L2 regularization (Ridge)<\/strong> adds a penalty proportional to the sum of the squared weights. It shrinks weights smoothly toward zero but rarely makes them exactly zero.<\/p>\n<p><strong>L1 regularization (Lasso)<\/strong> adds a penalty proportional to the sum of the absolute values of the weights. This can drive some weights to exactly zero, so it performs feature selection implicitly.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Machine learning models can perform very well on the data they were trained on, yet often fall short on real-world data. The usual cause of this gap is overfitting: the model learns noise and idiosyncrasies rather than general patterns. 
Regularization (L1\/L2) may be a rather&#8230;<\/p>\n","protected":false},"author":11,"featured_media":25542,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"default","_kad_post_title":"default","_kad_post_layout":"default","_kad_post_sidebar_id":"","_kad_post_content_style":"default","_kad_post_vertical_padding":"default","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","footnotes":""},"categories":[792,1427],"tags":[1448,2687,2749,2748,2751,2341,2747,2750,2340],"class_list":["post-25541","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-analytics","category-data-science","tag-artificial-intelligence-and-data-science","tag-data-analytics","tag-l1-regularization","tag-l2-regularization","tag-lasso","tag-overfitting","tag-regularization","tag-ridge","tag-underfitting"],"_links":{"self":[{"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/posts\/25541","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/comments?post=25541"}],"version-history":[{"count":0,"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/posts\/25541\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/media\/25542"}],"wp:attachment":[{"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/media?parent=25541"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/categories?post=25541"},{"taxonomy":"post_tag","embeddable
":true,"href":"https:\/\/gtracademy.org\/staging\/wp-json\/wp\/v2\/tags?post=25541"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}