{"id":688,"date":"2022-03-14T07:12:20","date_gmt":"2022-03-14T07:12:20","guid":{"rendered":"https:\/\/dice.pk\/insights\/?p=688"},"modified":"2022-03-14T07:12:20","modified_gmt":"2022-03-14T07:12:20","slug":"now-tune-your-ann-faster-microsoft-releases-an-efficient-hypertuning-technique-the-%ce%bc-transfer","status":"publish","type":"post","link":"https:\/\/dicecamp.com\/insights\/now-tune-your-ann-faster-microsoft-releases-an-efficient-hypertuning-technique-the-%ce%bc-transfer\/","title":{"rendered":"Now Tune Your ANN Faster: Microsoft Releases an Efficient Hypertuning Technique, the \u03bc-Transfer"},"content":{"rendered":"\n<p class=\"dropcapp1\"><\/p>\n\n\n\n<p>Tuning, or optimization as it is commonly called, can be an enormously daunting task when it comes to <a href=\"https:\/\/www.oracle.com\/data-science\/machine-learning\/what-is-deep-learning\/#deep-learning-defined\" target=\"_blank\" rel=\"noreferrer noopener\">deep neural networks<\/a> (artificial neural networks with a large number of hidden layers). 
Technology advances across the Artificial Intelligence industry are therefore focused on making the training of such enormously wide ANNs more efficient.&nbsp;<\/p>\n\n\n\n<p>When training your AI model, <a href=\"https:\/\/cloud.google.com\/ai-platform\/training\/docs\/using-hyperparameter-tuning\" target=\"_blank\" rel=\"noreferrer noopener\">hyperparameter tuning (hypertuning)<\/a> is an optimization technique in which you specify a metric of your choice to optimize and then search for the hyperparameter values that maximize it, thereby improving the predictive accuracy of your neural network.&nbsp;<\/p>\n\n\n\n<p>Recently, Microsoft Research, in collaboration with OpenAI, came up with <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/%c2%b5transfer-a-technique-for-hyperparameter-tuning-of-enormous-neural-networks\/\" target=\"_blank\" rel=\"noreferrer noopener\">a new hypertuning technique<\/a> which it calls \u03bc-Transfer, and which dramatically speeds up the tuning of large neural networks.<\/p>\n\n\n\n<p>The new technique has been presented in a <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/%c2%b5transfer-a-technique-for-hyperparameter-tuning-of-enormous-neural-networks\/\" target=\"_blank\" rel=\"noreferrer noopener\">paper<\/a> titled \u2018Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer\u2019, along with a <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/%c2%b5transfer-a-technique-for-hyperparameter-tuning-of-enormous-neural-networks\/\" target=\"_blank\" rel=\"noreferrer noopener\">PyTorch implementation<\/a> for those who want to use it right away.<\/p>\n\n\n\n<p>The paper extends the Microsoft Research program \u2018Tensor Programs\u2019, which has been running since 2020. 
A technique called \u03bc-Parameterization was then introduced for ANNs in the \u2018infinite-width limit\u2019.<\/p>\n\n\n\n<p>So far, the technique has been tested on GPT-3, a massive ANN with 6.7 billion parameters. The results are striking: by transferring hyperparameters tuned on a much smaller 40-million-parameter model, \u03bc-Transfer outperformed the original 6.7-billion-parameter GPT-3, at a tuning cost of only 7% of the total pretraining cost.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">A Look into Tuning of ANNs<\/h1>\n\n\n\n<p>If this is not your first time learning about ANNs, then you probably know that a neural network requires two kinds of settings: first, the values of its parameters, and second, the values of its hyperparameters. Both categories must be set before you start to train your model.&nbsp;<\/p>\n\n\n\n<p>The <a href=\"https:\/\/cloud.google.com\/ai-platform\/training\/docs\/hyperparameter-tuning-overview\" target=\"_blank\" rel=\"noreferrer noopener\">parameters<\/a>, also known as \u2018weights\u2019, are the values attached to the connections between nodes; each weight represents how strongly its input affects the final prediction.<\/p>\n\n\n\n<p><a href=\"https:\/\/cloud.google.com\/ai-platform\/training\/docs\/hyperparameter-tuning-overview\" target=\"_blank\" rel=\"noreferrer noopener\">Hyperparameters<\/a> include, among others, the depth and width of your neural network: depth is the number of hidden layers between the input and output layers, and width is the number of nodes within a single layer.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image is-resized is-style-default\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/dYiy8Ncs1WAVZvI6RqKgvfHyp7doKyIIMV40osBoP75Z4umUlts33kgzVJ7s3NSiC0xZggVw3tgp-wPmIgytNtkGUpRh7kx3j_gJ9zpAnsP4YpdC_J0Mftj-2SOSgIdLu4CX66Dz\" alt=\"Layers in Artificial Neural Network\" width=\"697\" height=\"418\" \/><figcaption>Image by <a 
href=\"https:\/\/pixabay.com\/users\/ahmedgad-9403351\/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=3637503\" target=\"_blank\" rel=\"noreferrer noopener\">Ahmed Gad<\/a> from <a href=\"https:\/\/pixabay.com\/users\/ahmedgad-9403351\/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=3637503\" target=\"_blank\" rel=\"noreferrer noopener\">Pixabay<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"has-text-align-center\"><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Computational Expense of Hypertuning<\/h2>\n\n\n\n<p>Since <a href=\"https:\/\/blogs.oracle.com\/bigdata\/post\/neural-networks-in-deep-learning\" target=\"_blank\" rel=\"noreferrer noopener\">ANNs are by definition self-learning algorithms<\/a> that require no human assistance in building a data model, unlike <a href=\"https:\/\/www.oracle.com\/pk\/data-science\/machine-learning\/what-is-machine-learning\/\" target=\"_blank\" rel=\"noreferrer noopener\">conventional machine learning algorithms<\/a>, these networks use a trial-and-error technique to select optimal values for themselves.<\/p>\n\n\n\n<p>In the <a href=\"https:\/\/cloud.google.com\/ai-platform\/training\/docs\/hyperparameter-tuning-overview\" target=\"_blank\" rel=\"noreferrer noopener\">trial-and-error technique<\/a>, a user sets a range of values for each hyperparameter. A first trial is run within this range, and subsequent trials use new hyperparameter values guessed from the results of previous trials. In the end, the values that produced the best result are picked. That\u2019s how an ANN tunes its hyperparameters!<\/p>\n\n\n\n<p>Unfortunately, the trial-and-error technique is very slow, especially when the network is wide, carrying millions or billions of parameters. 
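This trial-and-error loop can be sketched as a simple random search in Python. Everything here is illustrative: `train_and_evaluate` is a hypothetical stand-in for an actual training run, and the hyperparameter ranges and toy objective are assumptions, not part of any real tuning library.

```python
import random

def train_and_evaluate(learning_rate, width):
    # Hypothetical stand-in for training a model and returning its
    # validation loss; a real run would train the network with the
    # sampled hyperparameters. Toy objective: pretend the best
    # settings are learning_rate=0.01 and width=64.
    return (learning_rate - 0.01) ** 2 + ((width - 64) / 64) ** 2

def random_search(n_trials, seed=0):
    """Trial-and-error tuning: sample hyperparameters from user-set
    ranges, evaluate each trial, and keep the best result seen."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        trial = {
            "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform range
            "width": rng.choice([32, 64, 128, 256]),     # discrete choices
        }
        loss = train_and_evaluate(trial["learning_rate"], trial["width"])
        if best is None or loss < best[1]:
            best = (trial, loss)
    return best

best_trial, best_loss = random_search(n_trials=50)
print(best_trial, best_loss)
```

Each trial is independent here; smarter search strategies (Bayesian optimization, for instance) would guess the next values from earlier results, but the cost structure is the same: every trial is a full training run, which is exactly what becomes prohibitive at GPT-3 scale.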
Industry experts are therefore continuously working on advanced techniques that enable fast and accurate feature learning as networks approach the \u2018infinite-width limit\u2019.<\/p>\n\n\n\n<p>\u03bc-Transfer reduces this trial-and-error cost by letting hyperparameters tuned on a small model be transferred directly to a much larger one, which is especially important for huge networks like GPT-3.<\/p>\n\n\n\n<p>Colin Raffel, co-creator of the T5 model and assistant professor of Computer Science at the University of North Carolina, shared his thoughts on the breakthrough:<\/p>\n\n\n\n<p>\u201c\u00b5P provides an impressive step toward removing some of the black magic from scaling up neural networks. It also provides a theoretically backed explanation of some tricks used by past works, like the T5 model. I believe both practitioners and researchers alike will find this work valuable.\u201d<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"twitter-tweet\" data-width=\"550\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer<br><br>By transferring from 40M parameters, \u00b5Transfer outperforms the 6.7B GPT-3, with tuning cost only 7% of total pretraining cost.<br><br>abs: <a href=\"https:\/\/t.co\/kYiuGDiUpE\">https:\/\/t.co\/kYiuGDiUpE<\/a><br>repo: <a href=\"https:\/\/t.co\/TG4eZHErto\">https:\/\/t.co\/TG4eZHErto<\/a> <a href=\"https:\/\/t.co\/D8xbHxYRLS\">pic.twitter.com\/D8xbHxYRLS<\/a><\/p>&mdash; Aran Komatsuzaki (@arankomatsuzaki) <a href=\"https:\/\/twitter.com\/arankomatsuzaki\/status\/1501020970420785156?ref_src=twsrc%5Etfw\">March 8, 2022<\/a><\/blockquote><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script>\n<\/div><\/figure>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learn the basics of hyperparameter tuning and 
tune your neural networks faster with Microsoft&#8217;s new technique, the \u03bc-Transfer<\/p>\n","protected":false},"author":7,"featured_media":829,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,17,1],"tags":[23,69,72],"class_list":{"0":"post-688","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-ai","8":"category-machine-learning","9":"category-uncategorized","10":"tag-ai","11":"tag-machine-learning","12":"tag-neural-networks"},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v19.14 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Now Tune Your ANN Faster: Microsoft Releases an Efficient Hypertuning Technique, the \u03bc-Transfer - Dicecamp Insights<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/dicecamp.com\/insights\/now-tune-your-ann-faster-microsoft-releases-an-efficient-hypertuning-technique-the-\u03bc-transfer\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Now Tune Your ANN Faster: Microsoft Releases an Efficient Hypertuning Technique, the \u03bc-Transfer - Dicecamp Insights\" \/>\n<meta property=\"og:description\" content=\"Learn the basics of hyperparameter tuning and tune your neural networks faster with Microsoft&#039;s new technique, the \u03bc-Transfer\" \/>\n<meta property=\"og:url\" content=\"https:\/\/dicecamp.com\/insights\/now-tune-your-ann-faster-microsoft-releases-an-efficient-hypertuning-technique-the-\u03bc-transfer\/\" \/>\n<meta property=\"og:site_name\" content=\"Dicecamp Insights\" \/>\n<meta property=\"article:published_time\" content=\"2022-03-14T07:12:20+00:00\" \/>\n<meta property=\"og:image\" 
content=\"https:\/\/dicecamp.com\/insights\/wp-content\/uploads\/2022\/03\/ms.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"644\" \/>\n\t<meta property=\"og:image:height\" content=\"404\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Ayesha\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Ayesha\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/dicecamp.com\/insights\/now-tune-your-ann-faster-microsoft-releases-an-efficient-hypertuning-technique-the-%ce%bc-transfer\/\",\"url\":\"https:\/\/dicecamp.com\/insights\/now-tune-your-ann-faster-microsoft-releases-an-efficient-hypertuning-technique-the-%ce%bc-transfer\/\",\"name\":\"Now Tune Your ANN Faster: Microsoft Releases an Efficient Hypertuning Technique, the \u03bc-Transfer - Dicecamp 
Insights\",\"isPartOf\":{\"@id\":\"https:\/\/dicecamp.com\/insights\/#website\"},\"datePublished\":\"2022-03-14T07:12:20+00:00\",\"dateModified\":\"2022-03-14T07:12:20+00:00\",\"author\":{\"@id\":\"https:\/\/dicecamp.com\/insights\/#\/schema\/person\/1b7d4bef40ac58bbedfa718df21e2463\"},\"breadcrumb\":{\"@id\":\"https:\/\/dicecamp.com\/insights\/now-tune-your-ann-faster-microsoft-releases-an-efficient-hypertuning-technique-the-%ce%bc-transfer\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/dicecamp.com\/insights\/now-tune-your-ann-faster-microsoft-releases-an-efficient-hypertuning-technique-the-%ce%bc-transfer\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/dicecamp.com\/insights\/now-tune-your-ann-faster-microsoft-releases-an-efficient-hypertuning-technique-the-%ce%bc-transfer\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/dicecamp.com\/insights\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Now Tune Your ANN Faster: Microsoft Releases an Efficient Hypertuning Technique, the \u03bc-Transfer\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/dicecamp.com\/insights\/#website\",\"url\":\"https:\/\/dicecamp.com\/insights\/\",\"name\":\"Dicecamp Insights\",\"description\":\"All Things Tech!\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/dicecamp.com\/insights\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/dicecamp.com\/insights\/#\/schema\/person\/1b7d4bef40ac58bbedfa718df21e2463\",\"name\":\"Ayesha\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/dicecamp.com\/insights\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/fc0617698baa4b6b794771cffa4c63de5ee5febb87eef29e53208d83b8be582e?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/fc0617698baa4b6b794771cffa4c63de5ee5febb87eef29e53208d83b8be582e?s=96&d=mm&r=g\",\"caption\":\"Ayesha\"},\"description\":\"I engineer the content and acquaint the science of analytics to empower rookies and professionals.\",\"sameAs\":[\"https:\/\/www.linkedin.com\/in\/ayesha-saeed-13as96\/\"],\"url\":\"https:\/\/dicecamp.com\/insights\/author\/ayesha\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Now Tune Your ANN Faster: Microsoft Releases an Efficient Hypertuning Technique, the \u03bc-Transfer - Dicecamp Insights","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/dicecamp.com\/insights\/now-tune-your-ann-faster-microsoft-releases-an-efficient-hypertuning-technique-the-\u03bc-transfer\/","og_locale":"en_US","og_type":"article","og_title":"Now Tune Your ANN Faster: Microsoft Releases an Efficient Hypertuning Technique, the \u03bc-Transfer - Dicecamp Insights","og_description":"Learn the basics of hyperparameter tuning and tune your neural networks faster with Microsoft's new technique, the \u03bc-Transfer","og_url":"https:\/\/dicecamp.com\/insights\/now-tune-your-ann-faster-microsoft-releases-an-efficient-hypertuning-technique-the-\u03bc-transfer\/","og_site_name":"Dicecamp 
Insights","article_published_time":"2022-03-14T07:12:20+00:00","og_image":[{"width":644,"height":404,"url":"https:\/\/dicecamp.com\/insights\/wp-content\/uploads\/2022\/03\/ms.jpg","type":"image\/jpeg"}],"author":"Ayesha","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Ayesha","Est. reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/dicecamp.com\/insights\/now-tune-your-ann-faster-microsoft-releases-an-efficient-hypertuning-technique-the-%ce%bc-transfer\/","url":"https:\/\/dicecamp.com\/insights\/now-tune-your-ann-faster-microsoft-releases-an-efficient-hypertuning-technique-the-%ce%bc-transfer\/","name":"Now Tune Your ANN Faster: Microsoft Releases an Efficient Hypertuning Technique, the \u03bc-Transfer - Dicecamp Insights","isPartOf":{"@id":"https:\/\/dicecamp.com\/insights\/#website"},"datePublished":"2022-03-14T07:12:20+00:00","dateModified":"2022-03-14T07:12:20+00:00","author":{"@id":"https:\/\/dicecamp.com\/insights\/#\/schema\/person\/1b7d4bef40ac58bbedfa718df21e2463"},"breadcrumb":{"@id":"https:\/\/dicecamp.com\/insights\/now-tune-your-ann-faster-microsoft-releases-an-efficient-hypertuning-technique-the-%ce%bc-transfer\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/dicecamp.com\/insights\/now-tune-your-ann-faster-microsoft-releases-an-efficient-hypertuning-technique-the-%ce%bc-transfer\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/dicecamp.com\/insights\/now-tune-your-ann-faster-microsoft-releases-an-efficient-hypertuning-technique-the-%ce%bc-transfer\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/dicecamp.com\/insights\/"},{"@type":"ListItem","position":2,"name":"Now Tune Your ANN Faster: Microsoft Releases an Efficient Hypertuning Technique, the 
\u03bc-Transfer"}]},{"@type":"WebSite","@id":"https:\/\/dicecamp.com\/insights\/#website","url":"https:\/\/dicecamp.com\/insights\/","name":"Dicecamp Insights","description":"All Things Tech!","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/dicecamp.com\/insights\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/dicecamp.com\/insights\/#\/schema\/person\/1b7d4bef40ac58bbedfa718df21e2463","name":"Ayesha","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/dicecamp.com\/insights\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/fc0617698baa4b6b794771cffa4c63de5ee5febb87eef29e53208d83b8be582e?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/fc0617698baa4b6b794771cffa4c63de5ee5febb87eef29e53208d83b8be582e?s=96&d=mm&r=g","caption":"Ayesha"},"description":"I engineer the content and acquaint the science of analytics to empower rookies and 
professionals.","sameAs":["https:\/\/www.linkedin.com\/in\/ayesha-saeed-13as96\/"],"url":"https:\/\/dicecamp.com\/insights\/author\/ayesha\/"}]}},"_links":{"self":[{"href":"https:\/\/dicecamp.com\/insights\/wp-json\/wp\/v2\/posts\/688","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dicecamp.com\/insights\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dicecamp.com\/insights\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dicecamp.com\/insights\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/dicecamp.com\/insights\/wp-json\/wp\/v2\/comments?post=688"}],"version-history":[{"count":0,"href":"https:\/\/dicecamp.com\/insights\/wp-json\/wp\/v2\/posts\/688\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dicecamp.com\/insights\/wp-json\/wp\/v2\/media\/829"}],"wp:attachment":[{"href":"https:\/\/dicecamp.com\/insights\/wp-json\/wp\/v2\/media?parent=688"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dicecamp.com\/insights\/wp-json\/wp\/v2\/categories?post=688"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dicecamp.com\/insights\/wp-json\/wp\/v2\/tags?post=688"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}