<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>NLP &#8211; Megaputer Intelligence</title>
	<atom:link href="https://www.megaputer.com/tag/nlp/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.megaputer.com</link>
	<description>Your Knowledge Partner</description>
	<lastBuildDate>Tue, 24 Mar 2026 00:02:52 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=5.0.22</generator>

<image>
	<url>https://www.megaputer.com/wp-content/uploads/favicon.png</url>
	<title>NLP &#8211; Megaputer Intelligence</title>
	<link>https://www.megaputer.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>What is Linguistic Corpus?</title>
		<link>https://www.megaputer.com/what-is-linguistic-corpus/</link>
		<pubDate>Tue, 05 Dec 2023 20:55:10 +0000</pubDate>
		<dc:creator><![CDATA[Echo Lu]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Computational Linguistics]]></category>
		<category><![CDATA[NLP]]></category>

		<guid isPermaLink="false">https://www.megaputer.com/?p=35025</guid>
		<description><![CDATA[<p>The post <a rel="nofollow" href="https://www.megaputer.com/what-is-linguistic-corpus/">What is Linguistic Corpus?</a> appeared first on <a rel="nofollow" href="https://www.megaputer.com">Megaputer Intelligence</a>.</p>
]]></description>
				<content:encoded><![CDATA[<section class="l-section wpb_row height_small"><div class="l-section-h i-cf"><div class="g-cols vc_row type_default valign_top"><div class="vc_col-sm-12 wpb_column vc_column_container"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p><span style="font-weight: 400;">At the core of Natural Language Processing lies extensive language data. The availability of copious amounts of text or speech, faithfully representing natural language, is pivotal for the efficacy of artificial language systems. Diverse forms of text, from Wikipedia articles to Twitter threads, find application in various language models. These substantial language datasets collectively constitute what is known as a corpus in linguistics—an extensive and principled collection of authentic, naturally occurring language.</span></p>

		</div>
	</div>
<div class="w-separator size_small"></div><div class="w-image align_center"><div class="w-image-h"><img width="564" height="374" src="https://www.megaputer.com/wp-content/uploads/photo_2023-11-20_12-03-27.jpg" class="attachment-full size-full" alt="Random collection of different words and word-forms on magnetic tiles" srcset="https://www.megaputer.com/wp-content/uploads/photo_2023-11-20_12-03-27.jpg 564w, https://www.megaputer.com/wp-content/uploads/photo_2023-11-20_12-03-27-300x199.jpg 300w" sizes="(max-width: 564px) 100vw, 564px" /></div></div><div class="w-separator size_medium"></div>
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p><span style="font-weight: 400;">Despite gaining increased attention with the advent of LLMs, the concept of a corpus is not new in linguistics, dating back to the early 20th century or earlier. During that time, lexicographers manually compiled substantial language samples for dictionary construction, representing a rudimentary form of a corpus. Beyond lexicographers and grammarians, linguists manually compiled corpora for various purposes, such as language pedagogy or the study of language acquisition in children. Moreover, the so-called &#8220;structural&#8221; linguists, who dominated the field until the 1950s, emphasized the utility of corpora, believing in the analytical approach to the distributional characteristics of language. With this idea, linguist Charles C. Fries, for example, manually gathered telephone conversations from approximately 300 speakers, encompassing 250,000 words. Field linguist Franz Boas dedicated himself to establishing a substantial corpus of texts from Native Americans.</span></p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p><span style="font-weight: 400;">In the 1960s, however, the first electronic corpus appeared. W. Nelson Francis and Henry Kučera, based at Brown University, crafted a corpus of 1 million American English words, famously known as the Brown corpus. Thereafter, linguistic corpora rapidly evolved and diversified in terms of size and scope, driven by advancements in computer hardware. For instance, </span><a href="https://www.english-corpora.org/coca/"><span style="font-weight: 400;">the Corpus of Contemporary American English (COCA)</span></a><span style="font-weight: 400;">, currently </span><span style="font-weight: 400;">one of the most widely used American English corpora, encompasses over one billion words sourced from eight different genres.</span></p>

		</div>
	</div>
<div class="w-separator size_medium"></div><div class="w-image align_center"><div class="w-image-h"><img width="600" height="451" src="https://www.megaputer.com/wp-content/uploads/istock-1439146187-600x451.jpg" class="attachment-shop_single size-shop_single" alt="Abstract source code background, Big data database app, Computer code, Lines of HTML code by programmer, active online data transformation and exchange" srcset="https://www.megaputer.com/wp-content/uploads/istock-1439146187-600x451.jpg 600w, https://www.megaputer.com/wp-content/uploads/istock-1439146187-300x225.jpg 300w, https://www.megaputer.com/wp-content/uploads/istock-1439146187-1024x769.jpg 1024w, https://www.megaputer.com/wp-content/uploads/istock-1439146187-768x577.jpg 768w, https://www.megaputer.com/wp-content/uploads/istock-1439146187-533x400.jpg 533w" sizes="(max-width: 600px) 100vw, 600px" /></div></div></div></div></div></div></div></section><section class="l-section wpb_row height_small"><div class="l-section-h i-cf"><div class="g-cols vc_row type_default valign_top"><div class="vc_col-sm-12 wpb_column vc_column_container"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<h2><b>Types of Corpora and their use</b></h2>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p><span style="font-weight: 400;">Corpora can be broadly categorized into two types: generalized and specialized. A generalized corpus aims to represent all aspects of a language and typically consists of a large number of words and/or documents. Examples of generalized corpora include the </span><a href="http://www.natcorp.ox.ac.uk/"><span style="font-weight: 400;">British National Corpus</span></a><span style="font-weight: 400;"> and </span><a href="https://www.english-corpora.org/coca/"><span style="font-weight: 400;">COCA</span></a><span style="font-weight: 400;">, mentioned earlier. On the other hand, a specialized corpus is constructed to represent specific aspects of a language, such as language produced by particular user groups (e.g., children, teenagers, second language learners, aphasics) or in specific settings (e.g., university, business). The </span><a href="https://childes.talkbank.org/access/"><span style="font-weight: 400;">Child Language Data Exchange System (CHILDES)</span></a><span style="font-weight: 400;">, a large corpus of language produced by developing children, and the </span><a href="https://scnweb.japanknowledge.com/register/PERC/index.html"><span style="font-weight: 400;">Professional English Research Consortium (PERC)</span></a><span style="font-weight: 400;"> corpus, which contains English academic journal texts in science, engineering, technology, and other fields, are examples of specialized corpora.</span></p>

		</div>
	</div>
<div class="g-cols wpb_row type_default valign_top vc_inner "><div class="vc_col-sm-12 wpb_column vc_column_container"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p><img style="float: right;" src="https://www.megaputer.com/wp-content/uploads/frame-2.png" alt="Parts of Speech" width="500" /><span style="font-weight: 400;">Corpora are frequently enriched with annotations containing linguistic information. Typically, texts within corpora undergo Part of Speech (POS) tagging, wherein the grammatical category of each word is identified. Beyond POS tagging, corpora can be further enhanced with annotations covering syntactic structure, semantic roles (the roles played by each noun argument in a sentence, such as agent, patient, or instrument), named entities, sentiment, and more.</span> To illustrate, the <a href="https://catalog.ldc.upenn.edu/LDC99T42"><span style="font-weight: 400;">Penn Treebank</span></a><span style="font-weight: 400;"> project annotates a collection of Wall Street Journal articles with both POS tags and syntactic phrase-structure (constituency) trees. Another syntactic resource is the </span><a href="http://commondatastorage.googleapis.com/books/syntactic-ngrams/index.html"><span style="font-weight: 400;">Google Books Syntactic N-grams dataset</span></a><span style="font-weight: 400;">, comprising syntactic tree fragments automatically extracted from the Google English Books collection. Some corpora are annotated with semantic information. The </span><a href="https://propbank.github.io/"><span style="font-weight: 400;">Proposition Bank</span></a><span style="font-weight: 400;">, for example, provides a collection of English sentences that are manually annotated with semantic roles. Some corpora are also annotated with named entities. For example, </span><a href="https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-6-S1-S3"><span style="font-weight: 400;">GENETAG</span></a><span style="font-weight: 400;"> is a corpus of 20,000 sentences from the MEDLINE database annotated with gene and protein names.</span></p>

		</div>
	</div>
</div></div></div></div></div></div></div></div></div></section><section class="l-section wpb_row height_small"><div class="l-section-h i-cf"><div class="g-cols vc_row type_default valign_top"><div class="vc_col-sm-12 wpb_column vc_column_container"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<h2><b>Use of Corpora in NLP</b></h2>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p><span style="font-weight: 400;">Corpora have traditionally served as tools to investigate specific aspects of language use, including collocation, frequency, preferred constructions, patterns of errors, and more. With the emergence of computer technology and machine learning algorithms, they have evolved into essential resources for model training and validation. Large Language Models are now trained on vast amounts of language data. For instance, OpenAI&#8217;s GPT-3 175B model was trained on 300 billion tokens gathered from sources such as Common Crawl, WebText2, and Wikipedia. Meta&#8217;s LLaMA 65B model was trained on 1.4 trillion tokens drawn from sources including English CommonCrawl, C4, GitHub, Wikipedia, arXiv, and Stack Exchange. In PolyAnalyst, diverse corpora such as the </span><a href="https://anc.org/"><span style="font-weight: 400;">Open American National Corpus</span></a><span style="font-weight: 400;">, the </span><a href="https://catalog.ldc.upenn.edu/LDC99T42"><span style="font-weight: 400;">Penn Treebank</span></a><span style="font-weight: 400;">, CoNLL Corpus, </span><a href="https://opus.nlpl.eu/Europarl-v3.php"><span style="font-weight: 400;">Europarl</span></a><span style="font-weight: 400;">, the </span><a href="https://ruscorpora.ru/en/"><span style="font-weight: 400;">Russian National Corpus</span></a><span style="font-weight: 400;">, and Wikipedia are employed for various modules. The use of large language datasets is poised to expand further, making the adept handling of such data a crucial skill in the field of Natural Language Processing.</span></p>

		</div>
	</div>
</div></div></div></div></div></section>
]]></content:encoded>
			</item>
		<item>
		<title>Mastering Language Models: A Deep Dive into Input Parameters</title>
		<link>https://www.megaputer.com/mastering-language-models-a-deep-dive-into-input-parameters/</link>
		<pubDate>Thu, 28 Sep 2023 21:57:45 +0000</pubDate>
		<dc:creator><![CDATA[Echo Lu]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Text Analytics]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Large Language Models]]></category>
		<category><![CDATA[NLP]]></category>

		<guid isPermaLink="false">https://www.megaputer.com/?p=34673</guid>
		<description><![CDATA[<p>If you have ever used a language model through a playground or an API, you may have been asked to choose some input parameters. This article explains five essential input parameters (temperature, top-p, top-k, frequency penalty, and presence penalty) and shows how each of them helps us navigate the quality-diversity tradeoff.</p>
<p>The post <a rel="nofollow" href="https://www.megaputer.com/mastering-language-models-a-deep-dive-into-input-parameters/">Mastering Language Models: A Deep Dive into Input Parameters</a> appeared first on <a rel="nofollow" href="https://www.megaputer.com">Megaputer Intelligence</a>.</p>
]]></description>
				<content:encoded><![CDATA[<section class="l-section wpb_row height_small"><div class="l-section-h i-cf"><div class="g-cols vc_row type_default valign_top"><div class="vc_col-sm-12 wpb_column vc_column_container"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p>If you have ever used a language model through a playground or an API, you may have been asked to choose some input parameters. For many of us, the meaning of these parameters (and the right way to use them) is less than totally clear.</p>

		</div>
	</div>
<div class="w-separator size_small"></div><div class="w-image align_center"><div class="w-image-h"><img width="400" height="240" src="https://www.megaputer.com/wp-content/uploads/parameter-selection_in_the_sillytavern_interface.png" class="attachment-large size-large" alt="A screenshot showing parameter selection in the SillyTavern interface. Image by the author." srcset="https://www.megaputer.com/wp-content/uploads/parameter-selection_in_the_sillytavern_interface.png 400w, https://www.megaputer.com/wp-content/uploads/parameter-selection_in_the_sillytavern_interface-300x180.png 300w" sizes="(max-width: 400px) 100vw, 400px" /></div></div><div class="w-separator size_medium"></div>
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p>This article will teach you how to use these parameters to control hallucinations, inject creativity into your model’s outputs, and make other fine-grained adjustments to optimize behavior. Much like prompt engineering, input parameter tuning can get your model running at 110%.</p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p>By the end of this article, you’ll be an expert on five essential input parameters — temperature, top-p, top-k, frequency penalty, and presence penalty. You’ll also learn how each of these parameters helps us navigate the quality-diversity tradeoff.</p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p>So, grab a coffee, and let’s get started!</p>

		</div>
	</div>
</div></div></div></div></div></section><section class="l-section wpb_row height_small"><div class="l-section-h i-cf"><div class="g-cols vc_row type_default valign_top"><div class="vc_col-sm-12 wpb_column vc_column_container"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<div id="table-of-contents">
<h3>Table of Contents</h3>
<ul>
<li><a href="#section1">Background</a></li>
<li><a href="#section2">Quality, Diversity, and Temperature</a></li>
<li><a href="#section3">Top-k and Top-p</a></li>
<li><a href="#section4">Frequency and Presence Penalties</a></li>
<li><a href="#section5">The Parameter-Tuning Cheat Sheet</a></li>
<li><a href="#section6">Wrapping up</a></li>
</ul>
</div>

		</div>
	</div>
<div class="g-cols wpb_row type_default valign_top vc_inner " id="section1"><div class="vc_col-sm-12 wpb_column vc_column_container"><div class="vc_column-inner"><div class="wpb_wrapper"></div></div></div></div></div></div></div></div></div></section><section class="l-section wpb_row height_small"><div class="l-section-h i-cf"><div class="g-cols vc_row type_default valign_top"><div class="vc_col-sm-12 wpb_column vc_column_container"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<h2 id="5876" class="nb nc fo be nd ne nf go ng nh ni gr nj nk nl nm nn no np nq nr ns nt nu nv nw bj">Background</h2>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p>Before we start, we will need to go over some background information about how these models work. Let’s start our deep dive by reviewing the fundamentals.</p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="59cd" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">To read a document, a language model first breaks it down into a sequence of tokens. A token is just a small chunk of text that the model can easily process: it could be a word, a piece of a word, or a single character. For example, “Megaputer is great!” could be broken down into five tokens: [“Mega”, “puter”, “ is”, “ great”, “!”]. This is done by the tokenizer.</p>

		</div>
	</div>
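	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p>To make the tokenizer step concrete, here is a minimal sketch of a greedy longest-match tokenizer. The vocabulary <code>VOCAB</code> and the function <code>tokenize</code> are hypothetical names chosen for illustration; real tokenizers learn a much larger vocabulary from data and may split text differently.</p>
<pre>
```python
# Toy greedy longest-match tokenizer. VOCAB is a hypothetical, hand-picked
# vocabulary; real tokenizers learn theirs from large amounts of text.
VOCAB = ["Mega", "puter", " is", " great", "!"]


def tokenize(text, vocab=VOCAB):
    """Split text into tokens by always taking the longest matching piece."""
    pieces = sorted(vocab, key=len, reverse=True)
    tokens = []
    position = 0
    while position != len(text):
        for piece in pieces:
            if text.startswith(piece, position):
                tokens.append(piece)
                position += len(piece)
                break
        else:
            # Fall back to a single character when nothing in the vocabulary fits.
            tokens.append(text[position])
            position += 1
    return tokens


print(tokenize("Megaputer is great!"))
```
</pre>
			<p>With this toy vocabulary, the example sentence splits into five tokens, and joining the tokens back together reproduces the original text.</p>

		</div>
	</div>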

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="59cd" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">Most language models we are familiar with operate by repeatedly generating the next token in a sequence. Each time the model wants to generate another token, it re-reads the entire sequence and then predicts the token that should come next. This strategy is known as <em class="oc">autoregressive </em>generation.</p>

		</div>
	</div>
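	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p>The generation loop itself is simple enough to sketch in a few lines. In the sketch below, <code>NEXT_TOKEN</code> and <code>predict_next</code> are hypothetical stand-ins for a real model: they only look at the most recent token, whereas a real language model takes the whole sequence into account.</p>
<pre>
```python
# Hypothetical stand-in for a language model: it maps the most recent token
# to the token that should follow it. A real model reads the whole sequence.
NEXT_TOKEN = {
    "The": " sun",
    " sun": " rises",
    " rises": " in",
    " in": " the",
    " the": " east",
    " east": ".",
}


def predict_next(tokens):
    """Return the predicted next token, or None when the text is complete."""
    return NEXT_TOKEN.get(tokens[-1])


def generate(prompt_tokens, max_new_tokens=10):
    """Autoregressive generation: predict one token, append it, and repeat."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = predict_next(tokens)
        if next_token is None:
            break
        tokens.append(next_token)
    return tokens


print("".join(generate(["The", " sun"])))
```
</pre>
			<p>Each pass through the loop adds exactly one token, and the growing sequence is fed back in as the input for the next prediction.</p>

		</div>
	</div>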
<div class="w-image align_center meta_simple"><a class="w-image-h" href="https://developer.nvidia.com/blog/how-to-get-better-outputs-from-your-large-language-model/" title="How to Get Better Outputs from Your Large Language Model" target="_blank"><img width="320" height="320" src="https://www.megaputer.com/wp-content/uploads/autoregressive_generation.gif" class="attachment-large size-large" alt="GIF by Echo Lu, containing a modification of an image by Annie Surla from NVIDIA. Modified with permission from owner." /><div class="w-image-meta"><div class="w-image-title">Autoregressive generation of tokens by a language model.</div><div class="w-image-description">Contains a modification of an image by Annie Surla from NVIDIA.</div></div></a></div><div class="w-separator size_medium"></div>
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="59cd" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">This explains why ChatGPT prints the words out one at a time: It is streaming the words to you as it writes them.</p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="59cd" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">To generate a token, a language model assigns a likelihood score to each token in its vocabulary. A token gets a high likelihood score if it is a good continuation of the text and a low likelihood score if it is a poor continuation of the text, as assessed by the model. The sum of the likelihood scores over all tokens in the model’s vocabulary is always exactly equal to one.</p>

		</div>
	</div>
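	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p>As a rough illustration of where likelihood scores come from, the sketch below assumes the common setup in which the model produces one raw score (a logit) per token and a softmax function rescales those raw scores so that they sum to one. The candidate tokens and raw scores are made up for the example.</p>
<pre>
```python
import math


def softmax(logits):
    """Turn raw scores into likelihood scores that are positive and sum to one."""
    largest = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical raw scores for four continuations of "The sun rises in the".
candidates = [" east", " morning", " sky", " refrigerator"]
logits = [4.0, 3.0, 2.0, -3.0]

for token, probability in zip(candidates, softmax(logits)):
    print(f"{token!r}: {probability:.3f}")
```
</pre>
			<p>A good continuation ends up with most of the weight, while a poor one receives a score that is tiny but still greater than zero.</p>

		</div>
	</div>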
<div class="w-image align_center meta_simple"><a class="w-image-h" href="https://developer.nvidia.com/blog/how-to-get-better-outputs-from-your-large-language-model/" title="How to Get Better Outputs from Your Large Language Model" target="_blank"><img width="1024" height="260" src="https://www.megaputer.com/wp-content/uploads/llm_token_sampling_process-1024x260.png" class="attachment-large size-large" alt="A language model assigns likelihood scores to predict the next token in the sequence. Original image by Annie Surla from NVIDIA, modified by Echo Lu with permission from owner." srcset="https://www.megaputer.com/wp-content/uploads/llm_token_sampling_process-1024x260.png 1024w, https://www.megaputer.com/wp-content/uploads/llm_token_sampling_process-300x76.png 300w, https://www.megaputer.com/wp-content/uploads/llm_token_sampling_process-768x195.png 768w, https://www.megaputer.com/wp-content/uploads/llm_token_sampling_process-600x152.png 600w, https://www.megaputer.com/wp-content/uploads/llm_token_sampling_process.png 1095w" sizes="(max-width: 1024px) 100vw, 1024px" /><div class="w-image-meta"><div class="w-image-title">A language model assigns likelihood scores to predict the next token in the sequence.</div><div class="w-image-description">Original image by Annie Surla from NVIDIA, modified with permission from owner.</div></div></a></div><div class="w-separator size_medium"></div>
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="59cd" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">After the likelihood scores are assigned, a token sampling scheme is used to pick the next token in the sequence. The token sampling scheme may incorporate some randomness so that the language model does not answer the same question in the same way every time. This randomness can be a nice feature in chatbots, as well as in some other applications.</p>

		</div>
	</div>
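	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p>One simple token sampling scheme is to pick each token with a probability equal to its likelihood score. The sketch below uses Python's standard <code>random</code> module; the tokens, the scores, and the helper <code>sample_token</code> are assumptions made for illustration, not the scheme used by any particular model.</p>
<pre>
```python
import random

# Hypothetical likelihood scores for the next token; they sum to one.
candidates = [" east", " morning", " sky"]
scores = [0.7, 0.2, 0.1]


def sample_token(tokens, weights, rng):
    """Pick one token at random, in proportion to its likelihood score."""
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
picks = [sample_token(candidates, scores, rng) for _ in range(1000)]
for token in candidates:
    print(f"{token!r} picked {picks.count(token)} times out of 1000")
```
</pre>
			<p>The most likely token is chosen most often, but not every time, which is exactly the randomness described above.</p>

		</div>
	</div>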

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="59cd" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph=""><em class="oc">TLDR: Language models break down text into chunks, predict the next chunk in the sequence, and mix in some randomness. Repeat as needed to generate output.</em></p>

		</div>
	</div>
<div class="g-cols wpb_row type_default valign_top vc_inner " id="section2"><div class="vc_col-sm-12 wpb_column vc_column_container"><div class="vc_column-inner"><div class="wpb_wrapper"></div></div></div></div></div></div></div></div></div></section><section class="l-section wpb_row height_small"><div class="l-section-h i-cf"><div class="g-cols vc_row type_default valign_top"><div class="vc_col-sm-12 wpb_column vc_column_container"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<h2 id="1164" class="oc nc ev be nd od oe fv nh of og fy nl oh oi oj ok ol om on oo op oq or os ot bj">Quality, Diversity, and Temperature</h2>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p>But why would we ever want to pick the second-best token, the third-best token, or any other token besides the best, for that matter? Wouldn’t we want to pick the best continuation (the one with the highest likelihood score) every time? Often, we do. But if we picked the best answer every time, we would get the same answer every time. If we want a diverse range of answers, we may have to give up some quality to get it. This sacrifice of quality for diversity is called the quality-diversity tradeoff.</p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p>With this in mind, <em class="oc">temperature</em> tells the machine how to navigate the quality-diversity tradeoff. Low temperatures mean more quality, while high temperatures mean more diversity. When the temperature is set to zero, the model always samples the token with the highest likelihood score, resulting in zero diversity between queries, but ensuring that we always pick the highest quality continuation as assessed by the model.</p>

		</div>
	</div>
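	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p>A common way to implement temperature, which we assume here for illustration, is to divide the model's raw scores by the temperature before converting them into likelihood scores. The function <code>apply_temperature</code> and the raw scores are hypothetical; temperature zero is handled as a special case that puts all the weight on the best token.</p>
<pre>
```python
import math


def apply_temperature(logits, temperature):
    """Return likelihood scores after dividing the raw scores by the temperature."""
    if temperature == 0:
        # Temperature zero is treated as greedy: all weight on the best token.
        best = logits.index(max(logits))
        return [1.0 if index == best else 0.0 for index in range(len(logits))]
    scaled = [value / temperature for value in logits]
    largest = max(scaled)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [4.0, 3.0, 2.0]  # hypothetical raw scores for three tokens
for temperature in (0, 0.5, 1.0, 2.0):
    rounded = [round(p, 3) for p in apply_temperature(logits, temperature)]
    print(temperature, rounded)
```
</pre>
			<p>As the temperature rises, weight shifts away from the best token and toward the alternatives, so sampling becomes more diverse.</p>

		</div>
	</div>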

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p>Most commonly, we will want to set the temperature to zero for text analytics tasks. This is because, in text analytics, there is often a single “right” answer that we want to get every time. At temperature zero, we have the best chance of arriving at this answer in one shot. I like to set the temperature to zero for entity extraction, fact extraction, sentiment analysis, and most other things that I get up to in my job as an analyst. As a rule, you should always choose temperature zero for any prompt that you will only pass to the model once, as this is most likely to get you a good answer.</p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="a120" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">At higher temperatures, we see more garbage and hallucinations, less coherence, and lower quality of responses on average, but also more creative and diverse responses. We use temperatures higher than zero when we want to pass the same prompt to the model many times and get many creative responses. We recommend that you <em class="oc">only </em>use non-zero temperatures when you want to ask the same question twice and get two different answers.</p>

		</div>
	</div>
<div class="w-separator size_small"></div><div class="w-image align_center meta_simple"><div class="w-image-h"><img width="423" height="355" src="https://www.megaputer.com/wp-content/uploads/quality-diversity_tradeoff.png" class="attachment-large size-large" alt="Higher temperatures bring diversity, creativity, and multiplicity of answers but also add garbage, incoherence, and hallucinations. Image by Echo Lu." srcset="https://www.megaputer.com/wp-content/uploads/quality-diversity_tradeoff.png 423w, https://www.megaputer.com/wp-content/uploads/quality-diversity_tradeoff-300x252.png 300w" sizes="(max-width: 423px) 100vw, 423px" /><div class="w-image-meta"><div class="w-image-title">Quality Diversity Tradeoff</div><div class="w-image-description">Temperature adds diversity, creativity, and multiplicity of answers but also garbage, incoherence, and hallucinations.</div></div></div></div><div class="w-separator size_medium"></div>
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="a120" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">And why would we ever want two different answers to the same prompt? In some cases, having many answers to one prompt can be very useful. For example, there is a technique in which we generate many answers to a prompt and keep only the best one, which often produces better answers than a single query at temperature zero. Another use case is synthetic data generation: We want many synthetic data points, not just one. We may discuss these use cases (and others) in later articles, but more often than not, we need only one answer. <strong class="lr fp">When in doubt, choose temperature zero!</strong></p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="a120" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">It is important to note that while temperature zero should <em class="oc">in theory</em> produce the same answer every time, this may not be true in practice! This is because the calculations on the GPUs the model is running on involve rounding, and the order in which the operations are carried out can vary from run to run, so the computed likelihood scores can differ very slightly. These differences introduce a low level of randomness, even at temperature zero. Since changing one token in a text can significantly alter its meaning, a single different token may cause a cascade of different token choices later in the text, resulting in an almost totally different output. But rest assured that this usually has a negligible impact on quality. We only mention it so that you’re not surprised when you get some randomness at temperature zero.</p>

		</div>
	</div>
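	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p>Rounding effects of this kind are easy to reproduce on any computer. The sketch below is not GPU code; it only shows that adding the same three floating-point numbers in a different order can give a slightly different result. We assume, for illustration, that parallel hardware may not always add numbers in the same order.</p>
<pre>
```python
# The same three numbers, added in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right)
print(right_to_left)
print(left_to_right == right_to_left)  # prints False
```
</pre>
			<p>The two sums differ only in the last decimal place, but when two tokens have nearly identical likelihood scores, a difference that small can be enough to change which one comes out on top.</p>

		</div>
	</div>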

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="a120" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">There are more ways to navigate the quality-diversity tradeoff than temperature alone. In the next section, we will discuss some modifications to the temperature sampling technique. But if you are content with using temperature zero, feel free to skip it for now. You may rest soundly knowing that your choice of these parameters at temperature zero will not affect your answer.</p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="a120" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph=""><em class="oc">TLDR: Temperature increases diversity but decreases quality by adding randomness to the model’s outputs.</em></p>

		</div>
	</div>
<div class="g-cols wpb_row type_default valign_top vc_inner " id="section3"><div class="vc_col-sm-12 wpb_column vc_column_container"><div class="vc_column-inner"><div class="wpb_wrapper"></div></div></div></div></div></div></div></div></div></section><section class="l-section wpb_row height_small"><div class="l-section-h i-cf"><div class="g-cols vc_row type_default valign_top"><div class="vc_col-sm-12 wpb_column vc_column_container"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<h2 id="a476" class="nb nc fo be nd ne nf go ng nh ni gr nj nk nl nm nn no np nq nr ns nt nu nv nw bj">Top-k and Top-p sampling</h2>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="6dd4" class="pw-post-body-paragraph lp lq fo lr b gm nx lt lu gp ny lw lx ly nz ma mb mc oa me mf mg ob mi mj mk fh bj" data-selectable-paragraph="">One common way to tweak our token-sampling formula is called top-k sampling. Top-k sampling is a lot like ordinary temperature sampling, except that the lowest likelihood tokens are excluded from being picked: Only the “top k” best choices are considered, which is where we get the name. The advantage of this method is that it stops us from picking truly bad tokens.</p>
<figure class="mo mp mq mr ms mt ml mm paragraph-image"></figure>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="8e2d" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">Let’s suppose, for example, that we are trying to make a completion for “The sun rises in the…” Without top-k sampling, the model considers every token in its vocabulary as a possible continuation of the sequence, so there is some non-zero chance that it will write something ridiculous like “The sun rises in the refrigerator.” With top-k sampling, the model filters out these truly bad picks and only considers the k best options. By clipping off the long tail, we lose a little diversity, but our quality shoots way up.</p>
<figure class="mo mp mq mr ms mt ml mm paragraph-image"></figure>
<figure class="mo mp mq mr ms mt ml mm paragraph-image"></figure>

		</div>
	</div>
<div class="w-separator size_small"></div><div class="w-image align_center meta_simple"><div class="w-image-h"><img width="565" height="453" src="https://www.megaputer.com/wp-content/uploads/unreliable-tail.png" class="attachment-large size-large" alt="Top-k sampling improves quality by keeping only the k best candidate tokens and throwing out the rest. Image by Echo Lu." srcset="https://www.megaputer.com/wp-content/uploads/unreliable-tail.png 565w, https://www.megaputer.com/wp-content/uploads/unreliable-tail-300x241.png 300w, https://www.megaputer.com/wp-content/uploads/unreliable-tail-499x400.png 499w" sizes="(max-width: 565px) 100vw, 565px" /><div class="w-image-meta"><div class="w-image-title">Top-k sampling</div><div class="w-image-description">Top-k sampling improves quality by keeping only the k best candidate tokens and throwing out the rest. </div></div></div></div><div class="w-separator size_medium"></div>
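	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">For readers who like to see the mechanics, here is a minimal sketch of top-k sampling in plain Python. The five-word vocabulary and its scores are invented for the “The sun rises in the…” example, and the helper names <code>softmax</code> and <code>top_k_filter</code> are our own. Real implementations work on vocabularies of many thousands of tokens and may differ in the details.</p>

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperature must be positive."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most likely tokens, zero out the rest, and renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(ranked[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

# Hypothetical candidates and scores for "The sun rises in the ..."
vocab = ["east", "morning", "sky", "west", "refrigerator"]
logits = [5.0, 3.5, 3.0, 1.0, -2.0]

probs = softmax(logits, temperature=1.0)
filtered = top_k_filter(probs, k=3)

# "west" and "refrigerator" now have probability zero and cannot be picked.
token = random.choices(vocab, weights=filtered, k=1)[0]
print(token)
```

			<p class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">Note that the surviving probabilities are rescaled so that they add up to one again before a token is drawn.</p>

		</div>
	</div>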
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="8e2d" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">Top-k sampling is a way to have your cake and eat it too: It gets you the diversity you need at a smaller cost to quality than with temperature alone. Since this technique is so wildly effective, it has inspired many variants.</p>
<figure class="mo mp mq mr ms mt ml mm paragraph-image"></figure>
<figure class="mo mp mq mr ms mt ml mm paragraph-image"></figure>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="8e2d" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">One common variant of top-k sampling is called top-p sampling, which is also known as nucleus sampling. Top-p sampling is a lot like top-k, except that it uses likelihood scores instead of token ranks to determine where it clips the tail. More specifically, it ranks the tokens from most to least likely and keeps the smallest set of top-ranked tokens whose combined likelihood reaches the threshold p, throwing out the rest.</p>
<figure class="mo mp mq mr ms mt ml mm paragraph-image"></figure>
<figure class="mo mp mq mr ms mt ml mm paragraph-image"></figure>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="8e2d" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">The power of top-p sampling compared to top-k sampling becomes evident when there are many poor or mediocre continuations. Suppose, for example, that there are only a handful of good picks for the next token, and there are dozens that just vaguely make sense. If we were using top-k sampling with k=25, we would be considering many poor continuations. In contrast, if we used top-p sampling to filter out the bottom 10% of the probability distribution, we might only consider those good tokens while filtering out the rest.</p>
<figure class="mo mp mq mr ms mt ml mm paragraph-image"></figure>
<figure class="mo mp mq mr ms mt ml mm paragraph-image"></figure>

		</div>
	</div>
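
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">The scenario above can be sketched in a few lines of Python. The probabilities are invented: we assume three good tokens that together hold 92% of the probability, plus forty mediocre ones that share the remaining 8%. The helper names <code>top_k_indices</code> and <code>top_p_indices</code> are hypothetical, and the top-p of 0.9 corresponds to filtering out the bottom 10%.</p>

```python
def ranked_indices(probs):
    """Token indices sorted from most to least likely."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

def top_k_indices(probs, k):
    """Top-k: always keep exactly the k highest-ranked tokens."""
    return ranked_indices(probs)[:k]

def top_p_indices(probs, p):
    """Top-p: keep the smallest top-ranked set whose total reaches p."""
    keep, cumulative = [], 0.0
    for i in ranked_indices(probs):
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return keep

# Invented distribution: 3 good tokens, then 40 mediocre ones.
probs = [0.40, 0.30, 0.22] + [0.002] * 40

print(len(top_k_indices(probs, k=25)))   # prints 25
print(len(top_p_indices(probs, p=0.9)))  # prints 3
```

			<p class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">With k=25, twenty-two of the mediocre tokens stay in play. With a top-p of 0.9, the cut-off lands right after the three good ones.</p>

		</div>
	</div>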

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="8e2d" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">In practice, top-p sampling tends to give better results than top-k sampling. By focusing on the cumulative likelihood, it adapts to the context of the input and provides a more flexible cut-off. So, in conclusion, top-p and top-k sampling can both be used at non-zero temperatures to capture diversity at a lower quality cost, but top-p sampling usually does it better.</p>
<figure class="mo mp mq mr ms mt ml mm paragraph-image"></figure>
<figure class="mo mp mq mr ms mt ml mm paragraph-image"></figure>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="8e2d" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">Tip: For both of these settings, lower value = more filtering. At their strictest settings (a top-k of 1, or a top-p close to zero), they will filter out all but the top-ranked token, which has the same effect as setting the temperature to zero. Be aware that some interfaces treat a top-k of 0 as “no top-k filtering” rather than as the strictest setting, so check your model’s documentation. So please use these parameters, but be aware that setting them too low will give up all of your diversity.</p>
<figure class="mo mp mq mr ms mt ml mm paragraph-image"></figure>
<figure class="mo mp mq mr ms mt ml mm paragraph-image"></figure>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="8e2d" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph=""><em class="oc">TLDR: Top-k and top-p increase quality at only a small cost to diversity. They achieve this by removing the worst token choices before random sampling.</em></p>
<figure class="mo mp mq mr ms mt ml mm paragraph-image"></figure>
<figure class="mo mp mq mr ms mt ml mm paragraph-image"></figure>

		</div>
	</div>
<div class="g-cols wpb_row type_default valign_top vc_inner " id="section4"><div class="vc_col-sm-12 wpb_column vc_column_container"><div class="vc_column-inner"><div class="wpb_wrapper"></div></div></div></div></div></div></div></div></div></section><section class="l-section wpb_row height_small"><div class="l-section-h i-cf"><div class="g-cols vc_row type_default valign_top"><div class="vc_col-sm-12 wpb_column vc_column_container"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<h2 id="e517" class="nb nc fo be nd ne nf go ng nh ni gr nj nk nl nm nn no np nq nr ns nt nu nv nw bj">Frequency and Presence Penalties</h2>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="2cf9" class="pw-post-body-paragraph lp lq fo lr b gm nx lt lu gp ny lw lx ly nz ma mb mc oa me mf mg ob mi mj mk fh bj" data-selectable-paragraph="">We have just two more parameters to discuss before we start to wrap things up: the frequency and presence penalties. These parameters are (big surprise) yet another way to navigate the quality-diversity tradeoff. But while the temperature parameter achieves diversity by adding randomness to the token sampling procedure, the frequency and presence penalties add diversity by penalizing the reuse of tokens that have already occurred in the text. This makes the sampling of old and overused tokens less likely, influencing the model to make more novel token choices.</p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="d640" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">The frequency penalty adds a penalty to a token for each time it has occurred in the text. This discourages repeated use of the same tokens/words/phrases and also has the side effect of causing the model to discuss more diverse subject matter and change topics more often. On the other hand, the presence penalty is a flat penalty that is applied if a token has already occurred in the text. This causes the model to introduce more new tokens/words/phrases, which causes it to discuss more diverse subject matter and change topics more often without significantly discouraging the repetition of frequently used words.</p>

		</div>
	</div>
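
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">Here is a minimal sketch of how the two penalties might be applied to a model’s raw token scores before sampling. It uses a simple additive form, similar to the one described in OpenAI’s API documentation. Other providers may compute their penalties differently, so treat the function name <code>apply_penalties</code> and the numbers below as illustrative assumptions.</p>

```python
from collections import Counter

def apply_penalties(logits, generated_ids, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the score of every token that already appears in the output."""
    counts = Counter(generated_ids)
    adjusted = list(logits)
    for token_id, count in counts.items():
        # Frequency penalty: grows with every repeat of the token.
        adjusted[token_id] -= count * frequency_penalty
        # Presence penalty: a flat amount, no matter how many repeats.
        adjusted[token_id] -= presence_penalty
    return adjusted

# Four hypothetical tokens that start out with equal scores.
logits = [2.0, 2.0, 2.0, 2.0]
# Token 0 has been generated three times, token 1 once, tokens 2 and 3 never.
generated_ids = [0, 0, 0, 1]

print(apply_penalties(logits, generated_ids, frequency_penalty=0.5, presence_penalty=0.25))
# prints [0.25, 1.25, 2.0, 2.0]
```

			<p class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">The heavily repeated token is pushed down the most, the token used once is pushed down a little, and unused tokens are left alone. With both penalties at zero, the scores come back unchanged.</p>

		</div>
	</div>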

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="d640" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">Much like temperature, the frequency and presence penalties lead us away from the “best” possible answer and toward a more creative one. But instead of doing this with randomness, they add targeted penalties that are carefully calculated to inject diversity into the answer. On some of those rare tasks requiring a non-zero temperature (when you require many answers to the same prompt), you might also consider adding a small frequency or presence penalty to the mix to boost creativity. But for prompts having just one right answer that you want to find in just one try, your odds of success are highest when you set all of these parameters to zero.</p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="d640" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">As a rule, when there is one right answer, and you are asking just one time, you should set the frequency and presence penalties to zero. But what if there are many right answers, such as in text summarization? In this case, you have a little discretion. If you find a model’s outputs boring, uncreative, repetitive, or limited in scope, judicious application of the frequency or presence penalties could be a good way to spice things up. But our final suggestion for these parameters is the same as for temperature: <strong class="lr fp">When in doubt, choose zero!</strong></p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="d640" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">We should note that while temperature and frequency/presence penalties both add diversity to the model’s responses, the kind of diversity that they add is not the same. The frequency/presence penalties increase the diversity <em class="oc">within a single response.</em> This means that a response will have more distinct words, phrases, topics, and subject matters than it would have without these penalties. But when you pass the same prompt twice, you are not more likely to get two different answers. This is in contrast with temperature, which increases diversity <em class="oc">between responses:</em> At higher temperatures, you will get a more diverse range of answers when passing the same prompt to the model many times.</p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="d640" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">I like to refer to this distinction as within-response diversity vs. between-response diversity. The temperature parameter adds both within-response AND between-response diversity, while the frequency/presence penalties add only within-response diversity. So, when we need diversity, our choice of parameters should depend on the kind of diversity we need.</p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="d640" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph=""><em class="oc">TLDR: The frequency and presence penalties increase the diversity of subject matters discussed by a model and make it change topics more often. The frequency penalty also increases the diversity of word choice by reducing the repetition of words and phrases.</em></p>

		</div>
	</div>
<div class="g-cols wpb_row type_default valign_top vc_inner " id="section5"><div class="vc_col-sm-12 wpb_column vc_column_container"><div class="vc_column-inner"><div class="wpb_wrapper"></div></div></div></div></div></div></div></div></div></section><section class="l-section wpb_row height_small"><div class="l-section-h i-cf"><div class="g-cols vc_row type_default valign_top"><div class="vc_col-sm-12 wpb_column vc_column_container"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<h2 id="c9f4" class="nb nc fo be nd ne nf go ng nh ni gr nj nk nl nm nn no np nq nr ns nt nu nv nw bj">The Parameter-Tuning Cheat Sheet</h2>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="328a" class="pw-post-body-paragraph lp lq fo lr b gm nx lt lu gp ny lw lx ly nz ma mb mc oa me mf mg ob mi mj mk fh bj" data-selectable-paragraph="">This section is intended as a practical guide for choosing your model’s input parameters. We first provide some hard-and-fast rules for deciding which values to set to zero. Then, we give some tips to help you find the right values for your non-zero parameters.</p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="328a" class="pw-post-body-paragraph lp lq fo lr b gm nx lt lu gp ny lw lx ly nz ma mb mc oa me mf mg ob mi mj mk fh bj" data-selectable-paragraph="">I strongly encourage you to use this cheat sheet when choosing your input parameters. Go ahead and bookmark this page now so you don’t lose it!</p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<h3 id="9e0d" class="om nc fo be nd on oo op ng oq or os nj ly ot ou ov mc ow ox oy mg oz pa pb pc bj">Rules for setting parameters to zero:</h3>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="88ed" class="pw-post-body-paragraph lp lq fo lr b gm nx lt lu gp ny lw lx ly nz ma mb mc oa me mf mg ob mi mj mk fh bj" data-selectable-paragraph="">Temperature:</p>
<ul>
<li>For a <strong class="lr fp">single answer</strong> per prompt: Zero.</li>
<li>For <strong class="lr fp">many answers</strong> per prompt: Non-zero.</li>
</ul>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="60cf" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">Frequency and Presence Penalties:</p>
<ul>
<li>When there is <strong class="lr fp">one correct answer:</strong> Zero.</li>
<li>When there are <strong class="lr fp">many correct answers:</strong> Optional.</li>
</ul>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="1d77" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">Top-p/Top-k:</p>
<ul class="">
<li id="be8e" class="lp lq fo lr b gm ls lt lu gp lv lw lx ly pd ma mb mc pe me mf mg pf mi mj mk pg ph pi bj" data-selectable-paragraph="">With <strong class="lr fp">zero temperature</strong>: The output is not affected.</li>
<li id="9713" class="lp lq fo lr b gm pj lt lu gp pk lw lx ly pl ma mb mc pm me mf mg pn mi mj mk pg ph pi bj" data-selectable-paragraph="">With <strong class="lr fp">non-zero temperature</strong>: Non-zero.</li>
</ul>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="1d77" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">If your language model has additional parameters not listed here, leave them at their default values.</p>

		</div>
	</div>
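
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">If you prefer your rules in code form, the three rules above can be written as one small helper. The function name <code>zero_rules</code> and its two yes/no arguments are hypothetical. It only restates the cheat sheet and does not talk to any model.</p>

```python
def zero_rules(many_answers_per_prompt, many_correct_answers):
    """Restate the cheat-sheet rules for which parameters to leave at zero."""
    nonzero_temperature = many_answers_per_prompt
    penalty_rule = "optional" if many_correct_answers else "zero"
    return {
        "temperature": "non-zero" if nonzero_temperature else "zero",
        "frequency_penalty": penalty_rule,
        "presence_penalty": penalty_rule,
        # At temperature zero, top-p and top-k do not change the output.
        "top_p_or_top_k": "non-zero" if nonzero_temperature else "no effect",
    }

# One answer per prompt, and only one answer is correct.
print(zero_rules(many_answers_per_prompt=False, many_correct_answers=False))
```

			<p class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">For the example call, every entry comes back as “zero” except top-p/top-k, which is reported as having no effect.</p>

		</div>
	</div>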

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<h3 id="32f8" class="om nc fo be nd on oo op ng oq or os nj ly ot ou ov mc ow ox oy mg oz pa pb pc bj">Tips for tuning the non-zero parameters:</h3>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="1d77" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">Make a list of those parameters that should have non-zero values, and then go to a playground and fiddle around with some test prompts to see what works. <em>But if the rules above say to leave a parameter at zero, leave it at zero!</em></p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="1d77" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">Tuning temperature/top-p/top-k:</p>
<ol class="">
<li id="1ea3" class="lp lq ev lr b ft ls lt lu fw lv lw lx ly pd ma mb mc pe me mf mg pf mi mj mk po ph pi bj" data-selectable-paragraph="">For more diversity/randomness, increase the temperature.</li>
<li id="641b" class="lp lq ev lr b ft pj lt lu fw pk lw lx ly pl ma mb mc pm me mf mg pn mi mj mk po ph pi bj" data-selectable-paragraph="">With non-zero temperatures, start with a top-p around 0.95 (or top-k around 250) and lower it as needed.</li>
</ol>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="950f" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">Troubleshooting:</p>
<ol class="">
<li id="f1ed" class="lp lq ev lr b ft ls lt lu fw lv lw lx ly pd ma mb mc pe me mf mg pf mi mj mk po ph pi bj" data-selectable-paragraph="">If there is too much nonsense, garbage, or hallucination, decrease temperature and/or decrease top-p/top-k.</li>
<li id="77ec" class="lp lq ev lr b ft pj lt lu fw pk lw lx ly pl ma mb mc pm me mf mg pn mi mj mk po ph pi bj" data-selectable-paragraph="">If the temperature is high and diversity is low, increase top-p/top-k.</li>
</ol>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="950f" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">Tip: While some interfaces allow you to use top-p and top-k at the same time, we prefer to keep things simple by choosing one or the other. Top-k is easier to use and understand, but top-p is often more effective.</p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="caaa" class="pw-post-body-paragraph lp lq ev lr b ft ls lt lu fw lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk eo bj" data-selectable-paragraph="">Tuning frequency penalty and presence penalty:</p>
<ol class="">
<li id="f568" class="lp lq ev lr b ft ls lt lu fw lv lw lx ly pd ma mb mc pe me mf mg pf mi mj mk po ph pi bj" data-selectable-paragraph="">For more diverse topics and subject matters, increase the presence penalty.</li>
<li id="db7a" class="lp lq ev lr b ft pj lt lu fw pk lw lx ly pl ma mb mc pm me mf mg pn mi mj mk po ph pi bj" data-selectable-paragraph="">For more diverse and less repetitive language, increase the frequency penalty.</li>
</ol>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="950f" class="pw-post-body-paragraph lp lq ev lr b ft ls lt lu fw lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk eo bj" data-selectable-paragraph="">Troubleshooting:</p>
<ol class="">
<li id="562f" class="lp lq ev lr b ft ls lt lu fw lv lw lx ly pd ma mb mc pe me mf mg pf mi mj mk po ph pi bj" data-selectable-paragraph="">If the outputs seem scattered and change topics too quickly, decrease the presence penalty.</li>
<li id="108e" class="lp lq ev lr b ft pj lt lu fw pk lw lx ly pl ma mb mc pm me mf mg pn mi mj mk po ph pi bj" data-selectable-paragraph="">If there are too many new and unusual words, or if the presence penalty is set to zero and you still get too many topic changes, decrease the frequency penalty.</li>
</ol>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="950f" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph=""><em>TLDR: You can use this section as a cheat sheet for tuning language models. You are <strong>definitely</strong> going to forget these rules, so bookmark this page and use it later as a reference.</em></p>

		</div>
	</div>
<div class="g-cols wpb_row type_default valign_top vc_inner " id="section6"><div class="vc_col-sm-12 wpb_column vc_column_container"><div class="vc_column-inner"><div class="wpb_wrapper"></div></div></div></div></div></div></div></div></div></section><section class="l-section wpb_row height_small"><div class="l-section-h i-cf"><div class="g-cols vc_row type_default valign_top"><div class="vc_col-sm-12 wpb_column vc_column_container"><div class="vc_column-inner"><div class="wpb_wrapper">
	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<h2 id="7610" class="nb nc fo be nd ne nf go ng nh ni gr nj nk nl nm nn no np nq nr ns nt nu nv nw bj">Wrapping up</h2>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="a507" class="pw-post-body-paragraph lp lq fo lr b gm nx lt lu gp ny lw lx ly nz ma mb mc oa me mf mg ob mi mj mk fh bj" data-selectable-paragraph="">There is no limit to the possible token-sampling strategies out there. Other notable strategies include beam search and adaptive sampling. However, the ones we’ve discussed here (temperature, top-k, top-p, frequency penalty, and presence penalty) are the most commonly used parameters. These are the parameters that you can expect to find in models like Claude, Llama, and the GPT series, although not every interface exposes all of them. In this article, we have shown that all of these parameters are really just here to help us navigate the quality-diversity tradeoff.</p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="e36b" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">Before we go, there is one last input parameter to mention: maximum token length. The maximum token length is just the cutoff where the model stops printing its answer, even if it is not finished responding. After this complex discussion, we hope this one is self-explanatory.</p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="e36b" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph="">As we move further in this series, we’ll do more deep dives into advanced topics. Future articles will discuss prompt engineering, choosing the right language model for your use case, and more! Stay tuned, and keep exploring the vast horizons of language models with Megaputer Intelligence.</p>

		</div>
	</div>

	<div class="wpb_text_column ">
		<div class="wpb_wrapper">
			<p id="e36b" class="pw-post-body-paragraph lp lq fo lr b gm ls lt lu gp lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk fh bj" data-selectable-paragraph=""><em class="oc">TLDR: When in doubt, set the temperature, frequency penalty, and presence penalty to zero. If that doesn’t work for you, reference the cheat sheet above.</em></p>

		</div>
	</div>
</div></div></div></div></div></section>
<p>The post <a rel="nofollow" href="https://www.megaputer.com/mastering-language-models-a-deep-dive-into-input-parameters/">Mastering Language Models: A Deep Dive into Input Parameters</a> appeared first on <a rel="nofollow" href="https://www.megaputer.com">Megaputer Intelligence</a>.</p>
]]></content:encoded>
			</item>
	</channel>
</rss>
