<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Dictionaries &#8211; Megaputer Intelligence</title>
	<atom:link href="https://www.megaputer.com/tag/dictionaries/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.megaputer.com</link>
	<description>Your Knowledge Partner</description>
	<lastBuildDate>Tue, 24 Mar 2026 00:02:52 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=5.0.22</generator>

<image>
	<url>https://www.megaputer.com/wp-content/uploads/favicon.png</url>
	<title>Dictionaries &#8211; Megaputer Intelligence</title>
	<link>https://www.megaputer.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Creating a domain-specific dictionary using extraction nodes in PolyAnalyst</title>
		<link>https://www.megaputer.com/domain-specific-dictionary-extraction-node-polyanalyst/</link>
		<pubDate>Fri, 13 Jul 2018 17:00:10 +0000</pubDate>
		<dc:creator><![CDATA[Lianne Huang]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Dictionaries]]></category>
		<category><![CDATA[Insurance Claims]]></category>
		<category><![CDATA[Text Analytics]]></category>

		<guid isPermaLink="false">https://www.megaputer.com/?p=22717</guid>
		<description><![CDATA[<p>PolyAnalyst comes with several different default dictionaries such as a morphology dictionary, a synonym dictionary, and a dictionary of human names. Those generic dictionaries cover general terms that are useful for a variety of fields. However, when you work with a domain-specific dataset, such as a medical dataset or a car repair dataset, it is...</p>
<p>The post <a rel="nofollow" href="https://www.megaputer.com/domain-specific-dictionary-extraction-node-polyanalyst/">Creating a domain-specific dictionary using extraction nodes in PolyAnalyst</a> appeared first on <a rel="nofollow" href="https://www.megaputer.com">Megaputer Intelligence</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p><a href="https://www.megaputer.com/polyanalyst/">PolyAnalyst</a> comes with several different default dictionaries such as a morphology dictionary, a synonym dictionary, and a dictionary of human names. Those generic dictionaries cover general terms that are useful for a variety of fields. However, when you work with a domain-specific dataset, such as a medical dataset or a car repair dataset, it is crucial to have domain-specific dictionaries that include related terms.</p>
<h3><strong>Why do you need domain-specific dictionaries?</strong></h3>
<p>As you may know, any query can produce false positives as well as false negatives. For example, suppose your car repair dataset contains the following note, and the query that answers “Which car component has what issues?” matches the highlighted parts.</p>
<p><img class="aligncenter size-full wp-image-22713" src="https://www.megaputer.com/wp-content/uploads/blog_example.png" alt="" width="1066" height="129" srcset="https://www.megaputer.com/wp-content/uploads/blog_example.png 1066w, https://www.megaputer.com/wp-content/uploads/blog_example-600x73.png 600w, https://www.megaputer.com/wp-content/uploads/blog_example-300x36.png 300w, https://www.megaputer.com/wp-content/uploads/blog_example-768x93.png 768w, https://www.megaputer.com/wp-content/uploads/blog_example-1024x124.png 1024w" sizes="(max-width: 1066px) 100vw, 1066px" /></p>
<p>Notice that there is one false positive, “noise at highway speeds”, where “highway speeds” is incorrectly identified as a car part. There are also two false negatives, “front wipers streaking glass” and “misfire in cylinder 6”, which the query fails to catch. How can you fix this? You could change your query to address these problems, but you will very likely find more false positives and false negatives as you continue to check your results. Do you want to modify your queries over and over?</p>
<p>Instead of continuously modifying your queries to exclude false positives and capture the false negatives you discover, a more efficient approach is to use dictionaries. For example, you could compile a dictionary of relevant car part terms to reduce false negatives. Then, by generating a stoplist of words that are not useful when searching for car parts, you can eliminate false positives. In addition to giving you reusable dictionaries, this lets you write queries that are clearer and easier to understand.</p>
<p>Using dictionaries in your queries can greatly increase the accuracy of your results. As you increase the coverage of your dictionaries, the coverage of your results will very likely increase as well. For instance, if you are searching for patterns that identify a car part and its issues, you could add known car parts to the dictionary to return reliable results or to discover new ways to articulate an issue. Specifically, if you add “wiper” and “cylinder” to your car part dictionary, the query will automatically match things like “wiper malfunction”, “cylinder malfunction”, “faulty wiper”, “faulty cylinder”, etc.</p>
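<p>The idea can be sketched outside PolyAnalyst in a few lines of plain Python. This is only an illustration under stated assumptions: the term lists, the note text, and the <code>find_part_issues</code> helper are hypothetical and are not part of PolyAnalyst or its query language. The word “highway” stands in for a junk term that an automated extraction might have placed in the car part dictionary.</p>

```python
import re

# Hypothetical term lists for illustration only (not PolyAnalyst dictionaries).
# "highway" stands in for a junk term picked up by automated extraction.
CAR_PARTS = {"wiper", "wipers", "cylinder", "highway"}
ISSUE_WORDS = {"faulty", "malfunction", "misfire", "streaking", "noise"}
STOPLIST = {"highway"}


def find_part_issues(note, parts, stoplist):
    """Return (part, issue) pairs found together in the same clause of a note."""
    pairs = []
    # Split the note into rough clauses on commas, semicolons, and periods.
    for clause in re.split(r"[,;.]", note.lower()):
        words = re.findall(r"[a-z]+", clause)
        # A word counts as a part only if it is in the dictionary
        # and has not been excluded by the stoplist.
        found_parts = [w for w in words if w in parts and w not in stoplist]
        found_issues = [w for w in words if w in ISSUE_WORDS]
        for part in found_parts:
            for issue in found_issues:
                pairs.append((part, issue))
    return pairs


note = "Noise at highway speeds, front wipers streaking glass, misfire in cylinder 6."
print(find_part_issues(note, CAR_PARTS, STOPLIST))
```

<p>Calling the same function with an empty stoplist brings the “highway” match back. The matching logic never changes; only the term lists do, which is why the dictionaries can be reused across queries.</p>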
<h3><strong>How do you build such a dictionary?</strong></h3>
<p>You can easily build such dictionaries with the help of PolyAnalyst. Using the Entity Extraction node along with the Keyword Extraction node, you can extract terms of interest. Once you have extracted the list of terms with the desired patterns, what should you do next? Is every term in the list what you want? In reality, you will often get junk terms in your list because of the complexity of human language. So the question is: how do you get rid of them?</p>
<p>Luckily, PolyAnalyst allows users to validate their extraction results, so they can ensure that the terms being extracted and used in subsequent dictionaries are relevant and accurate.</p>
<h3><strong>An example for building a Medical Device Dictionary:</strong></h3>
<p>For instance, if you have a medical device repair dataset, you can use this method to generate a dictionary of device components. Since components are nouns, you could use <a href="https://www.megaputer.com/polyanalyst/">PolyAnalyst</a> extraction nodes to generate a list of frequently used nouns. Once you have the list of possible components, you can start to validate them, then export the ones that are marked as “Valid” to your device component dictionary. In this way, you obtain a clean dictionary for your analysis.</p>
<p>Below is an example of validating medical device components using the Entity Extraction node with customized queries.</p>
<p><img class="aligncenter size-full wp-image-22711" src="https://www.megaputer.com/wp-content/uploads/blog.jpg" alt="" width="975" height="502" srcset="https://www.megaputer.com/wp-content/uploads/blog.jpg 975w, https://www.megaputer.com/wp-content/uploads/blog-600x309.jpg 600w, https://www.megaputer.com/wp-content/uploads/blog-300x154.jpg 300w, https://www.megaputer.com/wp-content/uploads/blog-768x395.jpg 768w" sizes="(max-width: 975px) 100vw, 975px" /></p>
<p>Now you can store these domain-specific nouns in a dictionary. When you receive more medical device repair data, the same queries that use this dictionary will automatically achieve greater coverage. There is no need to recompile the dictionary for new data.</p>
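<p>The extract, validate, and export workflow can be approximated in plain Python as a rough sketch. Everything here is an assumption made for illustration: the repair notes are invented, simple word counting stands in for the extraction nodes (it does not detect nouns), and the <code>review</code> mapping stands in for the manual validation step. Only the “Valid” label comes from the workflow described above.</p>

```python
import re
from collections import Counter


def candidate_terms(notes, min_count=2):
    """Collect words that appear at least min_count times across all notes."""
    counts = Counter()
    for note in notes:
        counts.update(re.findall(r"[a-z]+", note.lower()))
    return [term for term, n in counts.most_common() if n >= min_count]


def build_dictionary(candidates, review):
    """Keep only the candidates that a reviewer marked as "Valid"."""
    return sorted(term for term in candidates if review.get(term) == "Valid")


# Invented repair notes, for illustration only.
notes = [
    "Replaced the pump and checked the battery",
    "Battery drained, pump making noise",
    "Sensor cable loose, replaced the cable",
]

# Hypothetical outcome of a manual review of the candidate list.
review = {
    "pump": "Valid",
    "battery": "Valid",
    "cable": "Valid",
    "the": "Invalid",
    "replaced": "Invalid",
}

candidates = candidate_terms(notes)
device_components = build_dictionary(candidates, review)
print(device_components)
```

<p>Frequent but irrelevant words such as “the” and “replaced” appear among the candidates, which is exactly why the validation step matters before anything is exported to the dictionary.</p>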
<p>In brief, domain-specific dictionaries help data analysts maintain simpler queries and get better coverage in an efficient way.</p>
]]></content:encoded>
			</item>
		<item>
		<title>PolyAnalyst introduces support for using stop list dictionaries when analyzing spelling errors</title>
		<link>https://www.megaputer.com/stop-list-dictionary-spell-check/</link>
		<pubDate>Tue, 15 May 2018 18:44:31 +0000</pubDate>
		<dc:creator><![CDATA[Jeff Palan]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Dictionaries]]></category>
		<category><![CDATA[Morphology]]></category>
		<category><![CDATA[Text Analytics]]></category>

		<guid isPermaLink="false">https://www.megaputer.com/?p=20267</guid>
		<description><![CDATA[<p>In previous builds of PolyAnalyst the only way to stop a word from showing up in the spell checker was to add it to the morphology dictionary. This would sometimes result in having to add things to the morphology dictionary that might not really belong there, such as product codes (Model ABC-XYZ) and the occasional...</p>
<p>The post <a rel="nofollow" href="https://www.megaputer.com/stop-list-dictionary-spell-check/">PolyAnalyst introduces support for using stop list dictionaries when analyzing spelling errors</a> appeared first on <a rel="nofollow" href="https://www.megaputer.com">Megaputer Intelligence</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>In previous builds of <a href="https://www.megaputer.com/polyanalyst/">PolyAnalyst</a> the only way to stop a word from showing up in the spell checker was to add it to the morphology dictionary. This would sometimes result in having to add things to the morphology dictionary that might not really belong there, such as product codes (Model ABC-XYZ) and the occasional single foreign word (e.g. <em>yukata</em>).</p>
<p>In order to prevent this dictionary mismatch PolyAnalyst now includes stoplist functionality for spell checking. In other words you can define a list of words for the spellchecker to ignore without assigning them as actual English words.</p>
<p>Consider the following example of dealing with the problematic word &#8220;yukata&#8221;.</p>
<h2>Without a stop list:</h2>
<p>The word is identified as a spelling error.</p>
<p><img class="size-full wp-image-20268 aligncenter" src="https://www.megaputer.com/wp-content/uploads/stoplist_graphic.png" alt="" width="222" height="105" /></p>
<h2>With a stop list: no more &#8220;yukata&#8221;!</h2>
<p>After adding the word to a stop list, the word is no longer identified as a spelling error.</p>
<p><img class="aligncenter size-full wp-image-20269" src="https://www.megaputer.com/wp-content/uploads/yukata-graphic.png" alt="" width="274" height="107" /></p>
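<p>The behavior can be mimicked with a minimal Python sketch. It is an assumption-laden illustration, not PolyAnalyst's spell checker: the <code>find_spelling_errors</code> helper, the tiny word list standing in for the morphology dictionary, and the sample sentence are all hypothetical.</p>

```python
import re

# Tiny stand-in for a morphology dictionary (illustration only).
KNOWN_WORDS = {"she", "wore", "a", "to", "the", "festival"}


def find_spelling_errors(text, known_words, stop_list=frozenset()):
    """Return words that are neither known nor listed in the stop list."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in known_words and w not in stop_list]


sentence = "She wore a yukata to the festval"

# Without a stop list, the foreign word is reported next to the real typo.
print(find_spelling_errors(sentence, KNOWN_WORDS))

# With a stop list, "yukata" is ignored without being added to KNOWN_WORDS.
print(find_spelling_errors(sentence, KNOWN_WORDS, stop_list={"yukata"}))
```

<p>Keeping the stop list separate means the word list that represents real English words stays clean, while the spell check still skips the terms you chose to ignore.</p>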
]]></content:encoded>
			</item>
	</channel>
</rss>
