Data Mining – Megaputer Intelligence

Query languages—the Swiss army knife of information extraction

Echo Lu — Tue, 06 Feb 2024 05:19:41 +0000

Text mining, the art of extracting information from text, requires the formulation of efficient queries that retrieve information based on user input. To do this, the user requires a language for writing queries. For the most basic use cases, the language operators could be regex or string search. But while regex and string search are indispensable for text mining, their utility hits a hard ceiling when semantic or meaningful search is required. They cannot, for example, capture complex entities such as human names, corporations, and drugs. For handling tasks like these, we need a more powerful query language that has semantic understanding, such as Megaputer’s PDL.

So, what is PDL, and how does it achieve semantic understanding while regex and string search do not? Let’s take a look at an example to find out.

First of all, PDL does not just search for the literal form of the word in the query: instead, it automatically extends its search to all morphological forms of the word. For example, when searching for the word “company” in financial news articles, the PDL query will not only find “company”, but also “companies,” the plural form. This feature often comes in handy, especially when the search involves a verb. Suppose that you are interested in extracting what the CEOs said. With regex or other substring search, you will need to list all possible verb forms such as “say,” “saying,” “says,” and “said.” With PDL, simply entering “say” in the query will automatically fetch all possible verb forms. This behavior can also be turned off by enclosing the word in the form function, which will then restrict the search to the literal form of the word, such as in the example below.

Another notable feature of PDL is that it lets users tailor the scope of their searches using a range of built-in functions. Returning to the previous example, you may not wish to confine your search to the specific verb “say,” but rather include synonymous verbs like “tell” or “mention.” Achieving this is straightforward with PDL: users can invoke the synonym function with the verb “say,” as demonstrated in (a) below. As the subsequent results table (b) illustrates, the captured text now includes various speech verbs such as “tell,” “emphasize,” and “claim,” in addition to the word “say,” capturing them in all possible verb forms. For additional flexibility, the user can also create and modify synonym dictionaries.

The PDL language offers various modes of information extraction, including proximity search (e.g., finding words A and B within a sentence, or within a 3-word range), syntactic relation (e.g., finding word A that is the subject or object of B), semantic relation (e.g., finding words that are synonyms/antonyms to word A), access to dictionaries and ontologies, and more. This language is expressive enough to capture complex patterns, and yet relatively easy to use, having a syntax that closely resembles English. Having access to this versatile query language significantly enhances the power and quality of text mining operations.
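For comparison, here is a rough sketch of the manual effort that the morphological and synonym expansion described earlier is meant to save. It uses plain Python regex, not PDL; the verb list, the sample sentence, and the helper names are assumptions made up for illustration.

```python
import re

# Hand-maintained list of speech verbs and their inflected forms.
# With plain regex, every form has to be spelled out explicitly.
SPEECH_VERB_FORMS = {
    "say": ["say", "says", "said", "saying"],
    "tell": ["tell", "tells", "told", "telling"],
    "mention": ["mention", "mentions", "mentioned", "mentioning"],
}

def build_pattern(verb_forms):
    """Compile one case-insensitive, whole-word pattern covering every listed form."""
    forms = sorted(
        {form for group in verb_forms.values() for form in group},
        key=len,
        reverse=True,
    )
    alternation = "|".join(re.escape(form) for form in forms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

def find_speech_verbs(text, pattern):
    """Return every matched verb form, lowercased, in order of appearance."""
    return [match.group(0).lower() for match in pattern.finditer(text)]

pattern = build_pattern(SPEECH_VERB_FORMS)
sample = "The CEO said profits rose. Analysts were told otherwise, and she mentions it often."
matches = find_speech_verbs(sample, pattern)
print(matches)
```

Any verb or form missing from the dictionary is silently skipped, which is the maintenance burden the article attributes to regex and string search.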

In conclusion, PDL is a powerful and versatile query language that enables users to extract meaningful information from text with greater efficiency and accuracy than competing methods like regex or string search. Its ability to understand and capture morphological forms, synonyms, and other complex patterns makes it an indispensable tool for solving text mining tasks that require semantic understanding. By leveraging the capabilities of PDL, users can enhance their information extraction processes and gain valuable insights from their data, making it a true Swiss army knife of information extraction.

The post Query languages—the Swiss army knife of information extraction appeared first on Megaputer Intelligence.

How to Audit and Explore Datasets in PolyAnalyst

Kathryn Verhoeven — Wed, 22 Jan 2020 14:00:43 +0000

Suppose you just obtained a fresh new set of data. You might know what it’s generally about, but the only way you can know what hypotheses to test (and what analysis challenges you’re up against) is to explore the data and audit its overall quality. Unless you can see the basic overview of what you’re dealing with, you won’t be able to determine the most appropriate models to build or develop a strategy for handling challenges such as outliers and missing values.

For these reasons, exploratory data analysis is regarded as an important step in data mining methodologies.

What is exploratory data analysis?

The overarching concept of exploratory data analysis (EDA) was first advocated by John Tukey. His foundational work on EDA, published in the late 1970s, championed the idea of assessing the data before rushing into testing hypotheses and building models. Since that time, EDA has become an integral part of data mining methodologies, including CRISP-DM, KDD, and SEMMA. The goal of EDA is to get an overall feel for the data you’re about to spend time analyzing so that you can better address the key analysis questions and objectives.

Specifically, some of the key purposes of exploratory analysis include:

Obtaining an overview of the data
- Discover patterns
- Frame hypotheses
- Check assumptions
Gathering general statistical information about the data
- Distribution
- Key statistics (mean, median, range, standard deviation, etc.)
Detecting data anomalies
- Outliers
- Missing Data
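
As a minimal sketch of the quantitative side of this checklist, the Python snippet below computes the key statistics, counts missing values, and flags outliers. The sample values are invented, `None` is assumed to mark a missing value, and the 1.5 × IQR rule is one common outlier convention chosen here for illustration.

```python
import statistics

def summarize(values):
    """Basic summary statistics for a numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "range": max(present) - min(present),
        "stdev": statistics.stdev(present),
        # Values outside the 1.5 * IQR fences are reported as potential outliers.
        "outliers": [v for v in present if v < low or v > high],
    }

# Invented sample column with two missing values and one suspicious reading.
power = [95, 110, None, 88, 102, 97, 350, None, 105, 92]
report = summarize(power)
for name, value in report.items():
    print(f"{name}: {value}")
```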

Graphical visualization methods comprise the majority of EDA techniques, but non-graphical quantitative measures are just as important to incorporate. More information on specific EDA techniques is available from sources such as this handbook from the NIST Information Technology Laboratory.

Exploratory analysis using PolyAnalyst

Users of PolyAnalyst know that the system comes with a wide variety of analysis features for handling structured data as well as text. And for basic EDA, the system’s data loading nodes automatically provide a convenient view of the overall data patterns from a univariate perspective.

For example, if we load an example dataset containing car data, the analyst can obtain a basic overview of the dataset variables and the data types by viewing the data in tabular form.

In the Statistics tab of a dataset node, the various statistics for each variable are presented for quick assessment of data anomalies, such as outliers and missing data. For example, in the car data, we can see the mean, median, range, and standard deviation of the Power variable, and we observe that there are 6 missing values.

Finally, the Distinct tab lists all the unique values a selected variable has within the dataset along with the relative percentage distribution among those values. In the case of the example car data, we can view all the Year values for the cars in the dataset as well as their relative distribution.
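
Outside of PolyAnalyst, the same kind of distinct-value view can be approximated in a few lines of Python. This is only a sketch: the Year values are invented and the function name is hypothetical.

```python
from collections import Counter

def distinct_distribution(values):
    """Return (value, count, percent) for each distinct value, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (value, n, round(100 * n / total, 1))
        for value, n in counts.most_common()
    ]

# Invented Year column.
years = [1970, 1971, 1970, 1972, 1970, 1971, 1973, 1970]
distribution = distinct_distribution(years)
for value, n, pct in distribution:
    print(f"{value}: {n} ({pct}%)")
```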

PolyAnalyst’s built-in EDA tools

From the basic overview provided in any PolyAnalyst data node, we can begin to formulate a plan for how to proceed with data cleansing and analysis. And to compare sets of variables against each other for a bivariate (or multivariate) analysis, the various chart nodes (such as bar charts and scatter plots) are quick to set up and make it easy to view relationships among variables. But there are a few additional built-in tools in PolyAnalyst for EDA that are worth highlighting, such as the Data Audit node and the Distribution Analysis node.

Let’s now go over these tools briefly and see how they work.

Data Audit Node

In PolyAnalyst, the Data Audit node is a built-in tool for exploring your data. It is useful when first examining a new dataset and provides lots of information you can quickly scan to gain preliminary insights into your dataset. This node serves as a summarization node, so it does not connect up to additional downstream analysis nodes in the project script. Its main purpose is to guide your decision making on how to cleanse, prepare, and analyze your data.

The node settings allow you to specify which columns (i.e., variables) to analyze and whether or not to perform both summary statistics and anomaly detection. The resulting report shows a synopsis of the statistics and anomaly metrics, which are useful to consider prior to applying a predictive model or classification model.
Here’s an example of the node output when analyzing a crime data example dataset, which lists the date, district, category, and description of various crime reports:

The data audit output report consists of multiple tabs that assess the data for anomalies, arranged according to the data type (i.e., categorical, numerical, text). Summary statistics for each variable are also listed in the report under different tabs. In the example above, the system has identified and presented in the lower pane three potentially anomalous records with dates outside of the expected range covering the vast majority of records (years 1997-2005).

Distribution Analysis Node

The Distribution Analysis node in PolyAnalyst is a helpful tool for analyzing trends and distributions of numerical data values in a dataset. Along with calculating statistical characteristics, the node recognizes distributions (normal, exponential, double exponential, log-normal, and uniform), performs general hypothesis testing, and groups variables based on those tests. When setting up this node, the user can specify which variables to analyze, as well as the significance levels, tail properties, and other parameters of the statistical tests to be performed.

The resulting report contains a list of the statistical tests that have been conducted as well as their results. The output report will also show information like the fitted distributions and labeled tails, which can later be appended as a separate column to the original dataset. The Distribution Analysis node also conveniently connects to other analysis nodes to pass its results to predictive modeling and machine learning.

Let’s see how this works using the car data from our earlier example:

The distribution analysis performed here was set up to analyze the patterns for the MPG variable in data records representing individual cars, grouped by a car’s Origin (i.e., Europe, Japan, USA). The resulting report displays the MPG distribution type for each Origin group along with other statistics in the top table. By highlighting a specific Origin group (e.g., Europe), the user can view the test parameter information that resulted in assigning this sample a log-normal distribution.

On the summary tab, a fitted distribution chart is displayed for the selected subset. For example, if we select the Europe subset, we can see the log-normal curve fitting of the data.

Conveniently, the information obtained from the distribution analysis can be scored against the dataset to create an additional column. For example, in the image below we can see the scored values for MPG significance (blue box) and status (red box) from our distribution analysis appended to the car dataset, which can be useful for analyzing anomalies (marked by the system as belonging to either the upper or lower Tails of the identified distribution).
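
To make the idea of fitted distributions and labeled tails more concrete, here is a simplified Python sketch. It is not PolyAnalyst's procedure: it assumes the data is log-normal, estimates the parameters from the logged values, and labels anything beyond an assumed significance cutoff as a tail. The MPG values are invented.

```python
import math
from statistics import NormalDist, mean, stdev

def fit_lognormal(values):
    """Estimate log-normal parameters as the mean and stdev of the logged data."""
    logs = [math.log(v) for v in values]
    return NormalDist(mean(logs), stdev(logs))

def label_tails(values, significance=0.05):
    """Label each value as 'lower tail', 'upper tail', or 'body' under the fitted model."""
    log_model = fit_lognormal(values)
    lower = math.exp(log_model.inv_cdf(significance))
    upper = math.exp(log_model.inv_cdf(1 - significance))
    labels = []
    for v in values:
        if v < lower:
            labels.append("lower tail")
        elif v > upper:
            labels.append("upper tail")
        else:
            labels.append("body")
    return labels

# Invented MPG values for one Origin group, with one unusually high reading.
mpg = [24, 25, 26, 27, 28, 29, 30, 31, 26, 27, 28, 29, 44]
labels = label_tails(mpg)
for value, label in zip(mpg, labels):
    print(value, label)
```

The labels play the role of the appended status column: one per record, ready to be joined back to the dataset.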

Keep in mind, however, that the built-in EDA tools described here are only a couple of the techniques that can be employed when assessing data quality. There are, of course, more analysis features in PolyAnalyst that analysts can use to conduct EDA. You can check out this page for more information on the types of analysis nodes and capabilities in PolyAnalyst. Feel free to contact us for a live demo of these features if you’re interested in learning more.

The post How to Audit and Explore Datasets in PolyAnalyst appeared first on Megaputer Intelligence.

Working with the Twitter API

Elli Bourlai — Mon, 16 Dec 2019 17:52:00 +0000

The constant stream of breaking news, real-time updates, and individual opinions about events, products, or people is a rich source of information that can turn into valuable insights and business intelligence, if extracted in the right way.

Twitter offers an API platform that allows the search and collection of tweets from the public timeline using keywords or the feed of specific accounts. Currently, there are three different tiers of search APIs: Standard (free), Premium, and Enterprise. These tiers differ with regard to data limits, search operators, and technical support from Twitter. In most cases, creating a developer account to use the Standard API is sufficient.

But how exactly does the Twitter search API work?

Let’s say that we would like to collect the latest tweets that talk about tasty food. The query may be written as:

tasty food, returning tweets containing both tasty and food, but not necessarily in this order
“tasty food”, returning tweets with the exact phrase tasty food
tasty OR food, returning tweets containing either tasty or food
#tastyfood, returning tweets containing the hashtag #tastyfood

Of course, there are more operators listed in the Twitter API documentation for narrowing our search, some of which are only available for the Premium and Enterprise accounts (premium operators). The Twitter API also has the option of returning the latest tweets, the most popular tweets, or a mix of the latest and most popular tweets; in this case, we have used the option for the latest tweets.
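
As a rough sketch of how such queries turn into requests, the Python snippet below URL-encodes each query. It assumes the v1.1 standard search endpoint and its `q`, `result_type`, and `count` parameters as they were documented when this article was written; authentication is omitted and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint of the standard (free) search API at the time of writing.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"
RESULT_TYPES = ("recent", "popular", "mixed")

def build_search_url(query, result_type="recent", count=100):
    """Build a search URL; special characters such as '#', '"' and spaces get encoded."""
    if result_type not in RESULT_TYPES:
        raise ValueError(f"result_type must be one of {RESULT_TYPES}")
    params = {"q": query, "result_type": result_type, "count": count}
    return f"{SEARCH_URL}?{urlencode(params)}"

queries = ["tasty food", '"tasty food"', "tasty OR food", "#tastyfood"]
urls = [build_search_url(q) for q in queries]
for url in urls:
    print(url)
```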

Once a query is submitted, the API sends a request to retrieve the tweets that match the query. Depending on the account tier, the API has limits on the number of requests, as well as different windows of time before exceeding those request limits. The most commonly used Standard API allows 180 requests per 15-minute window that will go back 7 days in the Twitter archive. Each request returns 100 tweets, so we can retrieve 18,000 tweets every 15 minutes until we reach 7 days back in the archive. If our task required access to tweets in the last 30 days—or even the full Twitter archive that reaches back to 2006—as well as a greater number of retrieved tweets per request, then we would need a paid Premium or Enterprise API account.

The data is returned in JSON format and includes the actual tweet along with metadata about each tweet and the user who created it. Consequently, a JSON parser is required to have all this information “translated” into a tabular format and/or imported into a database.
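
A minimal sketch of that “translation” step in Python is shown below. The payload is invented, and the field names (`statuses`, `id_str`, `created_at`, `text`, `lang`, `user.screen_name`, `user.followers_count`) are assumed to follow the standard search response format; real responses contain many more fields.

```python
import json

# Made-up payload shaped like a standard search response (assumed field names).
SAMPLE_RESPONSE = """
{"statuses": [
  {"id_str": "1001", "created_at": "Mon Dec 16 17:00:00 +0000 2019",
   "text": "Best #tastyfood in town", "lang": "en",
   "user": {"screen_name": "foodie_a", "followers_count": 120}},
  {"id_str": "1002", "created_at": "Mon Dec 16 17:05:00 +0000 2019",
   "text": "Trying new recipes #tastyfood", "lang": "en",
   "user": {"screen_name": "cook_b", "followers_count": 45}}
]}
"""

def tweets_to_rows(payload):
    """Flatten the nested tweet objects into one flat dict (table row) per tweet."""
    data = json.loads(payload)
    rows = []
    for tweet in data.get("statuses", []):
        user = tweet.get("user", {})
        rows.append({
            "id": tweet.get("id_str"),
            "created_at": tweet.get("created_at"),
            "text": tweet.get("text"),
            "lang": tweet.get("lang"),
            "user": user.get("screen_name"),
            "followers": user.get("followers_count"),
        })
    return rows

rows = tweets_to_rows(SAMPLE_RESPONSE)
for row in rows:
    print(row)
```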

The time it takes to collect our data depends on the number of tweets matching our query, as well as the request and time-window specifications of our tier. To continue with our previous example: if we have a Standard API account and our query for #tastyfood matches 500 tweets within the past 7 days, it will take 5 requests and a few seconds to collect the tweets. But if there are 90,000 tweets within the past 7 days that match our query, it will take 900 requests and five 15-minute windows (a little over an hour) for our data collection.
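
The arithmetic above can be sketched in Python as follows. The defaults mirror the Standard tier figures quoted in this article (100 tweets per request, 180 requests per 15-minute window); the function name is hypothetical, and the wait time is a lower bound that ignores network latency.

```python
def collection_plan(matching_tweets, tweets_per_request=100,
                    requests_per_window=180, window_minutes=15):
    """Return (requests, windows, minimum minutes spent waiting for new windows)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    requests = -(-matching_tweets // tweets_per_request)
    windows = -(-requests // requests_per_window)
    # The first window starts immediately; each later one needs a full window wait.
    min_wait = max(windows - 1, 0) * window_minutes
    return requests, windows, min_wait

small = collection_plan(500)
large = collection_plan(90_000)
print(small)
print(large)
```

For 90,000 tweets the sketch gives 900 requests across five windows with at least 60 minutes of waiting, which matches the “little over an hour” estimate.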

Even though the free Standard search API only allows us to go back 7 days, we can still collect longitudinal datasets by moving forward instead of backward. For example, we can automate the task of sampling tweets every hour, week after week. Keep in mind, if it is a highly active topic, we may lose some tweets between our sampling windows; and if it is a less active topic, we may have some repetition in our dataset that we would need to clear out.
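
Clearing out that repetition can be as simple as keeping the first copy of each tweet ID. The sketch below assumes each collected tweet is a dict with an `id_str` key, as in the standard response format; the batches are invented.

```python
def merge_batches(batches):
    """Merge sampling batches in order, keeping the first copy of each tweet ID."""
    seen = set()
    merged = []
    for batch in batches:
        for tweet in batch:
            if tweet["id_str"] not in seen:
                seen.add(tweet["id_str"])
                merged.append(tweet)
    return merged

# Two invented hourly samples that overlap on tweet "2".
hour_1 = [{"id_str": "1", "text": "a"}, {"id_str": "2", "text": "b"}]
hour_2 = [{"id_str": "2", "text": "b"}, {"id_str": "3", "text": "c"}]
merged = merge_batches([hour_1, hour_2])
print([t["id_str"] for t in merged])
```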

One of the main barriers for researchers and companies who need to collect data from Twitter is the requirement of basic programming skills to use the API and its output. For this reason, Megaputer has created a user-friendly Twitter data collection feature in its software, PolyAnalyst™, which takes the programming part out of collecting the data by using a convenient graphical user interface (GUI). Users only need to input their account information, their query, and select further search options offered by the Twitter API (i.e., the language used in the tweets). PolyAnalyst™ then collects the data, automatically parses the returned JSON output, and imports the data into a table for further analysis. In addition, the built-in Scheduler feature allows users to schedule longitudinal data collection over a specific period of time, which will then update the data analysis workflow to include the most recently collected data.

Try it out yourself by requesting a free trial.

The post Working with the Twitter API appeared first on Megaputer Intelligence.

What’s the News with Neural Networks?

Chris Farris — Tue, 28 May 2019 18:10:03 +0000

There are many reasons for the explosion of machine learning advancements over the past decade. We now have vastly improved hardware for fast computation, and memory is cheaper than ever. Data is now “Big Data,” and it is both jealously hoarded and publicly available in repositories such as ImageNet. Individually, these advancements are already a blessing for the technology-space. But for artificial intelligence (AI), they have opened the gates for something truly powerful—Neural Networks.

Neural Whatnow?

Neural Networks. You’ve probably heard of them. They are at the forefront of the machine learning craze and are the driver of many of the most impressive advancements. The technology that led machines to beat the best humans in Go and the popular video game Starcraft? Neural Networks. The backbone of algorithms that can recognize images and faces, which are igniting a surveillance and privacy panic? Neural Networks.

And yet, Neural Networks aren’t some new idea spawned from the incubator of a giant tech company. They aren’t some stroke of genius from a college student turned dropout who went on to found a revolutionary tech firm. Neural Networks are in fact… old hat. Or they were.

The concept of a Neural Network (something we will get to later) has been around for years. They date back to the 1970s, and simpler versions of them existed even in the 1940s! So, if they have existed for decades, why are they only popular now?

The answer is related to the hardware and data advancements mentioned earlier. Neural Networks crunch a lot of numbers. They also need a lot of data to help them learn. Up until the past decade, this made training anything but the simplest networks highly time-consuming and expensive.

With all the great improvements to hardware over the years, using more advanced Neural Networks became possible. Aided by hardware demands from the entertainment industry and now cryptominers, GPUs (graphics processing units) have been developed which can calculate specific mathematical operations at lightning speed. Luckily, these same kinds of operations occur in training neural networks. Using the technology originally developed for beautiful visuals in film and video games helps us train networks in a fraction of the time it takes a traditional CPU.

What’s in a Name?

The power of Neural Networks may be evident, but at this point, one may also be wondering what they are in the first place. The name gives some clues. A Neural Network isn’t a singular object that makes decisions by itself. It is, as implied, a network of smaller objects all connected. A network of what, then? We call them Neurons. Neurons as in the cells inside our brains? Yes! Well, no. But sort of!

Neurons in Neural Networks can be thought of as being like the neurons in brains. They are tiny, individual units that are connected to other neurons in a large, structured network. These connections allow tiny pieces of data to flow between them. In our brains, these are electrical pulses. In the Neural Network, we send numbers between Neurons. The Neurons then take all the numbers fed to them by the Neurons they are connected to and process them. The process isn’t complicated—in fact, it is painfully trivial. After all, it is just a tiny unit—a single cell in our brain. But it then sends the processed information out to other Neurons it is connected to. Another number. Another electrical pulse. And so on. Tiny Neurons are fed tiny pieces of information, perform tiny pieces of computation on that snippet, and feed it forward to other Neurons, which do the same over and over until we eventually reach a final set of Neurons—the output of which is our final result.
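
To make the “tiny piece of computation” concrete, here is a minimal Python sketch of one common kind of artificial Neuron: it multiplies each incoming number by a weight, adds a bias, and squashes the total with a sigmoid function. The weights, biases, and layer sizes are arbitrary values chosen for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, squashed into (0, 1) by a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))

def layer(inputs, weight_rows, biases):
    """Feed the same inputs to several neurons; their outputs feed the next layer."""
    return [neuron(inputs, weights, bias) for weights, bias in zip(weight_rows, biases)]

# Arbitrary example: 2 inputs -> 3 hidden neurons -> 1 output neuron.
inputs = [0.5, -1.0]
hidden = layer(inputs, [[0.2, 0.8], [-0.5, 0.1], [1.0, 1.0]], [0.0, 0.1, -0.3])
output = layer(hidden, [[0.7, -1.2, 0.4]], [0.05])
print(hidden)
print(output)
```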

From the collective effort of many small individual units networked together in a perfectly calibrated balance, we can achieve enormous computational power. The Whole is greater than the Sum of its parts.

Balancing Act

If that all sounded magical and farfetched, then don’t worry—it is. How exactly are we supposed to arrange these Neurons in such a perfect balance that their combined minuscule computations lead to a machine recognizing human faces? We can’t. So how do we get this to work? Well, we are talking about Machine Learning after all. And Machine Learning is how we are going to solve this. We aren’t going to calibrate the Neurons to be in balance. They are going to calibrate themselves.

In order to achieve this, we need training data. We need data that is labeled with the desired output we want from the machine. We can provide this data to the unconfigured Neural Network. It will process the data and likely output something nonsensical and useless. But this is fine. We can use a mathematical function called Error. This Error is just a measurement of how different our output was from our desired targets. Then, using Calculus, we can discover how much of that Error is caused by the calibration of each one of our Neurons! Using this information, we can then tune the Neurons slightly and repeat the process: feed data into the Network, observe the output, calculate the Error, and use Calculus to know how to tune the Neurons to make them more accurate. This process continues over time until we have converged to a calibrated Network. This process of using Error functions, Calculus, and tuning is what Machine Learning is.
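
The loop described above can be sketched in Python for the simplest possible case: a single Neuron with one weight and one bias, no activation function, and mean squared error as the Error. The training data, learning rate, and step count are assumptions chosen so the example converges; real networks repeat the same idea across many Neurons.

```python
def mean_squared_error(w, b, xs, ys):
    """The Error: average squared difference between outputs and targets."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, learning_rate=0.05, steps=2000):
    """Repeatedly nudge w and b against the gradient of the Error."""
    w, b = 0.0, 0.0  # the "unconfigured" starting point
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Calculus step: partial derivatives of the Error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Tuning step: move each parameter slightly in the direction that lowers the Error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Labeled training data generated from the target rule y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = train(xs, ys)
print(w, b, mean_squared_error(w, b, xs, ys))
```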

Power at a Price

Although it is conceptually simple (at least in this explanation that ignores the details), it is very resource intensive. At the moment I am fine-tuning a Neural Network on my own desktop to recognize Western Art styles. Although the dataset is only a few GB, it takes almost an hour to run a single iteration of the training. It will take nearly two days to complete the 50-iteration learning schedule I planned. And even then, I may need to schedule another one if the Network still needs to learn more! And I’m not running this on some dusty machine I dragged out of the aughts. This is an almost brand-new desktop powered with a 3.7 GHz AMD Ryzen 8-core processor with 32GB of RAM available and virtually no other load on the machine.

Neural Networks are expensive to train. If you want to increase performance, you could pay for an expensive GPU, but that might set you back nearly a thousand dollars. Companies and research institutions may have the funds to throw at this problem, but individuals, small companies, and small research groups may not. Luckily, they don’t need to anymore.

Cloud services have opened access to remote processing to provide other computing options to those desiring to train a Neural Network. Don’t have an expensive rig? No worries, just rent one remotely from Google at a fraction of the price.

Neurally Networked World

Neural Networks are here and they aren’t leaving. New advancements and computing architectures are constantly being published. Convolutional Networks are good at processing images and Recurrent Networks can handle variable sized data, streaming data, or sequential data. Neural Networks are powerful, indeed—far more so than other solutions we have. But they don’t mimic what is really occurring in our brains. There are some deep flaws even in our most advanced networks. For instance, it is extremely easy to confuse a Neural Network. A Network designed to recognize stop signs can be fooled by a few well-placed stickers on a sign. There are deep safety concerns if these are going to be used in self-driving cars for example.

Neural Networks are deeply dependent on the data used to train them (just as we discussed in a previous article). They are also dependent on how we configure their output. Most networks are designed to give a decision. For object detection, the network must return what it thinks the input image is. But what happens when we feed the network “nothing,” such as a completely blank image or fuzzy static noise? The network is forced to return something so it “sees,” say, a dog in the empty space. This is nonsense. Why would it choose one object over another in these cases where a human would just refuse to rigidly define nonsense? As another example, if we slightly alter an image by inserting some imperceptibly small random noise, we can completely trick a Neural Network. Where it previously recognized the image as a butterfly, it now thinks the image is a truck, while a human sees no difference in the images. This is bad, and it reflects deep issues with Neural Networks as we construct them today.

Neural Networks are occupying a liminal space. They are simultaneously scarily powerful and laughably simple and ignorant. They can best the human masters and yet be duped by the smallest of changes. We won’t be seeing Neural Networks achieve human-like sentience any time soon, and they aren’t ready for deployment in many other types of systems. But they are already being used in ways that should cause alarm.

China is already using facial recognition technology to tag members of the Uighur ethnic minority group. Accurate voice recognition means individuals could potentially be tracked even when they are not near a camera, by turning the phones in our pockets into monitoring devices. “Deep Fakes” are a growing type of video that can modify existing videos to map one person’s face and voice over another’s to create a fake video that could be used for blackmail or disinformation. While Terminator remains science-fantasy, armed military drones using neural networks for stabilization, navigation, targeting, and tactics would revolutionize armed conflicts in ways impossible to predict.

Though neural networks are being used to oppress in some places, they are also being used to save in others. Medical facilities increasingly deploy network systems to detect ailments such as cancer or infectious diseases. Laboratories can use similar networks for modeling complex biomolecules and developing treatments. Neural networks are even being used in traffic light control systems to increase vehicle flow and reduce accidents.

Where will Neural Networks take us next? It’s hard to say. But it seems evident that the world is caught and will remain in a web of Neurons.

The post What’s the News with Neural Networks? appeared first on Megaputer Intelligence.

Modeling with convolutional neural networks in PolyAnalyst

Josh Froelich — Thu, 10 Jan 2019 20:00:14 +0000

We are pleased to announce that PolyAnalyst now supports modeling data using convolutional neural networks. A convolutional neural network, also known as a ConvNet or CNN, is a particular kind of neural network. Understanding what a ConvNet does is simply a matter of understanding the general concept of a neural network, and then focusing on the distinct characteristics of the convolutional adjective.

Neural networks are one of the most popular methods for solving machine learning problems such as:

Classification – assigning data points to a hierarchy of classes, also known as labels. For example, determining who you might vote for based on where you live and your income, and then labeling you as a particular kind of voter.
Regression – measuring how a numerical value changes based on the values of other inputs, typically for predicting a future value. For example, determining how likely it is to rain tomorrow, or how likely you are to be a repeat customer.
Clustering – measuring the similarity of data points and attempting to group them. For example, identifying typical consumer markets based on purchasing behavior.

Convolutional networks are not that different from conventional neural networks. What sets ConvNets apart is their ability to process signal data, such as the pixels of an image, with remarkable efficiency. ConvNets are particularly well suited to finding patterns in noisy data, and tend to be more scalable than other neural network algorithms.

A brief history

Neural networks are certainly not new. A quick search will yield academic papers dating back to the 1940s. However, the convolutional flavor is newer, and has in recent years surged in popularity due to a renewed focus on deep learning.

Historically, ConvNets were used to process image data in an academic field known as computer vision¹. ConvNets are notable for how well they lend themselves to the task of recognizing features of images, such as people’s faces, or the continually-updated external view of the surrounding environment in a self-driving car. ConvNets are also useful when processing natural language (unstructured data), and for optical character recognition (OCR).

Applications of convolutional networks

Facebook uses convolutional neural networks to assist in automatically choosing how to tag Facebook posts.
Google uses convolutional neural networks to assist in providing accurate search results for Google Images.
Amazon uses convolutional neural networks to assist in recommending products to customers (cross-selling).
Pinterest uses convolutional neural networks to assist in personalizing your home feed.

What are the advantages of convolutional neural networks?

CNNs consist of convolutional layers, which we will touch on in more detail in a moment. CNNs allow you to extract features and build a multi-layered hierarchical structure from them. In this structure, features at a higher level are derived from features at a lower level.

Convolutional layers have a smaller number of weights than fully connected layers. This increases the efficiency of training multilayer networks.
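
A quick back-of-the-envelope comparison in Python shows the scale of the difference. The layer sizes are assumptions picked for illustration: a 28×28 single-channel input, and either a fully connected layer producing 8 values per pixel or a convolutional layer with eight 3×3 filters.

```python
def dense_params(n_inputs, n_outputs):
    """Fully connected layer: one weight per input-output pair, plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Convolutional layer: each filter's weights are shared across all image positions."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels

pixels = 28 * 28
dense = dense_params(pixels, pixels * 8)
conv = conv_params(3, 3, 1, 8)
print(f"fully connected: {dense:,} weights")
print(f"convolutional:   {conv:,} weights")
```

Because the same small filter is reused at every position, the convolutional layer's weight count does not grow with the image size.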

While this approach is suitable for pattern recognition tasks, it can also be applied to numerical data.

What is convolution?

Convolutional neural networks are so named because of a data processing step known as convolution. A thorough mathematical treatment of what convolution is lies outside the scope of this brief article. It is sufficient to emphasize that convolution is good at extracting features from data, and that the network then aggregates and filters those features so that their number is reduced (a step known as pooling). This built-in feature reduction is what makes such networks so adept at analyzing feature-rich data.
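
As a small illustration, the Python sketch below applies convolution followed by pooling to a one-dimensional signal. The kernel and signal are invented, and the sliding-window operation is implemented without flipping the kernel, which is the variant deep learning libraries conventionally call convolution.

```python
def convolve1d(signal, kernel):
    """Slide the kernel along the signal and take a weighted sum at each position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def max_pool1d(values, size):
    """Keep only the largest value in each non-overlapping window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

signal = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
kernel = [-1, 1]  # responds with +1 where the signal rises and -1 where it falls

features = convolve1d(signal, kernel)
pooled = max_pool1d(features, 3)
print(features)
print(pooled)
```

Ten input values become nine feature values and then three pooled values, each recording whether a rise occurred somewhere in its window.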

What is feature reduction?

Let’s take a step back from the technical jargon. We are going to aim for intuition in our explanation.

Feature reduction is like drawing a sketch. Imagine laying a thin piece of paper over an image, so that in a certain light, only some of the features of the image bleed through the paper. Take a pen and outline some of the lines. Then, discard the original image, and examine your sketch. The drawing contains substantially less information about the image. The original color is gone. Only some lines remain. In academia, this might be known as the application of an edge detection algorithm. Each line you have drawn is a recognized edge. As easy as it was for you, getting a computer to do this is not the easiest feat.
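
The tracing-paper analogy can be sketched in Python with a hand-written edge filter. The tiny image and the Prewitt-style vertical-edge kernel are assumptions for illustration; in a trained convolutional network, comparable filters are learned from data instead of being written by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Invented 4x6 grayscale image: dark on the left, bright on the right.
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

# Prewitt-style kernel that responds to left-to-right brightness changes.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

response = convolve2d(image, vertical_edge)
edges = [[1 if value > 0 else 0 for value in row] for row in response]
for row in edges:
    print(row)
```

The output keeps only where the brightness changes, the same way the traced sketch keeps the lines and drops the color.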

One of the notable characteristics of convolutional neural networks is that this edge detection step can be baked into the modeling step itself. In other approaches to machine learning, the steps of preparing the data for analysis are typically separated from the learning step. Here, a part or all of the cleaning step is merged with the learning step.

The first layers of the network serve to abstract away some of the features of the input data, and the later layers then continue processing these abstracted features as a substitute for the original. Because real-world data may have thousands of useful features that could serve as decision-making criteria, it would ordinarily be cost-prohibitive to model the original data directly. This step of reducing the number of features via convolution and pooling in a way that increases modeling efficiency is why convolutional neural networks are so attractive.

We should point out again that this is a simplification. Real networks may involve multiple layers of feature extraction, and multiple layers of pooling. Feature refinement is an iterative process. Each time the network examines the data in its training phase, there is an opportunity to make adjustments.

Juggling the criteria for decision-making

There are trade-offs in reducing features. If the algorithm excises too many of the original features, the result is decreased fidelity.

Recall the childhood game of Telephone, in which children sit in a circle and, starting with one child, pass a whispered message from ear to ear. The final message is often hilariously different from the original. Each repeated utterance of the original message involves a loss of information. At some point the loss accumulates and the message changes.

One of the benefits of coupling the feature extraction step with the later layers is that the network can reconfigure itself as it memorizes new patterns. For example, if the network is doing a bad job of predicting a value, as measured by a high error rate, then the network can perform additional training that starts to consider additional features, or consider the same features in greater detail, or stop considering certain features, or view things from a different angle.

In feature extraction, having more features to consider is not always a good thing. As more features are introduced, learning time can increase greatly, and accuracy can sometimes suffer from the overabundance, much as picking a person out of a crowd becomes harder as the crowd grows.

Step back and think about how good your ears are at filtering out noise in a crowded room to listen to one person speaking, or how adept your eyes are at filtering out information from your peripheral vision. The problem for machine learning is that machines see and hear all data indiscriminately. Machines do not come with the innate human ability to ignore objects in the visual periphery, or to tune out annoying sounds and static. Machines struggle to distinguish between signal and noise.

This balancing of what data to consider and what to ignore is a juggling act. It requires some finesse. Perhaps even some subjectivity. Convolutional neural networks offer several dials to turn to achieve that. The networks mentioned in the earlier examples are highly refined.

Learning from mistakes: how neural networks evolve to make better predictions

Neural network algorithms are especially susceptible to one modeling problem: settling on a false minimum in the error calculation.

Imagine climbing halfway up Mt. Everest and then claiming you reached the summit simply because you could not see the top of the mountain from your vantage point.

Neural networks learn by minimizing error. This is similar to classical behavioral conditioning, like Pavlov’s dogs responding to the sound of a bell. A network will produce a guess, a prediction, of what a value should be, then measure the error, and then get rewarded or punished. If it gets zapped for picking a bad value, it measures how far off it was from the correct value, and then rewires its neurons so as to produce a guess that is hopefully closer to the desired value. Enough zaps, and its answer becomes more accurate (assuming that the data at least exhibits some semblance of a pattern).
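The guess, measure, adjust loop described above can be sketched with a "network" that has a single weight. This is a deliberately tiny illustration using plain gradient descent on the mean squared error. The data, the learning rate, and the number of epochs are hypothetical values picked for the example, and real networks adjust many weights at once.

```python
def train(xs, ys, learning_rate=0.01, epochs=200):
    """Fit y = w * x by repeatedly nudging w to reduce the mean squared error."""
    w = 0.0                      # initial guess
    history = []                 # error recorded at each epoch
    n = len(xs)
    for _ in range(epochs):
        # 1. Guess: predict with the current weight and measure how far off we are.
        errors = [w * x - y for x, y in zip(xs, ys)]
        history.append(sum(e * e for e in errors) / n)
        # 2. Adjust: move w against the gradient of the mean squared error.
        gradient = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        w -= learning_rate * gradient
    return w, history


# Hypothetical data that follows the pattern y = 2x exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w, history = train(xs, ys)
print(f"learned weight: {w:.4f}")
print(f"error at first epoch: {history[0]:.4f}, at last epoch: {history[-1]:.2e}")
```

Each pass shrinks the error a little, and the weight settles close to 2, the pattern hidden in the data.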

However, a network is prone to settling on what it thinks is the most accurate guess when in fact it is not. The algorithm cannot know in any given training iteration that it has finally accounted for all decision-making criteria. Instead, if the algorithm sees that several attempts to tweak its criteria fail to reduce the error appreciably, it gives up. Often it has climbed only part of the way up or down the mountain.
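A small sketch can show this effect. The error curve below is a made-up function with two valleys, chosen only because it is easy to reason about; it does not come from any real network. The same descent procedure is started from two different points and stops as soon as a step no longer reduces the error appreciably.

```python
def error(x):
    """Hypothetical error curve with a shallow valley and a deeper one."""
    return x**4 - 3 * x**2 + x


def slope(x):
    """Derivative of the error curve."""
    return 4 * x**3 - 6 * x + 1


def descend(x, learning_rate=0.01, tolerance=1e-12, max_steps=10_000):
    """Walk downhill until a step no longer improves the error appreciably."""
    for _ in range(max_steps):
        new_x = x - learning_rate * slope(x)
        if error(x) - error(new_x) < tolerance:
            break                # no appreciable improvement: give up
        x = new_x
    return x


x_right = descend(2.0)    # starts on the right-hand slope
x_left = descend(-2.0)    # starts on the left-hand slope

print(f"start  2.0 -> x = {x_right:.3f}, error = {error(x_right):.3f}")
print(f"start -2.0 -> x = {x_left:.3f}, error = {error(x_left):.3f}")
```

Both runs stop because further tweaks no longer help, yet only the run that starts on the left finds the deeper valley. The run that starts on the right has no way of knowing that a better answer exists elsewhere.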

The point is that no algorithm is a silver bullet. There is no one-size-fits-all machine learning approach. Be open to experimenting with other modeling approaches such as decision trees or regression.

For the most part, the other steps of any ConvNet algorithm are highly similar to a traditional neural network, so we will not be going into much more detail.

PolyAnalyst’s new Convolutional Neural Network node

PolyAnalyst recently introduced dedicated functionality for training a convolutional neural network model. The new node provides several options to configure. While many software packages that provide similar functionality tend to require an analyst to explicitly configure the network’s structure, PolyAnalyst’s implementation enables automatic structure building. In particular, PolyAnalyst’s implementation can filter and normalize the data, form a validation sample to control overfitting, train the network with the selected architecture, split the sample into training and testing batches, and ultimately select the best model.

An example of using the Convolutional Neural Network within PolyAnalyst

If you are already familiar with PolyAnalyst, you can use the new node just as you would use any of PolyAnalyst’s many other modeling nodes, such as classification and regression.

It is important to prepare the data prior to training. Once the data is in a suitable form, you will need to specify the dependent variable and the independent variables. In the following data analysis scenario, we see the input data split into a training subset and a testing subset. The model is trained on the training subset, and then tested against the testing subset.
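In PolyAnalyst this split is configured in the workflow rather than written by hand, but the underlying idea can be sketched in plain Python. The function name, the 80/20 proportion, and the toy rows below are assumptions made for illustration and do not reflect PolyAnalyst's own implementation.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows reproducibly, then hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: (independent variables, dependent variable).
rows = [((i, i * i), i % 2) for i in range(10)]

train_rows, test_rows = train_test_split(rows)
print(f"{len(train_rows)} training rows, {len(test_rows)} testing rows")
```

Holding the testing rows back until training is finished means the model is scored on data it has never seen, which gives a more honest picture of how it will behave on new records.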

The following screenshot shows a portion of the report that enables you to assess how well the model performed. Here, you can see how the model learned patterns in the data in an iterative fashion. This particular chart shows how the prediction error, that is, the difference between the value the model predicts and the actual value, decreases over time (with an increasing number of training epochs) as the model observes more patterns and performs retraining procedures.

This error chart can be useful for adjusting training parameters in future runs of the model.

The presence of such a solution also allows:

supporting different architectures and types of neural networks,
processing new types of data such as rasterized images, and
better processing of data representing a series of events.

If you are interested in learning more about PolyAnalyst, or viewing a free, personalized demonstration of the convolutional neural network, please contact us.

The post Modeling with convolutional neural networks in PolyAnalyst appeared first on Megaputer Intelligence.

Introduction to machine learning

Chris Farris — Fri, 10 Aug 2018 02:47:03 +0000

Data analyst Chris Farris takes us through the idea behind machine learning and how this segment of AI is affecting business and our personal lives. Hear his insights for the future of machine learning that will impact life as we know it for the next 50 years.


Defining text & data analytics

Brian Howard — Tue, 17 Jul 2018 00:10:31 +0000

Brian Howard, Sales & Marketing Manager, talks about the differences between text and data analytics and how we use these insights for better business decisions.


Fuzzy matching – Comparing records with string distance measures

Chris Farris — Thu, 24 May 2018 19:17:22 +0000

In previous posts we have loosely discussed the act of comparing strings. Until now this has been a nebulous term, understood primarily through intuition. But if we want automated systems such as PolyAnalyst to perform what we call comparison, the term needs to be well defined and based on calculation. This concept is typically referred to as a string metric: a method of putting into numbers how similar two strings are.

Despite the name, many of these measurements are not in fact metrics by definition, because they do not all obey the triangle inequality; however, they remain very useful for measuring strings. There are many string metrics, of varying degrees of complexity and usefulness.

A traditional exemplar of string metrics is the Levenshtein distance.

What is the Levenshtein Distance?

Informally, the Levenshtein distance is the minimum number of edits it takes to turn one string into the other, where an edit is the insertion, deletion, or substitution of a single character.

For example, let us compare the words “Kitten” with “Sitting.”

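Current string	Target	Edit applied
Kitten	Sitting	(starting point)
Sitten	Sitting	substitute “K” with “S”
Sittin	Sitting	substitute “e” with “i”
Sitting	Sitting	insert “g” at the end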

It took three operations to turn “Kitten” into “Sitting,” and no shorter sequence of edits exists, so we can say the Levenshtein distance between the words is 3.
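For readers who want to see the calculation itself, here is a minimal sketch of the standard dynamic-programming approach in Python. It is an illustration only and is not PolyAnalyst's implementation; the function name is our own, and the comparison is case-sensitive.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


print(levenshtein("Kitten", "Sitting"))
```

Keeping only two rows of the table at a time keeps memory use proportional to the length of the second string.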

Other metrics

Other distance metrics include Damerau-Levenshtein, which also takes into account transpositions of adjacent characters, and Jaro-Winkler, which considers matching characters and transpositions between strings but adds more complexity in both its definition and its calculation.
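To illustrate how counting transpositions changes the result, the sketch below implements optimal string alignment, a commonly used restricted variant of the Damerau-Levenshtein distance in which each substring may be edited only once. It is a simplified illustration, not the unrestricted algorithm and not PolyAnalyst's implementation; the function name is our own.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein edits plus
    transposition of two adjacent characters, each counted as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i              # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j              # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent characters swapped, as in "form" and "from".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("form", "from"))
```

A swap of two neighboring letters, one of the most common typing mistakes, counts as a single edit here, whereas plain Levenshtein distance would count it as two substitutions.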

Basic string metrics do not account for any semantic information about strings, however. They deal solely with characters and simply measure how alike two strings are, character by character. This is why, without normalization, string metrics are virtually useless. However, if we normalize our data into small atomic attributes, then we have already isolated the semantic components, and simple character distances start to provide useful information.

String metrics are not the complete solution, but they are an important piece of a system that can achieve fuzzy matching: a piece that lets us quantify how similar entity components are, which in turn lets us automate the process and remove some degree of arbitrariness from an already subjective task.


Pulse neural networks and microchips

Michael Kiselev — Tue, 20 Jun 2017 14:54:32 +0000

The following is a video of Megaputer’s own Dr. Michael Kiselev talking about pulse neural networks and microchips. This video is only available in Russian.


A brief look at Megaputer’s PolyAnalyst

Rebecca Hale — Fri, 06 Nov 2015 05:00:36 +0000

A brief tour of Megaputer’s PolyAnalyst software. Watch a simple workflow on demonstration data, going from data loading to visualization using a few of the many ‘nodes’ available for data and text analysis. The video is about 10 minutes in length.
