Machine Learning – Megaputer Intelligence

Query languages—the Swiss army knife of information extraction

Echo Lu — Tue, 06 Feb 2024 05:19:41 +0000

Text mining, the art of extracting information from text, requires the formulation of efficient queries that retrieve information based on user input. To do this, the user requires a language for writing queries. For the most basic use cases, the language operators could be regex or string search. But while regex and string search are indispensable for text mining, their utility hits a hard ceiling when semantic or meaningful search is required. They cannot, for example, capture complex entities such as human names, corporations, and drugs. For handling tasks like these, we need a more powerful query language that has semantic understanding, such as Megaputer’s PDL.

So, what is PDL, and how does it achieve semantic understanding while regex and string search do not? Let’s take a look at an example to find out.

First of all, PDL does not just search for the literal form of the word in the query: instead, it automatically extends its search to all morphological forms of the word. For example, when searching for the word “company” in financial news articles, the PDL query will not only find “company”, but also “companies,” the plural form. This feature often comes in handy, especially when the search involves a verb. Suppose that you are interested in extracting what the CEOs said. With regex or other substring search, you will need to list all possible verb forms such as “say,” “saying,” “says,” and “said.” With PDL, simply entering “say” in the query will automatically fetch all possible verb forms. This behavior can also be turned off by enclosing the word in the form function, which will then restrict the search to the literal form of the word, such as in the example below.
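To make the contrast concrete, here is a minimal Python sketch. It is not PDL syntax and says nothing about how PDL is implemented; it only contrasts the two strategies. The regex must spell out every form of “say,” while the lemma-based function maps each token to a base form before matching. The `LEMMAS` table is a hypothetical, hand-written stand-in for a real morphological analyzer, and `match_form` merely mimics the idea of restricting a search to one literal form.

```python
import re

# A plain regex has to enumerate every surface form explicitly.
SAY_REGEX = re.compile(r"\b(?:say|says|saying|said)\b", re.IGNORECASE)

# Hypothetical, hand-written lemma table; a real system would rely on a
# morphological analyzer that covers the whole vocabulary.
LEMMAS = {
    "say": "say", "says": "say", "saying": "say", "said": "say",
    "company": "company", "companies": "company",
}

def tokens(text):
    """Lower-case word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def match_lemma(text, lemma):
    """Return every token whose base form equals `lemma`."""
    return [t for t in tokens(text) if LEMMAS.get(t, t) == lemma]

def match_form(text, form):
    """Literal-form matching: only the exact word form counts."""
    return [t for t in tokens(text) if t == form]

text = "The CEO said the companies are saying little, but she says more."
print(SAY_REGEX.findall(text))       # ['said', 'saying', 'says']
print(match_lemma(text, "say"))      # ['said', 'saying', 'says']
print(match_lemma(text, "company"))  # ['companies']
print(match_form(text, "say"))       # []
```

Both approaches find the same verbs here, but only the regex had to be told about each form in advance.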

Another notable feature of PDL is that users can tailor the scope of their searches with a range of built-in functions. Returning to the previous example, you may not wish to confine your search to the specific verb “say,” but rather include synonymous verbs like “tell” or “mention.” This is straightforward with PDL: users can invoke the synonym function with the verb “say,” as demonstrated in (a) below. As the subsequent results table (b) illustrates, the captured text now includes various speech verbs such as “tell,” “emphasize,” and “claim” in addition to “say,” in all of their verb forms. For additional flexibility, users can also create and modify synonym dictionaries.
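Synonym expansion can be sketched the same way, again in plain Python rather than PDL. The `SYNONYMS` and `LEMMAS` tables below are hypothetical, tiny stand-ins for the synonym dictionaries and morphological knowledge described above: a token matches if its base form is the head verb or one of that verb’s listed synonyms.

```python
import re

# Hypothetical synonym dictionary: head word -> set of accepted base forms.
SYNONYMS = {"say": {"say", "tell", "mention", "claim", "emphasize"}}

# Hypothetical lemma table standing in for a morphological analyzer.
LEMMAS = {
    "said": "say", "says": "say", "told": "tell", "tells": "tell",
    "mentioned": "mention", "claimed": "claim", "claims": "claim",
    "emphasized": "emphasize", "emphasizes": "emphasize",
}

def speech_verbs(text, head="say"):
    """Return tokens whose base form is `head` or one of its synonyms."""
    targets = SYNONYMS.get(head, {head})
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if LEMMAS.get(w, w) in targets]

text = ("The CEO told analysts growth was strong and emphasized margins; "
        "the CFO claimed otherwise.")
print(speech_verbs(text))  # ['told', 'emphasized', 'claimed']
```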

The PDL language offers various modes of information extraction, including proximity search (e.g., finding words A and B within the same sentence, or within a three-word window), syntactic relations (e.g., finding a word A that is the subject or object of B), semantic relations (e.g., finding words that are synonyms or antonyms of word A), access to dictionaries and ontologies, and more. The language is expressive enough to capture complex patterns, yet relatively easy to use, with a syntax that closely resembles English. Having access to such a versatile query language significantly enhances the power and quality of text mining operations.
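Proximity search is the easiest of these modes to sketch outside PDL. The Python functions below are illustrative assumptions, not PDL’s behavior: `within_k_words` treats “within k words” as a distance of at most k token positions, and `same_sentence` splits sentences naively after `.`, `!`, and `?`. Syntactic relations would additionally require a parser, so they are left out here.

```python
import re

def words_of(text):
    """Lower-case word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def within_k_words(text, a, b, k=3):
    """Index pairs (i, j) where word a and word b are at most k token positions apart."""
    words = words_of(text)
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [j for j, w in enumerate(words) if w == b]
    return [(i, j) for i in pos_a for j in pos_b if 0 < abs(i - j) <= k]

def same_sentence(text, a, b):
    """Sentences (naively split after . ! ?) that contain both words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if {a, b} <= set(words_of(s))]

text = "Acme shares fell sharply. The merger was approved later."
print(within_k_words(text, "shares", "sharply"))  # [(1, 3)]
print(within_k_words(text, "shares", "merger"))   # []
print(same_sentence(text, "merger", "approved"))  # ['The merger was approved later.']
```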

In conclusion, PDL is a powerful and versatile query language that enables users to extract meaningful information from text with greater efficiency and accuracy than competing methods like regex or string search. Its ability to understand and capture morphological forms, synonyms, and other complex patterns makes it an indispensable tool for solving text mining tasks that require semantic understanding. By leveraging the capabilities of PDL, users can enhance their information extraction processes and gain valuable insights from their data, making it a true Swiss army knife of information extraction.

The post Query languages—the Swiss army knife of information extraction appeared first on Megaputer Intelligence.

5 Things to Know about Linear Regression

Chris Farris — Wed, 29 Apr 2020 18:32:43 +0000

Before we talk about linear regression, recall that we have recently discussed advanced machine learning techniques such as neural networks and support vector machines. These are not always the most appropriate tools for modeling data. Such models are big, complicated, and often almost impossible to interpret. While they have great capacity and are sometimes the only solution to difficult problems, their downsides can be substantial depending on our goals. Additionally, we often want to understand our models or to have some measure of how valid they are.

For example, the goal of a physicist may be to understand their model of a particle interaction. If they build a neural network to do this, the model may achieve low error, but our human understanding of the laws of nature is no better. Likewise, an economist may be more concerned with understanding the general relationship between political uncertainty in elections, uncertainty in public policy, and uncertainty in financial markets than with creating a complex interactive system that prevents generalizations that could be applied to governance, electioneering, or market trading. So, for the time being, let us return from the vast jungles of machine learning to the tamed and comfortable community of traditional statistical models, the first of which will be linear regression.

What is Linear Regression?

Linear regression is one of the simplest statistical models, but don’t let that fool you. It is a powerful tool because of that simplicity. Understanding linear regression starts with the name. A regression analysis is simply a method of estimating the relationship between a dependent variable and a set of independent variables. For example, given a set of data about childhood measurements like height, weight, gender, and so on, can we estimate the relationship between those variables and the dependent variable, which is the child’s height as an adult?

Why is it called “regression”? Mostly from tradition. The term was coined by Francis Galton in the late 19th century, in his studies of heredity, and it has stuck. You may have heard of the concept of “regression to the mean.” The idea is that individual observations may deviate from an expected “average” value, but over multiple observations, values tend to revert, or “regress,” toward this expected value. For example, a tall person is likely to have children who are also tall, but probably less so: the children’s heights will likely “regress” toward the average rather than exceed their parent’s.

Note that regression models a numeric relationship: the output of the regression is a number. This is different from something like classification, which outputs a class or label for the input data. The second part of the name is “linear,” referring to the fact that in linear regression, we only model linear relationships; our model is just a weighted sum of terms.

Let’s consider a simple example. Using the classic Iris dataset, we examine the relationship between the length of a flower’s sepal and the length of its petal for the Virginica variety. A linear regression models the relationship between petal length and sepal length as a line of best fit.

Image: Virginica Iris (source: Wildflowerfarm.com)

From the chart below, we can see there is a general linear trend in the data. As sepal length increases, petal length increases in a similar manner. Linear regression produces a linear equation which is the model of the system.

We can see the result of that regression plotted along with the data. The line is our model’s prediction of petal length for each sepal length. Of course, there will be some error because there is some variability in the data, but the fit generally describes the relationship well. The equation for this relationship is:

This is a simple, interpretable form that is very useful for analysis and understanding. From the equation, we can say that for each additional centimeter of sepal length, we generally observe around 0.75 centimeters of additional petal length in a Virginica Iris.

How is a Linear Model Calculated?

By now you have probably guessed the general form of a linear regression. Our linear model is:

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

Our estimate for the target variable is expressed as a bias term, or intercept, plus a weighted sum of our input variables. This is a nice linear form. But given some data, how can we create such a model? There are infinitely many lines to choose from, so which is the best? We need a way to measure how good our model is.

This is where the concept of error, or “residuals,” comes into play. For each data point, we want to measure how far the actual value lies from our prediction. There are several ways to do this, the most common of which is the Residual Sum of Squares (RSS): the sum of the squared residuals, where a residual is the difference between the real value of the target variable and the predicted value.

People often ask why we don’t call these terms “errors” instead of “residuals,” particularly since “residual” seems more confusing, like a fancy math term used to sound important. A full explanation is beyond the scope of this article, but in short, this difference is not actually an “error.” When creating a model, we are fully aware that there is some variance in the distribution of the target variable. The full linear model equation includes a noise term, which is Gaussian with mean 0 and a standard deviation that is estimated along with the model. Thus, residuals measure this noise, which is what remains after removing the deterministic part of our model. So, our RSS is:

$$\mathrm{RSS} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$

where $y_i$ is the actual value and $\hat{y}_i$ the predicted value for the $i$-th of $n$ data points.
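The RSS calculation is simple enough to write out directly. This Python sketch scores two candidate lines for a single input variable on four made-up points (not the Iris data); the line with the smaller RSS fits better.

```python
def predict(x, b0, b1):
    """Prediction of a one-variable linear model: y_hat = b0 + b1 * x."""
    return b0 + b1 * x

def residuals(xs, ys, b0, b1):
    """Residual for each point: actual value minus predicted value."""
    return [y - predict(x, b0, b1) for x, y in zip(xs, ys)]

def rss(xs, ys, b0, b1):
    """Residual Sum of Squares for the candidate line (b0, b1)."""
    return sum(r ** 2 for r in residuals(xs, ys, b0, b1))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 4.0]
print(rss(xs, ys, 1.0, 1.0))  # 3.0  (line y = 1 + x)
print(rss(xs, ys, 2.0, 0.5))  # 3.5  (line y = 2 + 0.5x, a worse fit)
```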

Now that we have a measure of how well our model performs, we can compare lines and choose the best one. RSS measures how far, in total, the data fall from our line, so the best line is the one with the smallest RSS. To choose the line of best fit, then, we must find this minimizing line. That sounds difficult: we can’t test lines one by one, because there are infinitely many to consider! The answer is a method called Ordinary Least Squares (OLS). At this point, we will simply wave our hands and assure you that calculus and some linear algebra solve this problem for us algorithmically, so we don’t need to worry about it.
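For the single-variable case, the Ordinary Least Squares minimization has a well-known closed-form solution: the slope is the sum of products of deviations from the means divided by the sum of squared deviations of the input, and the intercept makes the line pass through the point of means. (With several inputs, the same idea is usually written in matrix form as the normal equations.) The sketch below applies it to four made-up points, not to the Iris measurements.

```python
def ols_fit(xs, ys):
    """Closed-form least-squares line for one input variable.

    slope     = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 4.0]
b0, b1 = ols_fit(xs, ys)
print(b0, b1)  # approximately 2.0 and 0.7
```

By construction, no other straight line has a lower RSS on these four points.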

How to Assess a Linear Model

We’ve seen how we can calculate the terms of a linear model by minimizing a residual equation, but this is not the end of the model-building process. Unlike complex machine learning models such as neural networks, which can have millions of parameters obscured in a network, each parameter of a linear regression model can be interrogated and analyzed. The regression model produces several metrics for this. First, there is R², the Coefficient of Determination, which measures how much of the variance in the data is explained by the model. A value of 1 means that the model perfectly describes the data, while a value of 0.5 means that only 50% of the variance in the data is explained by the model; the rest is unaccounted for. In general, the closer R² is to 1, the better.
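R² compares the model’s RSS with the total variation of the target around its mean (the total sum of squares, TSS): R² = 1 − RSS/TSS. A minimal Python sketch, using four made-up points at x = 1, 2, 3, 4 and the fitted values of the least-squares line 2.0 + 0.7x for those points:

```python
def r_squared(ys, y_hats):
    """Coefficient of determination: 1 - RSS / TSS."""
    y_mean = sum(ys) / len(ys)
    rss = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))  # unexplained variation
    tss = sum((y - y_mean) ** 2 for y in ys)               # total variation around the mean
    return 1 - rss / tss

ys = [2.0, 4.0, 5.0, 4.0]
y_hats = [2.7, 3.4, 4.1, 4.8]  # fitted values of y = 2.0 + 0.7x at x = 1, 2, 3, 4
print(r_squared(ys, y_hats))   # about 0.516
```

So on this toy data the line explains roughly half of the variance, much like the 0.5 example in the text.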

Additionally, for each coefficient in our model, we can calculate a p-value to test whether that coefficient is significantly different from zero. If we encounter large p-values, we may question whether including that variable in the model is a good idea. In those cases, the variable may not be useful, and we can refit the model after discarding it. One technique is to build a model gradually, adding variables one at a time and testing whether each addition produces a statistically significant improvement in model performance. This is known as forward stepwise selection, and each comparison between the smaller and the larger model is commonly made with an analysis of variance (ANOVA) F-test. Another way to test our model is to analyze the distribution of the residuals. Ideally, they should be normally distributed and homoscedastic; that is, they should have roughly the same variance across the fitted values of the model and across the inputs to the model. We want the residuals to behave the same way across our data because that indicates consistent performance. We don’t want the model to suddenly be worse in some regions or to be unpredictable.
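The comparison between a smaller model and a larger one that adds variables (nested models) is commonly done with an F statistic. The Python sketch below only computes the statistic; turning it into a p-value requires the F distribution’s survival function, for example `scipy.stats.f.sf(F, extra, df_resid)` if SciPy is available. The numbers are toy values: an intercept-only model versus a one-variable model on four points.

```python
def nested_f_statistic(rss_reduced, rss_full, p_reduced, p_full, n):
    """F statistic for comparing nested linear models.

    rss_reduced / rss_full: residual sums of squares of the smaller and larger model
    p_reduced / p_full: number of predictors in each (not counting the intercept)
    n: number of observations
    """
    extra = p_full - p_reduced       # how many variables the larger model adds
    df_resid = n - p_full - 1        # residual degrees of freedom of the larger model
    improvement = (rss_reduced - rss_full) / extra
    noise = rss_full / df_resid
    return improvement / noise

# Toy comparison: intercept-only model (its RSS equals the total sum of squares, 4.75)
# versus a one-variable line with RSS 2.3, both fitted on n = 4 points.
f_stat = nested_f_statistic(rss_reduced=4.75, rss_full=2.3, p_reduced=0, p_full=1, n=4)
print(round(f_stat, 2))  # 2.13
```

A large F means the added variables cut the RSS by more than the leftover noise alone would suggest.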

While all this analysis is again beyond the scope presented here, it is important to understand that because of the simplicity of the linear regression model, these types of robust, interpretable, and well understood statistical analyses are possible. We can be intimately familiar with our linear models in ways that are impossible for more complex structures.

What Flexibility Exists in Linear Models?

The biggest weakness of linear regression is that it can only model direct linear relationships. But there are many more types of relationships in the world that we would like to model. Luckily, there are very easy ways to inject non-linearity into linear regression. The first is to transform our data. Instead of modeling our variables directly, it is very common to transform them first, for example by taking the logarithm of the values or applying some other power transform. There may not be a linear relationship between our values directly, but there could be one between their logarithms. When both an independent variable and the target are log-transformed, the model can be interpreted as saying that a 1% increase in that independent variable is associated with an approximately X% change in the target variable, where X is that variable’s coefficient.

Another technique is to include squared terms of our variables in the model as well: we simply square a variable and treat that square as another separate variable. Yet another technique is to use what are called interactions, which are products of existing variables. For example, if we include Sex and Height in our model, we can include a term, Sex:Height, which combines the two. Essentially, this creates two new variables: Male:Height, which equals Height for males and 0 otherwise, and likewise Female:Height. This interaction can model the separate effects of Height for Males and for Females. Perhaps Height has a stronger effect for Females than for Males?
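Squared terms and interactions are just extra columns in the table of model inputs. This Python sketch builds one row for a hypothetical Height/Sex model. One design choice to note: because Male:Height + Female:Height always equals Height, the plain Height column is left out; keeping all three alongside an intercept would make the columns perfectly collinear. The column layout is our own illustration, not PolyAnalyst’s.

```python
def design_row(height, sex):
    """One row of inputs for a hypothetical model using Height and Sex.

    Columns: intercept, Male indicator, Male:Height, Female:Height, Height^2.
    """
    male = 1.0 if sex == "M" else 0.0
    female = 1.0 - male
    return [
        1.0,              # intercept
        male,             # shifts the baseline for males
        male * height,    # Male:Height, equal to Height for males and 0 otherwise
        female * height,  # Female:Height, equal to Height for females and 0 otherwise
        height ** 2,      # squared term for curvature
    ]

people = [(180.0, "M"), (165.0, "F")]  # made-up (height in cm, sex) pairs
for row in (design_row(h, s) for h, s in people):
    print(row)
```

Fitted with least squares, the coefficient on Male:Height acts as the height slope for males and the one on Female:Height as the slope for females, so comparing the two addresses exactly that question.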

The Power of Linear Regression Models

Linear regression models are extremely popular across domains because of their robustness and simplicity. Thanks to transformations and other techniques, linear regression models can capture a wide range of dependencies in our data. Because their form is well defined, unlike that of Neural Networks, they have statistical properties that we can analyze to compare models, make interpretations, and derive important information. Linear regression is not only useful for predictions; it can do what most machine learning models cannot: describe the system. If you have a numerical value you want to model, a relatively short list of independent variables to use, and a desire to understand the model you create, linear regression should usually be your first choice. Luckily, if you use PolyAnalyst, you have all of these options available to test and find out what works best for your data.

The post 5 Things to Know about Linear Regression appeared first on Megaputer Intelligence.

Combining the Best of Both Worlds: Machine Learning & Rule-Based Approach

Elli Bourlai — Wed, 12 Feb 2020 15:15:51 +0000

In the past few years, more and more organizations have started relying solely on Machine Learning (often loosely referred to as Artificial Intelligence, or AI) for Text Analytics tasks. The reasons are that this approach is faster, suited to large volumes of data, and more adaptable to new contexts, since it does not require hard-coded rules. However, AI / Machine Learning algorithms are only as good as the data they are trained on.

A good training dataset usually requires expert knowledge to pre-categorize, or “tag,” the input examples for the model, and this tagging is often performed manually. Such training datasets are also known as “gold standard data,” and they are not easy to obtain: producing an accurately tagged dataset takes time and many trained people, which can be very expensive. This is why many machine learning solutions use a statistical approach to identify and categorize input features for training, often resulting in solutions with high recall but low precision.

What about the Rule-Based approach?

Even though the rule-based approach has lately been abandoned in favor of machine learning solutions, it offers certain advantages over the latter. Since the rules are created by humans, the results are consistent and more accurate. It is also easier to understand the logic behind the machine’s decisions and to improve a model, instead of trying to comprehend a “black box,” as is the case with most machine learning algorithms. But the main issue with this approach is that the rules are hard-coded and limited to the existing context, which makes them hard to adapt to new contexts; this results in higher precision but lower recall.

Ideally, we would like to get the best of both worlds: the aspects of a Rule-Based approach that result in high precision and the aspects from Machine Learning that result in high recall. So how can we combine them efficiently?
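Since the trade-off here is stated in terms of precision and recall, a small Python sketch of the two measures may help. Precision is the share of flagged items that are correct; recall is the share of truly relevant items that were flagged. The document IDs are invented to mimic a strict rule set and a broad model.

```python
def precision_recall(predicted, actual):
    """Precision and recall for sets of items flagged as positive.

    precision = true positives / everything predicted positive
    recall    = true positives / everything that is actually positive
    """
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall

actual = {1, 2, 3, 4, 5, 6, 7, 8}                 # documents that truly match
strict_rules = {1, 2, 3}                          # few hits, all correct
broad_model = {1, 2, 3, 4, 5, 6, 9, 10, 11, 12}   # many hits, some wrong
print(precision_recall(strict_rules, actual))  # (1.0, 0.375)
print(precision_recall(broad_model, actual))   # (0.6, 0.75)
```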

Making Gold from Rules

The solution proposed by Megaputer involves using a rule-based approach to generate training datasets. This approach works by utilizing a highly accurate query language that removes the need for manual tagging, while still achieving similar results. PolyAnalyst’s Pattern Definition Language (PDL) navigates and leverages the linguistic properties of text; it allows an expert to teach a machine how to identify and categorize important features that are necessary for generating training datasets with “gold standard” quality.

Once the machine is given expert guidance with rules and has generated the pre-categorized training data, it can expand its knowledge to additional or new contexts through Machine Learning.
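The rule-to-training-data step can be sketched in a few lines of Python. This is not PDL and not Megaputer’s implementation: the hypothetical `RULES` list uses plain regular expressions as a stand-in for expert-written patterns, and the category names are invented. The rules label raw documents automatically, and the resulting (text, label) pairs are what a machine learning model would then be trained on.

```python
import re

# Hypothetical expert rules: (category, pattern). Plain regexes stand in for
# the richer linguistic patterns a query language such as PDL can express.
RULES = [
    ("refund_request", re.compile(r"\b(refund|money back)\b", re.I)),
    ("delivery_problem", re.compile(r"\b(late|lost|never arrived)\b", re.I)),
]

def label(text):
    """Return the first matching category, or None if no rule fires."""
    for category, pattern in RULES:
        if pattern.search(text):
            return category
    return None

def build_training_set(docs):
    """Keep only documents a rule could label; these become training examples."""
    pairs = ((doc, label(doc)) for doc in docs)
    return [(doc, lab) for doc, lab in pairs if lab is not None]

docs = [
    "I want my money back for this order.",
    "The parcel never arrived.",
    "Great product, works as described.",
]
for example in build_training_set(docs):
    print(example)
```

Documents that no rule covers are left out of the training set; in the combined approach, a model trained on the labeled pairs is what extends coverage to them.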

A Step Forward

Thus, with a combined approach, we achieve both the high precision that Rule-Based approaches bestow on created training datasets, along with the high recall that is achieved through Machine Learning in different contexts. If you are interested in automating the generation of accurate training data to improve the results of your Machine Learning models, contact us and learn about the tools we have available.

The post Combining the Best of Both Worlds: Machine Learning & Rule-Based Approach appeared first on Megaputer Intelligence.

Choosing Machine Learning Models

Chris Farris — Mon, 21 Oct 2019 21:27:31 +0000

As we have discussed previously, machine learning approaches to modeling are just that – approaches. There are many forms of models we could use with machine learning, each with different design philosophies and quirks. When setting out to use machine learning to create your own models, a question you may be asking yourself is: Which model framework do I choose? From Neural Networks, to Support Vector Machines, to Decision Trees, to Linear Regression, there are many options. While there is rarely a definitive or clear answer to this question, let’s discuss some things to consider when making this choice.

Model Properties

Model frameworks have a set of qualitative properties which, while difficult to measure directly, can be useful to think about. We will consider two: capacity and complexity. Capacity is a measure of a model framework’s ability to describe data distributions; it can be thought of as potential accuracy. For example, imagine we scatter pebbles on the ground and would like to model the shape they form with two different models. One model is a rigid stick (depicted by the blue line) and the other is an elastic band (depicted by the red line).

The stick has very low capacity because it can only represent one shape, varying only the angle at which it lies on the ground. A few pebbles may fit this model, but it is unlikely to be a good choice in many situations. The elastic band, while probably not perfect, can twist and bend into a host of complex curves to fit our data. This model can fit more potential data distributions and so has a higher capacity.

Capacity is a very important property of models. If our model’s capacity is too low, we may never be able to adequately make predictions no matter how much training data we use or how fancy our hardware is. Models with high capacity include Neural Networks and Support Vector Machines (SVMs), and this is why they have been so popular.

However, capacity is not the only property to consider. Another is complexity, which can be thought of as the inverse of interpretability: how hard it is for a human to understand the model. In our previous example, the stick had low complexity (or high interpretability). It can be described and understood well by humans, and it is highly generalizable. Low-complexity models include Linear Regression and Decision Trees. Conversely, the many curves created by the elastic band are very complex and difficult for humans to describe. SVMs and Neural Networks would, therefore, be considered high-complexity models.

This relationship between complexity and capacity that we saw in the above example is generally true for all of our models. By increasing our capacity, we often must incur the cost of increased complexity. This is part of the analysis you must consider when choosing a model. Do you need to be able to describe or personally understand what the model is doing? If so, going all in for the most complex but accurate model may not be desirable. This is why many hospitals and healthcare-related institutions often rely on less complex model structures like Random Forests. When taking the health of patients into account, doctors want to be able to understand what a model is doing before taking action on those results.

Resources

Thus far we have discussed the theoretical properties of a trained model. Now we should consider more practical ones. A machine learning model requires training, which in turn requires data, processing power, and time. When selecting a machine learning model, we need to weigh how much of these resources we have available to invest in it. The size of the trained model itself also matters. For example, if we plan to deploy the model on mobile devices, we should be cautious about how much storage it requires. As expected, models with high capacity usually require more resources for both training and storage. Let’s consider a sample of models and discuss the resources they require.

On the low-cost end of the spectrum are Naïve Bayes models. These models are generally lower in capacity but are extremely easy to interpret and train. One of the main advantages of Naïve Bayes models is that they are exceedingly fast to train and cheap to store. Unfortunately, the model relies on the Naïve Bayes assumption, that the features are conditionally independent of one another given the class, which is rarely true for real data. However, if your data does not stray too far from that assumption, you may find that the model provides decent results in a cheap package.
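To show why Naïve Bayes is so cheap, here is a minimal multinomial Naïve Bayes text classifier in Python. Training is a single pass of counting words per class, and the stored “model” is just those counts; prediction adds log-probabilities word by word, which is only justified by the independence assumption. The tiny dataset and labels are invented, and add-one (Laplace) smoothing is our choice.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label). Training is just counting."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Pick the class with the highest log prior plus summed log word likelihoods."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        n_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for w in text.lower().split():
            # Laplace smoothing keeps unseen words from zeroing out the product.
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train([
    ("refund broken item", "complaint"),
    ("item arrived damaged", "complaint"),
    ("love this product", "praise"),
    ("great product fast shipping", "praise"),
])
print(predict(model, "broken and damaged"))  # complaint
print(predict(model, "great product"))       # praise
```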

Also relatively low-cost are Decision Trees and Random Forests. These models are interpretable, and their tree structure makes them easy to store efficiently if you don’t have an enormous number of features. However, training time starts to increase with these models.

SVMs, a popular technique, come with a large jump in the resources required. Training them can be computationally demanding on large datasets, and the resulting models are bulky (a kernel SVM must keep its support vectors), almost impossible to translate into plain English, and tricky to train in the first place because of the choice of kernel. However, SVMs are powerful models when trained correctly, and thus they are widely used.

Possibly the most costly of the models are Neural Networks. Famed for their high capacity, these models require extensive processing and time to train and can take up a large chunk of storage. But it isn’t all doom and gloom. Because they are trained incrementally, Neural Networks can take additional data to fine-tune the training at any point, which means we don’t need to retrain our models from scratch every time we get new data.

Which Model to Pick?

Ultimately, the answer is not obvious in every case. When choosing which machine learning model to use, we need to consider many different factors: What tradeoff do we want to make between capacity and complexity? What resources do we have available for our training? Do we want to be able to store the model in a compact manner? Do we want to be able to continually use new data without having to retrain the model entirely each time? Finally, we don’t always make the right choice. Sometimes we have to experiment with multiple structures to find one that fits our needs. If you can, train multiple models to help decide which one to devote your attention to.

The post Choosing Machine Learning Models appeared first on Megaputer Intelligence.

What’s the News with Neural Networks?

Chris Farris — Tue, 28 May 2019 18:10:03 +0000

There are many reasons for the explosion of machine learning advancements over the past decade. We now have vastly improved hardware for fast computation, and memory is cheaper than ever. Data is now “Big Data,” and it is both jealously hoarded and publicly available in repositories such as ImageNet. Individually, these advancements are already a blessing for the technology space. But for artificial intelligence (AI), they have opened the gates for something truly powerful: Neural Networks.

Neural Whatnow?

Neural Networks. You’ve probably heard of them. They are at the forefront of the machine learning craze and are the driver of many of the most impressive advancements. The technology that led machines to beat the best humans at Go and at the popular video game StarCraft? Neural Networks. The backbone of algorithms that can recognize images and faces, which are igniting a surveillance and privacy panic? Neural Networks.

And yet, Neural Networks aren’t some new idea spawned from the incubator of a giant tech company. They aren’t some stroke of genius from a college student turned dropout who went on to found a revolutionary tech firm. Neural Networks are in fact… old hat. Or they were.

The concept of a Neural Network (something we will get to later) has been around for a long time. They date back to the 1970s, and simpler versions existed even in the 1940s! So, if they have existed for decades, why are they only popular now?

The answer is related to the hardware and data advancements mentioned earlier. Neural Networks crunch a lot of numbers. They also need a lot of data to help them learn. Up until the past decade, this made training anything but the simplest networks highly time-consuming and expensive.

With all the improvements to hardware over the years, using more advanced Neural Networks became feasible. Driven by hardware demand from the entertainment industry and, more recently, cryptominers, GPUs (graphics processing units) have been developed that can perform specific mathematical operations, such as large matrix multiplications, at lightning speed. Luckily, these same kinds of operations occur in training neural networks. Using technology originally developed for beautiful visuals in film and video games helps us train networks in a fraction of the time a traditional CPU takes.

What’s in a Name?

The power of Neural Networks may be evident, but at this point, one may also be wondering what they are in the first place. The name gives some clues. A Neural Network isn’t a singular object that makes decisions by itself. It is, as implied, a network of smaller objects all connected. A network of what, then? We call them Neurons. Neurons as in the cells inside our brains? Yes! Well, no. But sort of!

Neurons in Neural Networks can be thought of as being like the neurons in brains. They are tiny, individual units that are connected to other neurons in a large, structured network. These connections allow tiny pieces of data to flow between them. In our brains, these are electrical pulses. In the Neural Network, we send numbers between Neurons. The Neurons then take all the numbers fed to them by the Neurons they are connected to and process them. The process isn’t complicated—in fact, it is painfully trivial. After all, it is just a tiny unit—a single cell in our brain. But it then sends the processed information out to other Neurons it is connected to. Another number. Another electrical pulse. And so on. Tiny Neurons are fed tiny pieces of information, perform tiny pieces of computation on that snippet, and feed it forward to other Neurons, which do the same over and over until we eventually reach a final set of Neurons—the output of which is our final result.
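In most modern networks, that “tiny piece of computation” is a weighted sum of a Neuron’s inputs plus a bias, passed through a simple nonlinear activation function. The Python sketch below shows one Neuron and a two-layer feed-forward pass; the weights are arbitrary numbers chosen for illustration, and ReLU is just one common choice of activation.

```python
def relu(z):
    """A common activation: pass positive values through, clip negatives to 0."""
    return max(0.0, z)

def neuron(inputs, weights, bias):
    """One Neuron: weighted sum of incoming numbers plus a bias, then the activation."""
    return relu(sum(x * w for x, w in zip(inputs, weights)) + bias)

def layer(inputs, weight_rows, biases):
    """A layer is many Neurons reading the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

x = [1.0, 2.0]                                             # input numbers
hidden = layer(x, [[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0])  # two hidden Neurons
output = layer(hidden, [[2.0, 0.5]], [0.0])                # one output Neuron
print(hidden, output)  # [0.0, 2.0] [1.0]
```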

From the collective effort of many small individual units networked together in a perfectly calibrated balance, we can achieve enormous computational power. The Whole is greater than the Sum of its parts.

Balancing Act

If that all sounded magical and farfetched, then don’t worry—it is. How exactly are we supposed to arrange these Neurons in such a perfect balance that their combined minuscule computations lead to a machine recognizing human faces? We can’t. So how do we get this to work? Well, we are talking about Machine Learning after all. And Machine Learning is how we are going to solve this. We aren’t going to calibrate the Neurons to be in balance. They are going to calibrate themselves.

In order to achieve this, we need training data: data labeled with the output we want from the machine. We can provide this data to the unconfigured Neural Network. It will process the data and likely output something nonsensical and useless. But this is fine. We can use a mathematical function called an Error function (also known as a loss function), which measures how different our output was from our desired targets. Then, using Calculus, we can discover how much of that Error is caused by the calibration of each one of our Neurons! Using this information, we can tune the Neurons slightly and repeat the process: feed data into the Network, observe the output, calculate the Error, and use Calculus to know how to tune the Neurons to make them more accurate. This continues until the Network converges to a calibrated state. This process of using Error functions, Calculus, and tuning is, in essence, how Neural Networks learn.
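Here is that loop in miniature: a single Neuron with no activation (so it is just y = w·x + b), a squared-error function, and gradient descent. The derivatives of the Error with respect to w and b are the “Calculus” step; in a deep network, backpropagation computes the same kind of derivatives for every Neuron. The data (generated from y = 2x + 1), learning rate, and step count are our own illustrative choices.

```python
def train_neuron(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # start uncalibrated
    n = len(xs)
    for _ in range(steps):
        # Forward pass: the Neuron's outputs and their errors against the targets.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Calculus: partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Tune: nudge each parameter a little in the direction that reduces the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]     # generated from y = 2x + 1
print(train_neuron(xs, ys))   # very close to (2.0, 1.0)
```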

Power at a Price

Although it is conceptually simple (at least in this explanation, which ignores the details), it is very resource intensive. At the moment I am fine-tuning a Neural Network on my own desktop to recognize Western art styles. Although the dataset is only a few GB, it takes almost an hour to run a single iteration of the training. It will take nearly two days to complete the 50-iteration learning schedule I planned. And even then, I may need to schedule another one if the Network still needs to learn more! And I’m not running this on some dusty machine I dragged out of the aughts. This is an almost brand-new desktop powered by a 3.7 GHz AMD Ryzen 8-core processor with 32 GB of RAM available and virtually no other load on the machine.

Neural Networks are expensive to train. If you want to increase performance, you could pay for an expensive GPU, but that might set you back nearly a thousand dollars. Companies and research institutions may have the funds to throw at this problem, but individuals, small companies, and small research groups may not. Luckily, they don’t need to anymore.

Cloud services have opened access to remote processing to provide other computing options to those desiring to train a Neural Network. Don’t have an expensive rig? No worries, just rent one remotely from Google at a fraction of the price.

Neurally Networked World

Neural Networks are here, and they aren’t leaving. New advancements and computing architectures are constantly being published. Convolutional Networks are good at processing images, and Recurrent Networks can handle variable-sized, streaming, or sequential data. Neural Networks are indeed powerful, and for many tasks far more so than other solutions we have. But they don’t mimic what is really occurring in our brains. There are some deep flaws even in our most advanced networks. For instance, it is extremely easy to confuse a Neural Network. A Network designed to recognize stop signs can be fooled by a few well-placed stickers on a sign. That raises deep safety concerns if such networks are used in, for example, self-driving cars.

Neural Networks are deeply dependent on the data used to train them (just as we discussed in a previous article). They are also dependent on how we configure their output. Most networks are designed to give a decision. For object detection, the network must return what it thinks the input image is. But what happens when we feed the network “nothing,” such as a completely blank image or fuzzy static noise? The network is forced to return something, so it “sees,” say, a dog in the empty space. This is nonsense. Why would it choose one object over another in cases where a human would simply refuse to force a label onto nonsense? As another example, if we slightly alter an image by inserting some imperceptibly small but carefully chosen noise, we can completely trick a Neural Network. Where it previously identified the image correctly as a butterfly, it now thinks the image is a truck, while a human sees no difference between the two images. This is bad, and it reflects deep issues with Neural Networks as we construct them today.
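Both failure modes can be reproduced with a toy linear classifier in plain Python. This is a sketch under stated assumptions, not a real image network: the labels, biases, weights, and 1,000-“pixel” image are all hypothetical.

- **Part 1 (forced choice):** softmax always returns a full probability distribution, so the model must name some class even for a blank input.
- **Part 2 (adversarial nudge):** a step in the spirit of the fast gradient sign method moves every pixel by only 0.01, yet flips the decision. Those tiny changes add up across many inputs.

```python
import math

# Part 1: forced choice. Labels and biases are hypothetical.
LABELS = ["butterfly", "truck", "dog"]
BIASES = [0.1, -0.2, 0.3]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_blank():
    # A blank (all-zero) image cancels every weighted input, leaving only the
    # biases, yet softmax still returns probabilities that sum to 1.
    probs = softmax(BIASES)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best], probs

# Part 2: a tiny adversarial nudge against a linear "butterfly vs truck" score.
N = 1000  # number of hypothetical pixels
WEIGHTS = [1.0 if i % 2 == 0 else -1.0 for i in range(N)]

def score(image):
    """Positive means "butterfly", negative means "truck"."""
    return sum(w * x for w, x in zip(WEIGHTS, image))

def adversarial(image, eps):
    # For a linear score the gradient with respect to the input is WEIGHTS,
    # so stepping each pixel by eps against sign(w) gives the largest possible
    # drop in score when no pixel may change by more than eps.
    return [x - eps * math.copysign(1.0, w) for w, x in zip(WEIGHTS, image)]

image = [0.5] * N
image[0] = 0.6  # gives a small positive score (about 0.1)
fooled = adversarial(image, eps=0.01)

print(classify_blank()[0])                  # dog
print(score(image) > 0, score(fooled) < 0)  # True True
```

With 1,000 pixels, a 0.01 change per pixel shifts the score by 0.01 × 1,000 = 10. That is far more than the original margin of about 0.1.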

Neural Networks are occupying a liminal space. They are simultaneously scarily powerful and laughably simple and ignorant. They can best the human masters and yet be duped by the smallest of changes. We won’t be seeing Neural Networks achieve human-like sentience any time soon, and they aren’t ready for deployment in many other types of systems. But they are already being used in ways that should cause alarm.

China is already using facial recognition technology to tag members of the Uighur ethnic minority group. Accurate voice recognition means individuals could be tracked even when they are not near a camera, by turning the phones in our pockets into monitoring devices. “Deep Fakes” are a growing class of videos that modify existing footage to map one person’s face and voice onto another’s, creating a fake video that could be used for blackmail or disinformation. While the Terminator remains science fiction, armed military drones using neural networks for stabilization, navigation, targeting, and tactics would revolutionize armed conflict in ways that are impossible to predict.

Though neural networks are being used to oppress in some places, they are also being used to save lives in others. Medical facilities increasingly deploy neural network systems to detect ailments such as cancer or infectious diseases. Laboratories can use similar networks to model complex biomolecules and develop treatments. Neural networks are even being used in traffic light control systems to increase vehicle flow and reduce accidents.

Where will Neural Networks take us next? It’s hard to say. But it seems evident that the world is caught, and will remain caught, in a web of Neurons.

The post What’s the News with Neural Networks? appeared first on Megaputer Intelligence.

Comparing Machine Translation to Native Language Analysis

Jeff Palan — Fri, 10 May 2019 20:20:59 +0000

As our world becomes increasingly global, so does our data. Analysts working with companies of almost any size today often face the challenge of receiving text data that contains multiple languages. So what do you do?

Essentially, there are two options we may consider: machine translation or native language analysis.

With machine translation, we actually create a new dataset where the text has all been translated into a single language before we do the analysis. This makes the subsequent analysis much easier, as we only need to use a single language grammar module for the analysis.
Native language analysis means that we keep documents in their original languages and perform a separate analysis for each language with the corresponding grammar module.

To demonstrate how these options work, let’s imagine we are working with a dataset whose records (or rows) contain textual responses in either English or Spanish. Whether we want to use machine translation or to process individual documents in their native languages, we first need to split the dataset by language, so that all the English records end up in one dataset and all the Spanish records in another. In this example, we use the software PolyAnalyst for Text™, which performs native language text analysis with the corresponding language modules. Currently, this software can process texts in 16 different languages, and it can integrate with third-party translation services such as Microsoft, SDL, and Google. Splitting the data is easily accomplished with a combination of the Language Detection and Filter Rows nodes in PolyAnalyst. The Language Detection node automatically samples the text of each record and determines which language is being used (as seen below). Once each record is tagged with its language, we can use these tags to filter the data by language.


PolyAnalyst workflow for detecting and filtering text data by language.

Translation services like Google and Microsoft are fully capable of identifying the language of texts on a record-by-record basis, but they will charge you a fee even if you ask them to translate English to English. Therefore, it is best to do the split beforehand and send them only the texts that really need translation. Some of you may even go so far as to split individual texts into multiple records when that text contains multiple languages. This will minimize translation costs.
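PolyAnalyst handles this step with its Language Detection and Filter Rows nodes. The Python sketch below is a rough, hypothetical illustration of the same idea, not PolyAnalyst’s method. It tags each record with a language by counting common stopwords, then splits the records into per-language groups. The tiny stopword lists and the function names are assumptions for illustration; production language detectors rely on much richer statistical models.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny stopword lists.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "was", "my", "with"},
    "es": {"el", "la", "y", "es", "de", "que", "en", "muy", "con", "mi"},
}

def detect_language(text: str) -> str:
    """Tag a record with the language whose stopwords it contains most often."""
    tokens = text.lower().split()
    scores = {lang: sum(tok in words for tok in tokens) for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def split_by_language(records):
    """Group records by detected language, like a detect-then-filter step."""
    groups = defaultdict(list)
    for record in records:
        groups[detect_language(record)].append(record)
    return dict(groups)

records = [
    "The delivery was late and the box was damaged.",
    "El producto es muy bueno y llegó a tiempo.",
    "Great service, thank you",
]
for lang, texts in split_by_language(records).items():
    print(lang, texts)
```

Records with no recognizable stopwords land in an "unknown" group. You would review that group by hand before deciding whether to send its records for translation.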

Once we’ve separated our data based on the language, it’s time to decide what approach to use: Machine Translation (MT) or Native Language Analysis (NLA). Let’s review some pros and cons of each of these approaches.

Machine Translation Pros & Cons

PROS

Relatively Cheap
Machine translation is relatively cheap, even with a fairly large dataset. These services tend to charge by the character, so the cost will vary based on how many documents you have and how long (a.k.a. wordy) they are. For reference, at Microsoft’s lowest (and least cost-effective) pricing tier, the cost is $10 per 1 million characters. A page in Microsoft Word is about 3,000 characters, so you can translate about 333 pages for $10. If you have a huge dataset, then this might sound like a lot. However, compared to hiring multiple analysts with fluency in different languages, it may still be the more affordable option.
Simple and Accessible Results
The data is more widely accessible and ends up consolidated into a single language. This means that even if the data was originally in 10 different languages, end users such as the company or team managers, who may only speak English, can review the text of each supporting record for a proposed insight and see what is being said and why that record was processed that way.
Low Maintenance
For ongoing analysis, you will typically want to update your analysis workflow periodically to account for new issues and trends. Because MT lets you build a single analysis scenario for a single language, maintenance is simpler. If you want to use NLA and have 10 languages present in the data, you will have to build and maintain a separate analysis scenario for each language.
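The pricing arithmetic above is easy to script when budgeting a project. This Python sketch makes three assumptions:

- **Rate:** $10 per 1 million characters, the figure quoted in the post as of 2019.
- **Page size:** about 3,000 characters per page, also from the post.
- **Billing:** every character, including spaces, is billed.

Check the provider’s current pricing before relying on it. The function names are hypothetical.

```python
# Figures quoted in the post (Microsoft's lowest tier, 2019); check current pricing.
PRICE_PER_MILLION_CHARS = 10.0  # US dollars
CHARS_PER_PAGE = 3000           # rough size of one Word page

def translation_cost(texts, price_per_million=PRICE_PER_MILLION_CHARS):
    """Estimated cost, assuming every character (spaces included) is billed."""
    chars = sum(len(t) for t in texts)
    return chars / 1_000_000 * price_per_million

def pages_per_budget(budget, price_per_million=PRICE_PER_MILLION_CHARS):
    """Whole pages that fit in a budget at the assumed rate."""
    return int(budget / price_per_million * 1_000_000 // CHARS_PER_PAGE)

records = [
    ("The delivery was late and the box was damaged.", "en"),
    ("El producto es muy bueno y llegó a tiempo.", "es"),
]
everything = translation_cost(text for text, _ in records)
non_english = translation_cost(text for text, lang in records if lang != "en")
print(pages_per_budget(10.0))  # 333
print(f"${everything:.6f} vs ${non_english:.6f} after splitting")
```

Sending only the non-English records shows why splitting by language before translation cuts the bill.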

CONS

Low Accuracy
The accuracy of machine translation is still relatively low compared to manual translation. And even with manual translation, some things like sarcasm and figures of speech may not translate well. For example, if you translated “break a leg” into another language, the meaning of “I wish you good luck” is likely to be lost. Additionally, some languages are harder to translate to or from than others. Going from Spanish to English will likely result in a reasonable translation, but most of the translation services our analysts at Megaputer have tested performed relatively poorly with languages like Japanese, Chinese, and Korean. Anyone looking to use machine translation will need to run some tests to see whether the accuracy is high enough to meet their output goals.

Here is an example of poor translation from Google Translate. As you may know, Japanese is a highly contextual language, which makes machine translation difficult.

Original Text: 一生懸命指でまぶたを広げて目薬を差しました。
Google Translate: I spread my eyelids with my fingers and put on my eyes.
Manual Translation: With great effort I held his eyelids open with my fingers and dropped in the eye medicine.

Garbage In, Garbage Out
The accuracy of the analysis will partially depend on the accuracy of the translation. A low-accuracy translation will make the results of the analysis less trustworthy.

Native Language Analysis Pros and Cons

PROS

Better Accuracy
Native language analysis generally produces much more accurate results, though this, of course, depends on the analyst.
Traceability and Transparency
There is a one-to-one account of which parts of the original text match the search query, and this is visible in the final results.

CONS

More Expensive
Native language analysis tends to be more expensive. There will be costs for hiring additional analysts to cover different languages. Those analysts will need to not only have the skills of an analyst, but also the skills of a polyglot linguist.
More Maintenance
In the future when models and algorithms need to be adjusted, the work will be multiplied by the number of languages being worked with.
Less Accessibility
When consumers of the results review the analysis, they may not be able to independently read all the records to understand the supporting information for suggested insights.

An introduction to machine learning

Chris Farris — Fri, 15 Feb 2019 18:58:52 +0000

Machine Learning is hot right now. Really hot. And it’s come a long way from where it was over two decades ago. In 1997, IBM’s Deep Blue defeated world chess champion Garry Kasparov in a six-game match. This was a great achievement that marked the progress of the field, but chess is a relatively simple task, and computers still struggled to master more difficult tasks for another decade. Then, in the late aughts, machine learning started to boom like never before. In 2011, IBM’s Watson combined real-time natural language processing with information retrieval to defeat two Jeopardy! champions. In 2016, Google’s AlphaGo defeated Lee Sedol, whom many consider one of the strongest Go players in history. Despite the simplicity of Go’s rules, the game is extraordinarily complex. For perspective, there are more possible configurations of a Go board than there are atoms in the universe, and by this measure Go is a googol times more complex than chess. To really appreciate the magnitude of this, consider that a googol (10¹⁰⁰) is written as a one followed by one hundred zeros, far more than the roughly 10⁸⁰ atoms estimated to exist in the observable universe. It is a mind-bogglingly large number, beyond human comprehension. And in about two decades, machine learning advanced from mastering chess to mastering a game more complex by that scale.
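These magnitudes can be checked with Python’s arbitrary-precision integers. As a simplifying assumption, the sketch counts every way to mark the 361 points of a 19×19 board as empty, black, or white. That gives an upper bound, since only around 1% of those configurations are legal positions. The 10⁸⁰ figure is the usual order-of-magnitude estimate for atoms in the observable universe.

```python
# Each of the 19 x 19 = 361 points can be empty, black, or white, so 3**361
# is an upper bound on board configurations (most are not legal positions).
go_upper_bound = 3 ** 361
googol = 10 ** 100
atoms_estimate = 10 ** 80  # common order-of-magnitude estimate

print(len(str(go_upper_bound)))                  # 173 digits
print(go_upper_bound > googol > atoms_estimate)  # True
```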

Recently, a program called AlphaStar learned to play and master the wildly popular and challenging competitive real-time strategy video game StarCraft II. It learned first by imitating recorded human games and then largely by playing against versions of itself. StarCraft II is so competitive that human players actually perform physical exercises with their fingers to hone their reflexes so they can strike the right keys as fast as possible. While Go merely involves placing stones on a grid, StarCraft II involves economic resource management, strategic combined-arms combat, exploration, extensive future planning, processing streams of real-time information, and making rapid decisions.

Applications of Machine Learning

Machine Learning often gets flashy coverage when it reaches milestones in games, but it has slowly crept into our daily lives in ways we may not think about. When we post pictures on social media platforms like Facebook, our faces are instantly recognized and names are suggested for tagging. When we watch movies or shows on platforms like Netflix, we get an endless stream of recommendations based on our past behavior. The same approach is used by online stores such as Amazon. And it’s common for Apple’s Siri, Amazon’s Alexa, and Google Assistant to live in our homes and understand our requests.

Machine Learning is here to stay. And with Machine Learning technology sprinting into the future, everyone wants in. Companies are rapidly creating new analytics departments or bolstering existing ones to capitalize on the craze. Unfortunately, like many hyped technologies, “Machine Learning” as a phrase can devolve into jargon, buzzwords, oversimplifications, misnomers, and a lot of confusion. To many, Machine Learning is a mystery that seems indistinguishable from sorcery.

What is Machine Learning?

Machine Learning, like Artificial Intelligence, suffers from a name that tends to get bogged down in philosophy and pedantry. What does it mean to “learn?” What does it mean to hold “intelligence?” There is a good deal of discussion amongst computer scientists and others about this, but for our purpose it is just a rabbit hole. In fact, it is best to throw out our human conceptions of what “learn” and “intelligence” mean because they only add a biased expectation.

So, then, what is Machine Learning? In its simplest form, Machine Learning is essentially making a computer learn a complicated task by having the computer teach itself. We merely provide the computer with some examples of how to do something, and the computer learns from those examples to help us fill in the gaps and solve complex problems.

Now let’s break this process down. First, it is useful to identify our goals. Almost always, we would like to classify (assign discrete labels), perform regression on (predict numerical values), or cluster (group similar things together) our data. For example, we could classify faces by whether they are smiling. We could perform regression on weather data to predict tomorrow’s temperature. We could cluster stars by grouping them based on how hot and bright they are.
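Of the three goals, clustering is the one that needs no labels at all. The Python sketch below runs plain k-means on a handful of hypothetical stars, described by temperature and brightness in arbitrary units. K-means is a standard clustering algorithm chosen here for illustration; the post does not prescribe one.

- **Initialization:** the sketch naively starts from the first k points. Libraries typically use smarter schemes such as k-means++.
- **Scaling:** real features would be rescaled first so that no single unit dominates the distances.

```python
import math

# Hypothetical stars as (temperature, brightness) in arbitrary illustrative units.
STARS = [(3.5, 0.1), (3.8, 0.2), (4.0, 0.1), (20.0, 5.0), (22.0, 5.5), (25.0, 6.0)]

def kmeans(points, k, iterations=20):
    centroids = list(points[:k])  # naive start: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

labels, centroids = kmeans(STARS, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```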

Now that we have a goal in mind, the question is: by what method can we achieve it? This is the “model.” Think of a model as a machine that you feed data into and that spits out classes, a regression, or clusters of that data. How do we build the model? Before Machine Learning, you would have to figure this out manually. This would involve drawing specific blueprints for the machine, figuring out the appropriate size of gears and their exact positions, assembling the machine, and testing it.

In this analogy, Machine Learning is like building a special type of machine. This machine has gears that can change size and position. Also, if you provide the machine with the desired output, it can run data through itself and compare the resulting output with this desired output. If they don’t match, the machine knows how to change the size and position of the gears to make it more likely to be correct in the future. And it can repeat this over and over again until it has found the best arrangement of gears. This is essentially Machine Learning, and all it took was a special device and labeled data it could look at. No human was needed to tell it where to place the gears.

This is just an analogy, but it works very similarly in our digital space. We construct a digital model that is parameterized by certain variables. We feed training data labeled with the desired output into the model and observe the actual output. We then compare the actual output to the labels, which are our desired output. We note the error and use math to compute in which direction and by how much we should tweak the parameters of the model. And repeat. The machine doesn’t make decisions or figure out concepts on its own. It is programmed to use Calculus, Information Theory, Probability, and Statistics to calculate how much to nudge each parameter.
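The loop just described can be sketched in a few lines of Python for the simplest parameterized model: a straight line fitted by gradient descent on mean squared error. The temperature readings are hypothetical. The learning rate and step count are illustrative choices rather than recommendations. The two parameters `w` and `b` play the role of the adjustable gears.

```python
# Hypothetical readings: day index -> temperature in degrees Celsius.
DAYS = [0.0, 1.0, 2.0, 3.0]
TEMPS = [1.0, 3.0, 5.0, 7.0]

def fit(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0  # the model's adjustable "gears"
    n = len(xs)
    for _ in range(steps):
        # Compare actual outputs with the labels and measure the error.
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Derivatives of the mean squared error with respect to w and b
        # say which direction to move each parameter, and by how much.
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = fit(DAYS, TEMPS)
print(f"temp ~ {w:.2f} * day + {b:.2f}; day 4 forecast: {w * 4 + b:.1f}")
```

On these numbers the fit settles at `w = 2` and `b = 1`, so the forecast for day 4 comes out at 9 °C. A straight line also has a closed-form least-squares solution. The iterative version is shown because it mirrors the compare, compute, adjust, and repeat loop described above.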

In the end, Machine Learning is not what most humans consider “learning.” It is driven by math, algorithms, and data. For those who aren’t fascinated by the math and theory of Machine Learning, this can destroy the mystique somewhat. Personally, I find that although the Man Behind the Curtain is not what we might expect or desire, he is intriguing all the same. In fact, the more computer scientists and cognitive scientists study Machine Learning, the more they suspect that the disconnect is not as large as it seems. We may think that Machine Learning is utterly unlike humanity, but humanity may be more like Machine Learning than we realize. Our own brains may be just such a cold amalgam of math, algorithms, and data, only on a massive and far more complex scale.

The Importance of Data

Data is a pillar of Machine Learning. Without data we cannot build anything. The math doesn’t change, and while we can build clever model structures to help with the learning process, without good data our models will fail. Luckily, we live in an age of data. Large amounts of data allow models to be extremely fine-tuned, and they make advanced model structures like Deep Neural Networks practical. Publicly available datasets such as MNIST and ImageNet for image processing have allowed data scientists to easily compare models and share knowledge. Data is revolutionizing entire fields, and Machine Learning is no exception.

But with great power comes great responsibility. If our data is not labeled well or if it is biased from how it is collected, that error or bias is directly injected into the model. After all, the model is simply trying to mimic the data used to create it. Bad data in, bad model out.

Biased data can lead to very poor model performance. For example, last year the MIT Media Lab showed that commercial facial analysis systems from Microsoft, IBM, and Face++ were much less accurate at classifying the gender of women and people with darker skin. This was most likely because the systems didn’t have enough examples of those types of faces in their training data.
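One practical safeguard is to report accuracy separately for each subgroup rather than as a single overall number. That is the kind of breakdown that exposes gaps like the ones described above. The Python sketch below uses hypothetical predictions and group labels; it is an illustration, not the MIT Media Lab’s methodology. A model can look 90% accurate overall while being right only half the time on an underrepresented group.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return overall accuracy plus accuracy within each subgroup."""
    correct, total = defaultdict(int), defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    overall = sum(correct.values()) / sum(total.values())
    return overall, {g: correct[g] / total[g] for g in total}

# Hypothetical evaluation: group "A" is well represented, group "B" is not.
y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]
groups = ["A"] * 8 + ["B"] * 2

overall, per_group = accuracy_by_group(y_true, y_pred, groups)
print(overall, per_group)  # 0.9 {'A': 1.0, 'B': 0.5}
```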

So big data is great for our ability to model the world. But the purity and completeness of the data we use in Machine Learning should always be considered when building a model.

The Future

Well, Machine Learning is the future. Tasks that we do effortlessly as humans, such as processing sound and sight, are extraordinarily complicated. Without Machine Learning it would take monumental effort to create machines to do them effectively, and for the tasks that even we as humans find hard, machines would be hopeless. We live in the age of data and increasingly cheap computational power. Machine Learning is thriving and adapting. If you are interested in Machine Learning, check out this video for more insights.

And stay tuned for future blog posts on Machine Learning as we examine different applications and different models used.

The post An introduction to machine learning appeared first on Megaputer Intelligence.

Conversion from rate code to temporal code – Crucial role of inhibition

Michael Kiselev — Sat, 02 Jul 2016 15:14:03 +0000

This study is an attempt to answer the question: what kind of spiking neural network could efficiently transform a rate-coded input signal into temporally coded form, that is, specific activity of neuronal groups with strictly fixed temporal delays between spikes emitted by different neurons in every group? Since a theoretical approach to this problem appears to be very hard or impossible, network configurations that perform this task efficiently were found by means of a genetic algorithm. Exploration of their structure showed that while excitatory neurons form groups with stereotypical firing patterns, the inhibitory neurons of the network make these patterns specific to different rate-coded stimuli. Inhibitory neurons thus play the key role in converting the rate-coded input signal to temporal code.

Cite this paper as: Kiselev M.V. (2016) Conversion from Rate Code to Temporal Code – Crucial Role of Inhibition. In: Cheng L., Liu Q., Ronzhin A. (eds) Advances in Neural Networks – ISNN 2016. ISNN 2016. Lecture Notes in Computer Science, vol 9719. Springer, Cham

Published as a part of the International Symposium on Neural Networks

Available from: https://link.springer.com/chapter/10.1007%2F978-3-319-40663-3_76

The post Conversion from rate code to temporal code – Crucial role of inhibition appeared first on Megaputer Intelligence.

Homogenous chaotic network serving as a rate/population code to temporal code converter

Michael Kiselev — Sun, 23 Mar 2014 15:21:13 +0000

Abstract: At present, it is obvious that different sections of the nervous system use different methods for information coding. Primary afferent signals are in most cases represented as spike trains using a combination of rate coding and population coding, while there is clear evidence that temporal coding is used in various regions of the cortex. In the present paper, it is shown that conversion between these two coding schemes can be performed under certain conditions by a homogenous chaotic neural network. Interestingly, this effect can be achieved without network training and synaptic plasticity.

Author: Mikhail Kiselev

MLA Citation: Kiselev, Mikhail V. “Homogenous Chaotic Network Serving as a Rate/Population Code to Temporal Code Converter.” Computational Intelligence and Neuroscience, vol. 2014, 2014, article 476580.

Download as PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3980915/pdf/CIN2014-476580.pdf

Download from megaputer.com: https://www.megaputer.com/wp-content/uploads/kiselev-homogenous-chaotic-network-rate-temporal-converter.pdf

Copyright © 2014 Mikhail V. Kiselev. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The post Homogenous chaotic network serving as a rate/population code to temporal code converter appeared first on Megaputer Intelligence.

Self-organization process in large spiking neural networks leading to formation of working memory mechanism

Michael Kiselev — Tue, 01 Jan 2013 14:40:09 +0000

The paper is devoted to the implementation and exploration of the evolutionary development of a short-term memory mechanism in spiking neural networks (SNN), starting from an initial chaotic state. Short-term memory is defined here as a network’s ability to store information about recent stimuli in the form of specific neuron activity patterns. Stable appearance of this effect was demonstrated for the so-called stabilizing SNN, a network model proposed by the author. To show the desired evolutionary behavior, the network should have a specific topology determined by “horizontal” layers and “vertical” columns.

To download this PDF please visit https://link.springer.com/chapter/10.1007/978-3-642-38679-4_51.

The post Self-organization process in large spiking neural networks leading to formation of working memory mechanism appeared first on Megaputer Intelligence.
