The Data Science Lab

### Naive Bayes Classification using C#

Dr. James McCaffrey of Microsoft Research presents a complete step-by-step example with all the code to predict a person’s optimism score from their occupation, eye color, and country.

The goal of a naive Bayesian classification problem is to predict a discrete value called a class label. The idea is best explained with a concrete example. Suppose you have a set of 40 data items where each item consists of a person’s profession (actor, baker, clerk, or dishwasher), eye color (green or hazel), country (Italy, Japan or Korea) and his personality optimism score (0, 1 or 2). You want to predict a person’s optimism score based on their profession, eye color, and country. This is an example of multiclass classification because the variable to be predicted, optimism, has three or more possible values. If the variable to be predicted had only two possible values, the problem would be called binary classification. Unlike some machine learning techniques, Naive Bayes can handle both multiclass and binary classification problems.

A good way to see where this article is going is to take a look at the screenshot of a demo program in **Figure 1**. The demo sets up 40 data elements – profession, eye color, country, optimism. The first data elements are:

actor green korea 1 baker green italy 0 diver hazel japan 0 diver green japan 1 clerk hazel japan 2

Note that all data values are categorical (not numeric). This is a key feature of the naive Bayesian classification technique presented in this article. If you have numerical data, such as a person’s age in years, you can group the data into categories such as child, adolescent, adult.

The demo configures an item to predict as (“baker”, “hazelnut”, “italy”). Then the demo loops through the data and calculates and displays the smoothed joint counts (“add 1”). For example, the 5 in the screenshot means there are 4 bakers who have optimism class = 0.

The demo calculates the raw number of unsmoothed classes as (19, 14, 7). This means that there are 19 people with optimism class = 0, 14 people with class = 1, and 7 people with class = 2. Note that 19 + 14 + 7 = 40, the number of elements of data.

The smoothed joint counts and raw class counts are mathematically combined to produce proof terms of (0.0027, 0.0013, 0.0021). These correspond to the class likelihoods (0, 1, 2). Because the greatest evidentiary value is at the index [0]the prediction for the person (“baker”, “hazelnut”, “italy”) is class 0.

Since the proof terms are somewhat difficult to interpret, the demo converts the three proof terms into pseudo-probabilities: (0.4418, 0.2116, 0.3466). The values are not true mathematical probabilities, but because they add up to 1.0, they can be interpreted roughly as probabilities. The greatest probability is at the clue [0].

This article assumes you have intermediate or better programming skills with a C-family language such as Python or Java, but does not assume you know anything about naive Bayes classification. The complete demo code and associated data are shown in this article. Source code and data are also available in the accompanying download. All normal error checking has been removed to keep the main ideas as clear as possible.

**Understanding Naive Bayes Classification
**

The first step in naive Bayesian classification is to calculate the joint numbers associated with the data item to be predicted. For (“baker”, “hazel”, “italy”) and the 40-item demo data, the joint counts are:

baker and class 0 = 4 baker and class 1 = 0 baker and class 2 = 1 hazel and class 0 = 5 hazel and class 1 = 2 hazel and class 2 = 2 italy and class 0 = 1 italy and class 1 = 5 italy and class 2 = 1

Joint accounts are multiplied together when calculating terms of evidence. If a joint count is 0, the entire term is set to zero, so 1 is added to each joint count to avoid this. This is called Laplacian smoothing.

The next step in naive Bayesian classification is to calculate the number of each of the classes to be predicted. For the demo data, the number of classes is (19, 14, 7). There is no need to smooth the number of classes as there will always be at least one of each class.

Next, a proof term (sometimes called Z) is calculated for each possible class value. For class 0, the proof term is:

Z(0) = (5 / 19+3) * (6 / 19+3) * (2 / 19+3) * (19 / 40) = 4/22 * 5/22 * 1/22 * 19/40 = 0.1818 * 0.2273 * 0.0435 * 0.4750 = 0.0027

The last term, 19/40, is the probability of class 0. In the first three terms, the numerator is a smoothed joined number. The denominator is the raw number of classes with 3 (number of predictor values) added to compensate for smoothing.

For class 1, the proof term is:

Z(1) = (1 / 14+3) * (3 / 14+3) * (6 / 14+3) * (14 / 40) = 1/17 * 3/17 * 6/17 * 14/40 = 0.0588 * 0.1765 * 0.3529 * 0.3500 = 0.0013

And for class 2, the proof term is:

Z(2) = (2 / 7+3) * (3 / 7+3) * (2 / 7+3) * (7 / 40) = 2/10 * 3/10 * 2/10 * 7/40 = 0.2000 * 0.3000 * 0.2000 * 0.1750 = 0.0021

The largest proof term is associated with class 0, so it is the predicted class. To make proof terms easier to compare, they are often normalized so that their sum is 1.0. The sum of the proof terms is 0.0027 + 0.0013 + 0.0021 = 0.0061. The normalized (pseudo-probabilities) are:

P(class 0) = 0.0027 / 0.0061 = 0.4418 P(class 1) = 0.0013 / 0.0061 = 0.2116 P(class 2) = 0.0021 / 0.0061 = 0.3466

Since each proof term is the product of values less than 1, it is possible that the calculation will run into problems. Therefore, in practice, the calculation of proof terms uses the mathematical log() of each value which allows additions and subtractions instead of multiplications and divisions.

**The demo program
**

The complete demo program, with some minor modifications to save space, is shown in

**List 1**. To create the program, I launched Visual Studio and created a new C# .NET Core console application named NaiveBayes. I used Visual Studio 2019 (Free Community Edition), but the demo has no major dependencies, so any version of Visual Studio will work fine. You can also use the Visual Studio Code program.

Once the template code loaded, in the editor window, I removed all unnecessary namespace references, leaving only the reference to the top-level System namespace. In the Solution Explorer window, I right-clicked the Program.cs file, renamed it to the more descriptive NaiveBayesProgram.cs, and allowed Visual Studio to automatically rename the class Program in NaiveBayesProgram.

**List 1:**

Complete Naive Bayes demo code

using System; namespace NaiveBayes { class NaiveBayesProgram { static void Main(string[] args) { Console.WriteLine("nBegin naive Bayes classification "); string[][] data = GetData(); Console.WriteLine("nData looks like: "); Console.WriteLine("actor green korea 1"); Console.WriteLine("baker green italy 0"); Console.WriteLine("diver hazel japan 0"); Console.WriteLine("diver green japan 1"); Console.WriteLine("clerk hazel japan 2"); Console.WriteLine(" . . . "); int nx = 3; // number predictors (job, eye, country) int nc = 3; // number classes (0, 1, 2) int N = 40; // number data items int[][] jointCounts = new int[nx][]; for (int i = 0; i < nx; ++i) jointCounts[i] = new int[nc]; int[] yCounts = new int[nc]; string[] X = new string[] { "baker", "hazel", "italy"}; Console.WriteLine("nItem to classify: "); Console.WriteLine("baker hazel italy"); for (int i = 0; i < N; ++i) { // compute joint counts int y = int.Parse(data[i][nx]); // get the class as int ++yCounts[y]; for (int j = 0; j < nx; ++j) { if (data[i][j] == X[j]) ++jointCounts[j][y]; } } for (int i = 0; i < nx; ++i) // Laplacian smoothing for (int j = 0; j < nc; ++j) ++jointCounts[i][j]; Console.WriteLine("nJoint counts (smoothed): "); Console.WriteLine("0 1 2 "); Console.WriteLine("------"); ShowMatrix(jointCounts); Console.WriteLine("nClass counts (raw): "); ShowVector(yCounts); // compute evidence terms double[] eTerms = new double[nc]; for (int k = 0; k < nc; ++k) { double v = 1.0; // direct approach for (int j = 0; j < nx; ++j) v *= (jointCounts[j][k] * 1.0) / (yCounts[k] + nx); v *= (yCounts[k] * 1.0) / N; eTerms[k] = v; //double v = 0.0; // use logs to avoid underflow //for (int j = 0; j < nx; ++j) // v += Math.Log(jointCounts[j][k]) - Math.Log(yCounts[k] + nx); //v += Math.Log(yCounts[k]) - Math.Log(N); //eTerms[k] = Math.Exp(v); } Console.WriteLine("nEvidence terms for each class: "); ShowVector(eTerms); double evidence = 0.0; for (int k = 0; k < nc; ++k) evidence += eTerms[k]; double[] probs = new double[nc]; for (int k = 0; k < nc; ++k) probs[k] = eTerms[k] / evidence; Console.WriteLine("nPseudo-probabilities each class: "); ShowVector(probs); Console.WriteLine("nEnd naive Bayes demo "); Console.ReadLine(); } // Main static void ShowMatrix(int[][] m) { int r = m.Length; int c = m[0].Length; for (int i = 0; i < r; ++i) { for (int j = 0; j < c; ++j) { Console.Write(m[i][j] + " "); } Console.WriteLine(""); } } static void ShowVector(int[] v) { for (int i = 0; i < v.Length; ++i) Console.Write(v[i] + " "); Console.WriteLine(""); } static void ShowVector(double[] v) { for (int i = 0; i < v.Length; ++i) Console.Write(v[i].ToString("F4") + " "); Console.WriteLine(""); } static string[][] GetData() { string[][] m = new string[40][]; // 40 rows m[0] = new string[] { "actor", "green", "korea", "1" }; m[1] = new string[] { "baker", "green", "italy", "0" }; m[2] = new string[] { "diver", "hazel", "japan", "0" }; m[3] = new string[] { "diver", "green", "japan", "1" }; m[4] = new string[] { "clerk", "hazel", "japan", "2" }; m[5] = new string[] { "actor", "green", "japan", "1" }; m[6] = new string[] { "actor", "green", "japan", "0" }; m[7] = new string[] { "clerk", "green", "italy", "1" }; m[8] = new string[] { "clerk", "green", "italy", "2" }; m[9] = new string[] { "diver", "green", "japan", "1" }; m[10] = new string[] { "diver", "green", "japan", "0" }; m[11] = new string[] { "diver", "green", "japan", "1" }; m[12] = new string[] { "diver", "green", "japan", "2" }; m[13] = new string[] { "clerk", "green", "italy", "1" }; m[14] = new string[] { "diver", "green", "japan", "1" }; m[15] = new string[] { "diver", "hazel", "japan", "0" }; m[16] = new string[] { "clerk", "green", "korea", "1" }; m[17] = new string[] { "baker", "green", "japan", "0" }; m[18] = new string[] { "actor", "green", "italy", "1" }; m[19] = new string[] { "actor", "green", "italy", "1" }; m[20] = new string[] { "diver", "green", "korea", "0" }; m[21] = new string[] { "baker", "green", "japan", "2" }; m[22] = new string[] { "diver", "green", "japan", "0" }; m[23] = new string[] { "baker", "green", "korea", "0" }; m[24] = new string[] { "diver", "green", "japan", "0" }; m[25] = new string[] { "actor", "hazel", "italy", "1" }; m[26] = new string[] { "diver", "hazel", "japan", "0" }; m[27] = new string[] { "diver", "green", "japan", "2" }; m[28] = new string[] { "diver", "green", "japan", "0" }; m[29] = new string[] { "clerk", "hazel", "japan", "2" }; m[30] = new string[] { "diver", "green", "korea", "0" }; m[31] = new string[] { "diver", "hazel", "korea", "0" }; m[32] = new string[] { "diver", "green", "japan", "0" }; m[33] = new string[] { "diver", "green", "japan", "2" }; m[34] = new string[] { "diver", "hazel", "japan", "0" }; m[35] = new string[] { "actor", "hazel", "japan", "1" }; m[36] = new string[] { "actor", "green", "japan", "0" }; m[37] = new string[] { "actor", "green", "japan", "1" }; m[38] = new string[] { "diver", "green", "japan", "0" }; m[39] = new string[] { "baker", "green", "japan", "0" }; return m; } // GetData() } // Program } // ns

The demo program hard-codes the data into a program-defined GetData() function. The data is returned as an array-of-arrays style matrix where each cell contains a string value. In a non-demo scenario, you might want to store data in a text file and write a helper function to read the data into a string of arrays of arrays.[ ][ ] matrix.

All the control logic is in the Main() method. Key data structures are an int array of arrays style[ ][ ] array to hold the joint accounts and an int[ ] vector to hold class counts:

int nx = 3; // number predictors (job, eye, country) int nc = 3; // number classes (0, 1, 2) int N = 40; // number data items int[][] jointCounts = new int[nx][]; for (int i = 0; i < nx; ++i) jointCounts[i] = new int[nc]; int[] yCounts = new int[nc];

The raw number of classes and the raw number of joints are calculated as follows:

string[] X = new string[] { "baker", "hazel", "italy"}; for (int i = 0; i < N; ++i) { // compute joint counts int y = int.Parse(data[i][nx]); // get the class as int ++yCounts[y]; for (int j = 0; j < nx; ++j) { if (data[i][j] == X[j]) ++jointCounts[j][y]; } }

The code iterates through the data matrix row by row. The row class label is extracted as an integer from the last row value at index [nx]. The class is used to increment the yCounts vector. Then the code loops through each predictor value and if the predictor value in the data array matches the predictor value in the item to be predicted, the corresponding entry in the jointCount array is incremented. After all the joint count values have been calculated, the demo code loops through the matrix and adds 1 to each count so that there are no 0 values.