AI / ML: Data isn't everything

You don't need data to make excellent AI


It was 2014, and I was taking an introductory information systems course. We used a fictitious bike sales company to run a simple linear regression. With the data that was provided, I was able to predict whether someone was likely to purchase a bike based on simple information about the customer. In essence, you could predict someone’s behavior with cool technology (if you consider Excel cool like I do) and data.

It completely blew my mind 🤯.

I anchored around the idea that for an AI to be powerful, you need data. But my eyes have since been opened. I hope to show you that although data is necessary for some ML/AI applications, it isn’t the only way to get powerful predictions and AI-level agents.

I’ll be going through a few examples of algorithms that are able to build powerful AI without the need for historical data.

Minimax & Alpha Beta Pruning

Suppose you wanted to create an AI to play a simple game like Tic Tac Toe. That AI agent could be trained on historical data to see how experts play the game, like how AlphaStar learned to play StarCraft 2. But there is a simpler approach that, in many cases, is potentially more fruitful.

Building the Game Tree

In our Tic Tac Toe example, let's create a game tree of some of the options that a player could make on the first turn. We start with our first player, X.

Tic Tac Toe Turn One

NOTE: I’m only going to show 3 options instead of the full 9 because I just don’t have the space to show them all!

Then the second player, O, would make their first choice based upon where X played.

Tic Tac Toe Turn Two

Next, we show all the available positions that X can now play based upon where O moved on the previous turn.

Tic Tac Toe Turn Three

We are only 3 moves in, and you can already see that the game tree is growing exponentially. Keep in mind that if you were building out the game tree, you would build out ALL the different permutations, not just the 3 per turn that I show in my example.

As you get to each leaf of the tree, it may look something like this.

Leaf of Game Tree

I denoted a loss with the red box, and a win with the green box.

We would, of course, also have draws in tic tac toe, but I didn’t show that in the example. The game tree can be quantified, though: we can assign a value to the win, lose, and draw terminal states. Let’s choose 5 for a win, -5 for a loss, and 0 for a draw.

Minimax Applied

Now let’s bring in the power of the minimax algorithm. The minimax algorithm is almost self-explanatory. It is a backtracking algorithm that finds either the max or min value and propagates it up the tree. This is perfect in this situation because when it is our turn, we want to maximize the value, and when it is the opponent’s turn, we assume that they will want to minimize our value.

Even though we know nothing about strategy, with a game tree and minimax we can literally play out all the possibilities that can happen in the game, and when it is our turn, make the choice whose worst-case outcome is best for us.

The weakness of minimax, though, is computational. Tic tac toe is a relatively simple 3x3 game, and it has over 250,000 possible games. Imagine a game with an 8x8 board like chess or checkers. Checkers alone has roughly 500,000,000,000,000,000,000 (500 quintillion) possible positions. Modern computers can’t work through that many possibilities, especially in a reasonable amount of time.

Improved Minimax with Pruning

There are ways around this though! Alpha beta pruning is a method to decrease the number of branches that you traverse: once you have found a good option, you can skip (prune) branches that provably can’t beat it. Additionally, you don’t need to search all the way to the end state; you just need the best option from the current state. In our tic tac toe example, after O chooses a spot, we can run minimax from the current state and cap the processing to a time period, like two seconds. After two seconds, we choose the best option found in the game tree at that moment.

Pretty incredible, right!? AI doesn’t need to be taught. Using modern computing, it can literally play out all the scenarios right in front of you and pick the best option right then.

No historical data required!

Bayes Nets

Suppose you have a cough and decide to go to the doctor. The doctor runs a few tests, and you test positive for a super serious sickness that only affects a few people each year. The accuracy of this test is 99%, so does that mean that there is a 99% chance that you have the disease?

Well, not exactly.

Bayes’ Theorem

This is where Bayes’ Theorem comes in. Bayes’ Theorem is a formula derived by Thomas Bayes that helps you determine conditional probabilities.

Bayes’ Theorem: P(A|B) = P(B|A) × P(A) / P(B)

So, in the case of our super serious sickness example, what is the actual probability that you are sick? Well, we can work this out using the formula! Let’s say that event A is having the super serious sickness, and event B is testing positive. Let’s also assume that the probability of testing positive given that you are sick is 0.99, and that the super serious sickness affects 1 in every 10,000 people, so its probability is 0.0001.

  • P(B|A) = 0.99
  • P(A) = 0.0001
  • P(B) = 0.010098
    • This can be solved by conditioning on whether event A occurs: P(B) = P(B|A)P(A) + P(B|not A)P(not A) = 0.99 * 0.0001 + 0.01 * 0.9999. (The 0.01 is the false positive rate of a 99% accurate test.)

Then we can plug all the values into the formula: P(A|B) = (0.99 * 0.0001) / 0.010098 ≈ 0.0098. So, the test is 99% accurate, but the probability of having the super serious sickness GIVEN a positive test is less than one percent. Wow!

Bayes Networks

Now, doing math is cool and all, but let’s steer this conversation back to predictions and AI. If you know the distributions of events, then you can run the math to get the probability of events occurring. You can even model more complex event structures as DAGs. These are known as Bayes Networks, or Bayes Nets.

DAG – Directed Acyclic Graph. Basically, a graph where every edge has a direction and you can never loop back to a node you came from.

So, a Bayes Net may look something like this.

Bayes Net

Now that we have our Bayes Net and some probabilities for the different states, we can use Python to make probability predictions from our network!

Bayes Nets with Python

First, we need to import the proper libraries. Pgmpy is a great library for setting up Bayes nets. Let's create the basic model, add our nodes from the image above, add the relationships, then add the probabilities for our different states.

from pgmpy.models import BayesianModel  # renamed to BayesianNetwork in newer pgmpy releases
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

def get_net():
    BayesNet = BayesianModel()
    BayesNet.add_node("SuperSeriousSickness")
    BayesNet.add_node("SicknessTest")
    BayesNet.add_node("Coughing")
    BayesNet.add_node("EatingHealthy")
    BayesNet.add_node("GettingEnoughSleep")

    # Defines the relationship between the different nodes
    BayesNet.add_edge("SicknessTest", "SuperSeriousSickness")
    BayesNet.add_edge("Coughing", "SuperSeriousSickness")
    BayesNet.add_edge("EatingHealthy", "Coughing")
    BayesNet.add_edge("GettingEnoughSleep", "Coughing")

    # Defines the probabilities of the different nodes (row 0 = state False, row 1 = state True)
    cpt_sicknessTest = TabularCPD(
        'SicknessTest', 
        variable_card=2, 
        values=[[0.99], [0.01]]
    )

    cpt_eatingHealth = TabularCPD(
        'EatingHealthy', 
        variable_card=2, 
        values=[[0.65], [0.35]]
    )

    cpt_enoughSleep = TabularCPD(
        'GettingEnoughSleep', 
        variable_card=2, 
        values=[[0.45], [0.55]]
    )

    cpt_coughing = TabularCPD(
        'Coughing',
        variable_card=2,
        values=[[0.37, 0.65, 0.87, 0.95],
                [0.63, 0.35, 0.13, 0.05]],
        evidence=['EatingHealthy', 'GettingEnoughSleep'],
        evidence_card=[2, 2]
    )

    cpt_superSeriousSickness = TabularCPD(
        'SuperSeriousSickness',
        variable_card=2,
        values=[[0.999999, 0.9995, 0.98, 0.9],
                [0.000001, 0.0005, 0.02, 0.1]],
        evidence=['SicknessTest', 'Coughing'],
        evidence_card=[2, 2]
    )

    BayesNet.add_cpds(
        cpt_sicknessTest, 
        cpt_eatingHealth,
        cpt_enoughSleep, 
        cpt_coughing, 
        cpt_superSeriousSickness
    )

    return BayesNet

bn = get_net()

Setting up the network can be a little tricky, so I like to test that the model is set up correctly.

# True if the model is valid
print(bn.check_model())

If we wanted to know the probability of having the super serious sickness given that we aren’t getting enough sleep, we can use variable elimination! Pgmpy makes this fairly straightforward.

solver = VariableElimination(bn)
conditional_prob = solver.query(variables=['SuperSeriousSickness'],
                                evidence={'GettingEnoughSleep': 0},
                                joint=False)

# Returns an array
# [Prob of False, Prob of True]
prob = conditional_prob['SuperSeriousSickness'].values

print(f'{prob[1]:.10f}')
# 0.0007897646

The probability of having the super serious sickness when you aren't getting enough sleep is still pretty low. So don't worry too much!

Beautiful. With distributions, you can make accurate estimates and find conditional probabilities. No data required!

Conclusion

I shared two different ways to get a form of artificial intelligence with predictive decisioning, without using a mountain of data. The first used sheer computing power: with a powerful computer, you can literally figure out all the permutations of a decision and pick the best one to act on now. The second was more nuanced: you use the distributions of the data, but not the actual data, and let the math perform complex decisioning between states.

But there are many other ways! You can use Hidden Markov Models and the Viterbi algorithm to find the most likely sequence of hidden states, clever search algorithms like A* to find optimal routes, and so many others!

AI isn’t all about data. It is about intelligence, and that intelligence doesn’t always need to come from a CSV of historical data.

Resources

Medical Tests and Bayes Theorem - https://math.hmc.edu/funfacts/medical-tests-and-bayes-theorem/
Thad Starner's AI course - https://omscs.gatech.edu/cs-6601-artificial-intelligence
Minimax - https://en.wikipedia.org/wiki/Minimax
Alpha Beta Pruning - https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning
Bayes Nets - https://machinelearningmastery.com/introduction-to-bayesian-belief-networks/
Classifiers with Bayes - https://www.youtube.com/watch?v=O2L2Uv9pdDA
Pgmpy Docs - https://pgmpy.org/
Pgmpy Bayes Network Example - https://pgmpy.org/examples/Creating%20a%20Discrete%20Bayesian%20Network.html


