There are many different types of "artificial intelligence," so there isn't a one-size-fits-all description of how they work.

So basically exactly as I said? No real "thinking" going on? Just a very sophisticated search and probability algorithm?
It is my understanding that this is the way AI chess engines work. They "simply" search through the possible moves and evaluate the outcomes (in practice, pruning most of the tree, since there are far too many positions to examine them all). I am by no means a good chess player, but it is my understanding that this is pretty much what human players do; the computer just works so much faster.
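The core idea behind that kind of search is minimax: try each legal move, assume the opponent will reply with their best move, and recurse down the tree. Here's a minimal sketch on a toy take-away game rather than chess (the game, function names, and values are illustrative, not any real engine's code):

```python
def minimax(pile, maximizing):
    """Return the game value of a position in a toy take-away game.

    Rules: players alternately remove 1 or 2 objects from a pile;
    whoever takes the last object wins.  Value is +1 if the player
    we root for (the maximizer) can force a win, -1 otherwise.
    """
    if pile == 0:
        # The previous player took the last object and won.
        return -1 if maximizing else +1
    if maximizing:
        # Our turn: pick the move with the best outcome for us.
        return max(minimax(pile - take, False) for take in (1, 2) if take <= pile)
    else:
        # Opponent's turn: assume they pick the worst outcome for us.
        return min(minimax(pile - take, True) for take in (1, 2) if take <= pile)

def best_move(pile):
    """Pick the removal (1 or 2) whose resulting position is best for us."""
    moves = [take for take in (1, 2) if take <= pile]
    return max(moves, key=lambda take: minimax(pile - take, False))
```

Real chess engines layer alpha-beta pruning and heuristic position evaluation on top of this, because the full chess tree is far too large to search exhaustively, but the skeleton is the same.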
The photo above illustrates very well how amazing the human brain is. We really aren't doing any kind of search of images, or at least I don't think so. For some reason, we can easily tell the difference between a puppy and a muffin even when most of the data is hidden from us.
For some types of systems, it is about pattern recognition, which is pretty key to how human brains work. But human brains are so unbelievably more capable in terms of raw processing that computer-based systems are at a huge disadvantage, especially when you take into account the intrinsic parallelism of the brain, which even the most highly parallelized computer can't begin to approach.
The muffin/puppy data does illustrate how good the human brain is at learning SOME kinds of patterns and then generalizing them -- but there are lots of data sets of images that you can develop that humans are lousy at categorizing and that computer AI is very good at.
Based on literally years of constant learning, the human brain learns to categorize what it sees (and hears and smells and feels) in lots of different ways. It is then able to make very quick judgments by weighting the relevance of all those different categories, applying lots of filters regarding what can and can't be, and extrapolating from there.
While a huge fraction of human learning is unguided, a lot of it IS guided. Little kids are shown something (be it a color, a letter of the alphabet, a number, or a picture of a cat) and are told what it is. Then they are shown it again (perhaps the exact same image, or perhaps a different image of the same thing), are asked what it is, and are then told whether they are right or wrong, or perhaps what the correct answer is.
Let's consider two very different training sets. In the first one, the person is shown different letters of the alphabet, but not all of them -- perhaps twenty. They are trained until they are nearly flawless at recognizing them correctly no matter how distorted they might be (up to a limit, of course). Now you show them one of the ones that wasn't in the training set. Most of them will try to pick one of the twenty letters they know -- essentially going for a closest match. A few might be able to say that they don't recognize it. None of them are going to correctly identify it.
Now consider a training set consisting of lots of pictures of dogs, cats, rodents, fish, and perhaps a dozen other types of animals at that same level of granularity. After they have been trained, you can probably show them a picture of an animal that falls into one of those categories but that looks significantly different than any of the pictures they've seen and may well even be, at least superficially, somewhat closer to one of the other animals (perhaps in coloring). A large fraction of people will be able to properly classify that new animal because they have been able to generalize things like features that all dogs have and that no dogs have (as well as that most dogs have or that few dogs have). But the kicker is that while some of the features that we rely on are pretty evident, there are lots of features that we rely on that are pretty subtle and that we can't easily identify.
Neural networks trained with error feedback (backpropagation) try to emulate this guided learning model: the network is presented with an image and told whether it is right or wrong, or perhaps what the answer should have been, and it then adjusts its internal weights to move its answer a bit closer to the correct answer. This is repeated over and over and over. At the end, it is shown images it has never seen before and is evaluated on how well it was able to extract a relevant set of features from the training set and apply them to new input.
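The "adjust the weights a bit closer to the correct answer" step is gradient descent. A single-weight sketch on made-up data (the learning rate, targets, and epoch count are illustrative; a real network does the same thing across millions of weights):

```python
# Made-up training set: inputs x with "correct answers" y = 2*x.
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]

w = 0.0    # a single internal weight, initially wrong
lr = 0.05  # learning rate: how far each correction nudges the weight

# Repeated guided learning: predict, compare with the right answer,
# and nudge the weight in the direction that shrinks the error.
for epoch in range(200):
    for x, y in data:
        prediction = w * x
        error = prediction - y
        w -= lr * error * x  # gradient of the squared error w.r.t. w

print(round(w, 3))  # converges to the true weight, 2.0
```

Each individual correction is tiny, but thousands of repetitions pull the weight toward values that reproduce the training answers, which is the machine analogue of the "shown, told, corrected" loop described above.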