AI isn't as smart as you think


2023 was the year of ChatGPT. Everyone is using AI to solve problems, reduce labor, and try to make a quick buck. It's an amazing, extremely powerful technology, but at the same time it is super dumb.

We have always known computers are really dumb and are only as smart as the developers who program them. They are, however, extremely quick and excel at tasks that require repetitive operations. You probably already know this, but let me show you some examples that demonstrate exactly what I mean.
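That "fast but dumb" point is easy to demonstrate: a computer has no insight into what it is doing, but it will happily grind through a repetitive task in a fraction of a second. A trivial sketch:

```python
# Computers have no understanding, but they excel at repetition:
# summing ten million numbers finishes almost instantly.
total = 0
for n in range(10_000_000):
    total += n
print(total)  # 49999995000000
```

A human would never attempt this by hand, yet would never confuse the task with intelligence either.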

I love AI; it is a fascinating technology and extremely fun to work with. In the past, AI was primarily available only to those with endless amounts of money, but with the announcement of ChatGPT that all changed. Bleeding-edge AI is now available to everyone on the planet. It doesn't end with ChatGPT either; ChatGPT was just the catalyst of the AI arms race.

You can now run AI on your own machine, provided you have good enough hardware. By good enough hardware I mean something as simple as a typical gaming PC. The faster the better, but some models you can use without even having a GPU.

I currently run around eight different Large Language Models (LLMs, like ChatGPT) on my local machine. Each has its own uses and performance characteristics. Some are better for chat, some are better at coding, and some are just larger. Some of them I actively use for projects, and some I am just checking out to see if they are a suitable replacement. Open-source LLMs are changing rapidly, and a day doesn't go by without another being announced. The problem, however, is that most of them are trained to perform very well on AI leaderboards but are fairly useless in actual use.

There are a few riddles and problems I like to use to see how well a model performs. While this isn't the be-all and end-all of tests, it does give you a glimpse of how well a model performs. It isn't without a catch, which I will talk about later.

Here is a simple problem a typical human can reason through:

Claire has 6 brothers, and each of her brothers has two sisters. How many sisters does Claire have?

I think most people can figure this out; it's a pretty simple problem. Let's see how AI does.

ChatGPT (3.5 Turbo)

This answer is close. It's obviously wrong, but it's pretty close.

Bing Chat

Google Bard

Open Orca

DeepSeek 7B

ChatGPT 4

Here we have the correct answer, and rightly so, as ChatGPT 4 is a huge improvement over all the other models and even over ChatGPT itself (which uses ChatGPT 3.5 Turbo by default).

You can see each model interprets the question differently, with only ChatGPT 4 giving the correct answer. This is a particularly challenging problem for AI, as it requires logic that seems simple for a human; but for a computer designed to predict the next character in a sentence, it is a much more difficult task.
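The logic behind the riddle is trivial once you notice that all the brothers share the same set of sisters. A few lines of Python make the reasoning explicit (the variable names are mine, purely for illustration):

```python
brothers = 6             # Claire has 6 brothers
sisters_per_brother = 2  # each brother counts 2 sisters

# Every brother shares the same sisters, so the family has exactly
# two girls in total -- and Claire is one of them.
total_girls = sisters_per_brother
claires_sisters = total_girls - 1  # Claire doesn't count herself

print(claires_sisters)  # 1
```

The subtraction in the last step is exactly what the weaker models miss: they forget Claire cannot be her own sister.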

I mentioned above that there is a catch with this test, and I'm going to get into that now. This specific riddle was made famous by someone on Reddit who posed the challenge a while ago, doing something similar and using it to test multiple models. Over time, two things were discovered.

First, he noticed models started to solve the problem, but it wasn't authentic; they were just regurgitating training data. What he discovered is that his riddle was being used to train large language models so they could solve it, along with many others. This resulted in certain models recognizing the problem and spitting back an answer as if they were reciting multiplication tables. The model doesn't actually do the thinking; it just "knows". This has been proven by (as in my example) using a different name than the one in the original riddle. In its explanation of how it solves the riddle, the AI spits back the original name used by the Redditor, which was never given in the context. For example, I named the sister Claire, and the AI would initially say Claire has x sisters, but when asked to break it down step by step, the last step would refer to the name in the original riddle.

I said there were two things discovered. The second is that you can guide an AI to solve this problem, sometimes by simply asking it to work step by step. This does not always change the outcome, but in some cases it does.
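A minimal sketch of what that "step by step" nudge looks like in practice, assuming an OpenAI-style chat API; the message structure below is illustrative, not something from the post:

```python
# Illustrative only: building a chain-of-thought style prompt.
# Appending "think step by step" is the nudge described above.
riddle = ("Claire has 6 brothers, and each of her brothers "
          "has two sisters. How many sisters does Claire have?")

messages = [
    {"role": "system", "content": "You are a careful reasoner."},
    {"role": "user", "content": riddle + " Let's think step by step."},
]

# `messages` would then be passed to a chat-completion API.
print(messages[1]["content"])
```

The same riddle, phrased two ways, can produce very different answers from the same model; that is the whole field of prompt engineering in a nutshell.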

AI models are changing rapidly, and one of the things that differentiates a model is how it was trained and what data was used to do it. Many open-source models are fine-tuned on specific data sets to improve their performance on specific tasks. A good example of this is the myriad of programming-tuned models, most of which only perform well on Python. The amount of work and hardware needed to train a model is high; in some cases it costs millions of dollars to run a single training round.

I'll leave you with another interesting riddle that typically trips up AI.



1.10265646 BEE
18 comments

I think AI is learning from us while we use it for so many purposes. It won't learn humanity, though. Can you check out my latest article? Thanks; if you have 2 minutes, I'd appreciate some feedback with your opinion.

0.00000182 BEE

Wait until these LLMs start being trained on their own output; it's gonna be like the models on DMT.

0.00000182 BEE

The problem lies in how you ask the question. Use the following structure to help you come up with an accurate reasoning:

Claire has 6 brothers and each of her brothers has two sisters. How many sisters does Claire have?

The logical connective "and" without a comma establishes the relation that both premises must be given.



Bing Chat with GPT-4

0.00000122 BEE

Yes, prompt engineering is an entire field in its own right now. A big part of working with AI is figuring out the best prompts, and this varies from model to model.

As a human, I don't have to play around with how I ask questions like that.

0.00000120 BEE

The real problem is that the common user does NOT know how to ask the question. Logic, mathematics, etc., have an exact way of being worded. A misplaced "as", an accent, or even a grammatical error can confuse any human.

If I were to ask you a question like this: "How many bricks does it take to complete a building made of bricks?", neither you nor the AI can give me the answer, and neither can any engineer. Solving this question requires data, specifically the dimensions of the bricks and the dimensions of the building... The way we formulate the question, the information provided, the logic, and the structure of the sentence count for anyone, human or not.

0.00000122 BEE

Thanks for sharing these comparisons... AI is very good at working with absolutely huge amounts of data, and finding patterns, etc... but it cannot actually think or solve problems. I think people keep forgetting that it's just a tool and not, you know, a god.

I do find it weird that not a lot of people seem to be too upset with the energy usage of a lot of LLMs, maybe it's not super common knowledge?

0.00000123 BEE

I do find it weird that not a lot of people seem to be too upset with the energy usage of a lot of LLMs, maybe it's not super common knowledge?

The energy usage is tiny. An AI model goes through two phases:

  • Training
  • Usage

Training is extremely time-consuming and costly, but once that's done you can run many models on a Raspberry Pi to solve actual, real problems. In the end, the final result is just a bunch of weights (numbers) representing the paths to take through the neural network for specific inputs.
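A toy illustration of that point: once training is done, inference is nothing more than multiplying inputs by frozen numbers. This tiny hand-written layer (the weights here are invented purely for illustration; a real model has billions) needs no GPU at all:

```python
# A trained model is just fixed numbers ("weights") plus arithmetic.
# These weights are made up for illustration.
weights = [[0.5, -0.2], [0.1, 0.8]]
bias = [0.1, -0.1]

def forward(x):
    """One dense layer with a ReLU: out_j = max(0, sum_i x_i*W[i][j] + b[j])."""
    out = []
    for j in range(len(bias)):
        s = bias[j] + sum(x[i] * weights[i][j] for i in range(len(x)))
        out.append(max(0.0, s))
    return out

print(forward([1.0, 2.0]))
```

Nothing in that loop "learns" anything at run time; the expensive part was finding the numbers in the first place.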

The amount of energy AI can save in the long run is astronomical. Imagine solving cancer, developing new energy solutions, making tedious tasks far more efficient, taking over the world with robots.

0.00126689 BEE

Thanks for that differentiation. I didn't realize the training only happened in the initial phase, I thought LLMs were continuously being trained with the latest available data, I'll have to look into it more. Thanks.

0.00000180 BEE

Great examples, and I agree; as I started using it more and working with it, it became very clear how legit "dumb" AI is and how it's pretty much just an algo of scraped data that everyone else has already produced.

You legit have to train people how to write prompts just to interact with these things correctly, lol.

What is cool, though, is the new way people are going about it. Before, everyone coded "in this case do A, and in this case do B." Now it's coded in a way that the script tries to figure it out and then learn and adapt. It's actually pretty awesome what they are doing with some machines.

0.00000123 BEE

Your insights into the dynamic landscape of AI, especially with the rise of accessible models like ChatGPT, provide a fascinating perspective. The demonstration with the sister riddle illustrates both the progress and the challenges in AI comprehension. It's intriguing to see the variations in responses across models, emphasizing the evolving nature of these technologies. The revelations about models regurgitating data for specific challenges and the potential for guided problem solving shed light on the complexities of AI training.

0.00000122 BEE

Did AI write this reply?

0.00005042 BEE
0.00000123 BEE

😂

0.00000352 BEE

I was thinking the same thing lol

0.00000354 BEE

AI is one of man’s greatest discoveries, but it is also a great distraction for young folks, leaving them to rely on it so much.

0.00000121 BEE

ChatGPT 4 says that the answer is two sisters, but isn't the correct answer to the question posed (how many sisters does Claire have?) just one?

The model gets its explanation right, but the answer to the direct question at the end of the riddle is incorrect.

0.00000121 BEE

Yes, technically it is wrong, but only in expression; the logic is accurate.

It correctly realizes there are two sisters in total and that Claire is one of them; it answers the direct question "How many sisters does Claire have?" inaccurately, but it comes to the correct conclusion.

This is another example of how AI isn't as smart as people give it credit for, and why AI isn't going to replace developers any time soon. You need to be a pretty good developer to take advantage of AI for development, as it frequently makes mistakes, and you have to know what you are doing to identify and correct them.

0.00000152 BEE

I edited my question for clarity just before you replied, but yeah. I am of the opinion that genuine Artificial Intelligence doesn't exist!

0.00000121 BEE

0.00000153 BEE

Hey look at that!

0.00000121 BEE

I’m sooo looking forward to AGI. Now that will get many people bamboozled

0.00000120 BEE

AI still seems to be ever-increasing amounts of data (albeit cleaned up a bit with labeling) and more compute massed together.

0.00000120 BEE

On the 27th of September last year (my birthday) I asked ChatGPT: "When is my birthday?" Of course, it answered that there was no way it could know that. So I gave it a hint: when are Google's and Cosme Damián Churruca's (a Spanish admiral from over two centuries ago) birthdays? That it answered, and it told me that given that information my birthday was on September 27 and it would congratulate me on that day. I told it it was the 27th, and it apologized to me. So I think ChatGPT is very smart but lost in time.

0.00000119 BEE

Almost every day there is a new announcement related to AI; it's very hard to keep up with. I do like the assistants that you get in the OpenAI playground; they help so much with my day job.

Fantastic cover image, was that done in Midjourney? Oh, and if you like to watch TV shows about AI, the best one ever is called Beacon 23: very deep character development and an interesting story line. (Watched it twice now.)

0.00000119 BEE

Dall-E, not MidJourney.

0.00000119 BEE

The hamburger was eaten and turned into poop. So eventually in the sewers, transcended into a divine deity?

0.00000120 BEE

Do you mind if I steal some of these illustrations and examples for a library presentation on AI which I have been developing?

0.00000120 BEE

Sure

0.00000119 BEE

I think we will see these engines get much better in the next few years. AI research has gone on for ages with Turing doing some of the early work, but access to lots of processing power is accelerating things. I see lots of people saying they are 'dumb', but they can do a lot. They just lack that 'common sense' element that even some people struggle with. I would expect that people are developing knowledge bases of certain fundamental facts that the engines can use, so they don't give people too many fingers/noses/eyes in generated images.

This technology will create lots of challenges. We're already seeing it used to generate posts and replies here, with PeakD integrating it. It is a revolution and there will be casualties. I just hope that a lot of good can come out of it too.

0.00000120 BEE

I dunno, I like Bard's whole 'it depends' shitlordery. The brothers could indeed have half-sisters XD


It's interesting to me as a musician to see how utterly useless these AI are with the subject. It very quickly gets confused with a lot of basic requests that aren't even riddles but just like, chord progressions

0E-8 BEE

lol, it's hysterical when it strikes out.

0.00000145 BEE

It's interesting to me as a musician to see how utterly useless these AI are with the subject. It very quickly gets confused with a lot of basic requests that aren't even riddles but just like, chord progressions

You can fine tune them to greatly improve them, and prompt engineering is a real science, you can drastically improve your results if you know how to ask properly.

0.00000120 BEE

Yeah the whole art of prompting is beyond me based on my attempts in the past! I'd enjoy a music-specific bot though, for whatever purpose

0E-8 BEE

That was also funny to me, how Bard went "this is a big family and who knows how all of those people are related" 😄

0.00000119 BEE

Ah, yeah. This reminds me of the Loebner Prize, where chatbots are tested by a jury that tries to expose them as non-human. They ask them these kinds of silly questions. I understand their motivation, but I also know that the AIs you have shown are not pretending to be human, so I don't expect them to be perfect.

Maybe these puzzles are part of linear reasoning, which has been a challenge for language models for a long time, as I learned from a TED-Ed talk. But as you said, they are improving.

I also notice that some models give very formal answers to what humans would consider simple riddles. I know they have limitations, but I also think that the riddle is not a pure logic problem. It is just a riddle. In logic, you need to ensure that your propositions are valid. I wonder if the AI could do better if the riddle was rephrased and broken down into valid propositions.

This also makes me think of the cognitive development puzzles that Piaget studied and that educators use with children under 10. I guess they don't mock them for thinking that a pile of coins has more than a stack, even though they have the same number.

I see these AI issues as fun and interesting examples that can help us find areas for improvement. But if I can't cite a rigorous study showing that language models struggle with this type of problem, I would be cautious about judging them. Maybe I am ignoring hundreds of riddles where the models do well and only focusing on their failures. But if I clarify that this is just an observation and not a generalization, I think it is fine.

Good read. Congrats!

0E-8 BEE

Thanks for your contribution to the STEMsocial community. Feel free to join us on discord to get to know the rest of us!


0E-8 BEE