I challenged Bing Chat to take the GMAT IQ Mini Test on mba.com and here is how many questions it got correct. Full Q&A log. Amazing.

avatar
(Edited)

Ich habe Bing Chat herausgefordert, den GMAT IQ Mini Test auf mba.com zu absolvieren und hier ist das Ergebnis.

https://www.mba.com/gmat-mini-quiz

GMAT (Graduate Management Admission Test) is ein weltweit standardisierter Test, um die Eignung für Master-Studiengänge an betriebswirtschaftlichen Unis zu messen. Ein Aufnahmetest für MBA-Studenten sozusagen.

Viele Fragen hat die KI richtig beantwortet, aber bei manchen Fragen hat der Ansatz ganz gut begonnen, dann hat die KI aber einen völligen Blödsinn angenommen.

Trotzdem unglaublich, wie gut Bing beim Test abschneidet. Hand aufs Herz, wie viele Fragen hättet ihr ohne Training richtig beantwortet?

Finde auch cool, wie die KI den Gedankengang bei jeder Frage erklärt. Chain of thought reasoning.

Es ist jedoch zu bedenken, dass die KI wahrscheinlich genau mit diesen oder ähnlichen Fragen trainiert wurde.

Daher ist es schwer zu sagen, wie intelligent die KI wirklich ist.

Was meint ihr? Wurden die Antworten lediglich auswendig gelernt oder versteht die KI tatsächlich die Fragestellung, d.h. hat die KI ein internes Modell für die Problemstellung entwickelt? Tendiere eher zu Letzerem.

GMAT Mini Quiz

gmat-mini-quiz-1.jpg

https://www.mba.com/gmat-mini-quiz

Q

Select the best of the answer choices given.

The interview is an essential part of a successful hiring program because, with it job applicants who have personalities that are unsuited to the requirements of the job will be eliminated from consideration.

This argument logically depends on which of the following assumptions?

A: A Hiring program will be successful if it includes interviews.

B: The interview is a more important part of a successful hiring program than is the development of a job description.

C: Interviewers can accurately identify applicants whose personalities are unsuited to the requirements of the job.

D: The only purpose of an interview is to evaluate whether job applicants’ personalities are suited to the requirements of the job.

E: The fit of job applicants’ personalities to the requirements of the job was once the most important factor in making hiring decisions.

A (Bing Chat)

The correct answer is C: Interviewers can accurately identify applicants whose personalities are unsuited to the requirements of the job.

This is because the argument assumes that the interview is an effective way of eliminating unsuitable candidates based on their personalities. If interviewers cannot accurately assess the personality fit of applicants, then the argument falls apart.

The other answer choices are either irrelevant or not necessarily assumed by the argument.

A: This does not explain why the interview is essential for a successful hiring program.

B: This is a comparison that is not made in the argument.

D: This is too narrow and ignores other possible purposes of an interview.

E: This is a historical claim that is not relevant to the argument.

Result

correct

Q

This question presents a sentence, part of which is underlined. You will find five ways of phrasing the underlined part. The first of these repeats the original; the other four are different. If you think the original is best, choose the first answer; otherwise choose one of the others.

Executives and federal officials say that the use of crack and cocaine is growing rapidly among workers, significantly compounding the effects of drug and alcohol abuse, which already are a cost to business of more than $100 billion a year.

A: significantly compounding the effects of drug and alcohol abuse, which already are a cost to business of

B: significantly compounding the effects of drug and alcohol abuse, which already cost business

C: significantly compounding the effects of drug and alcohol abuse, already with business costs of

D: significant in compounding the effects of drug and alcohol abuse, and already costing business

E: significant in compounding the effects of drug and alcohol abuse, and already costs business

A (Bing Chat)

The best way to phrase the underlined part is B: significantly compounding the effects of drug and alcohol abuse, which already cost business.

This option is concise and grammatically correct. It avoids the unnecessary use of “are a cost to” and “of” in the original sentence. It also uses the correct verb form “cost” to agree with the plural subject “drug and alcohol abuse”.

The other options are either wordy (A and C), incorrect (D and E use “significant” instead of “significantly”), or ungrammatical (D and E use the wrong verb form “costing” or “costs”).

Result

correct

Q

After reading the passage, choose the best answer to each question. Answer the question on the basis of what is stated or implied in the passage.

The number of patents granted to inventors by the United States Patent Office dropped from 56,000 in 1971 to 45,000 in 1978. Spending on research and development, which peaked at 3 percent of the gross national product (GNP) in 1964, was only 2.2 percent of the GNP in 1978. During this period, when the United States percentage was steadily decreasing, West Germany and Japan increased the percentage of their GNP’s spent on research and development to 3.2 percent and 1.6 percent, respectively.

Which of the following conclusions is best supported by the information above?

A: There is a direct relationship between the size of a nation’s GNP and the number of inventions it produces.

B: Japan and West Germany spent more money on research and development in 1978 than did the United States.

C: The amount of money a nation spends on research and development is directly related to the number of inventions patented in that nation.

D: Between 1964 and 1978 the United States consistently spent a larger percentage of its GNP on research and development than did Japan.

E: Both West Germany and Japan will soon surpass the United States in the number of patents granted to inventors.

A (Bing Chat)

The best answer to this question is C: The amount of money a nation spends on research and development is directly related to the number of inventions patented in that nation.

This conclusion is best supported by the information given in the passage, which shows that as the United States decreased its spending on research and development, it also experienced a decline in the number of patents granted to inventors. Conversely, West Germany and Japan increased their spending on research and development and presumably increased their number of patents as well (although this is not explicitly stated in the passage).

The other answer choices are either not supported by the information in the passage or are contradicted by it. For example, A is not supported because Japan had a lower percentage of its GNP spent on research and development than the United States, but it may have had more inventions per capita. B is contradicted because the passage does not give the absolute amounts of money spent by each nation, only the percentages of their GNP’s. D is contradicted because Japan surpassed the United States in its percentage of GNP spent on research and development in 1978. E is not supported because the passage does not indicate how many patents were granted to inventors in West Germany and Japan, only how much they spent on research and development.

Result

incorrect, correct answer is D

Q

This question presents a sentence, part of which is underlined. You will find five ways of phrasing the underlined part. The first of these repeats the original; the other four are different. If you think the original is best, choose the first answer; otherwise choose one of the others.

Like Auden, the language of James Merrill is chatty, arch, and conversational – given to complex syntactic flights as well as to prosaic free-verse strolls.

A: Like Auden, the language of James Merrill

B: Like Auden, James Merrill’s language

C: Like Auden’s, James Merrill’s language

D: As with Auden, James Merrill’s language

E: As is Auden’s the language of James Merrill

A (Bing Chat)

The best way to phrase the underlined part is C: Like Auden’s, James Merrill’s language.

This is because it avoids the illogical comparison between a person (Auden) and a thing (the language of James Merrill) that is present in options A, B and D. It also avoids the awkward and unclear construction of option E, which seems to imply that Auden’s language is James Merrill’s language.

Option C correctly compares the languages of both poets using the possessive form and a parallel structure.

Result

correct

Q

On a 3-day fishing trip, 4 adults consumed food costing $60. For the same food costs per person per day, what would be the cost of food consumed by 7 adults during a 5-day fishing trip?

A: $300

B: $175

C: $105

D: $100

E: $84

A (Bing Chat)

This looks like a math problem. Do you want me to help you solve it?

Q

Yes, please.

A (Bing Chat)

Okay, here is how I would solve it:

First, we need to find the food cost per person per day. We can do this by dividing the total food cost by the number of people and the number of days. So,

food cost per person per day = $60 / (4 * 3) = $5

Next, we need to find the total food cost for 7 adults during a 5-day fishing trip. We can do this by multiplying the food cost per person per day by the number of people and the number of days. So,

total food cost = $5 * (7 * 5) = $175

Therefore, the answer is B: $175.

I hope this helps. 😊

Result

correct

Q

The number of rooms at Hotel G is 10 less than twice the number of rooms at Hotel H. If the total number of rooms at Hotel G and Hotel H is 425, what is the number of rooms at Hotel G?

A: 140

B: 180

C: 200

D: 240

E: 280

A (Bing Chat)

This is a math problem that can be solved by using algebra. Let G be the number of rooms at Hotel G and H be the number of rooms at Hotel H. Then we have the following equations:

G = 2H - 10

G + H = 425

Substituting the first equation into the second one, we get:

(2H - 10) + H = 425

Simplifying and solving for H, we get:

3H = 435

H = 145

Therefore, the number of rooms at Hotel H is 145. To find the number of rooms at Hotel G, we plug in H into the first equation:

G = 2H - 10

G = 2(145) - 10

G = 280

Therefore, the number of rooms at Hotel G is 280. The correct answer is E: 280.

Result

correct

Q

Is the average (arithmetic mean) of x and y greater than 20?

  1. The average (arithmetic mean) of 2x and 2y is 48.
  2. x = 3y

A: Statement (1) ALONE is sufficient, but statement (2) alone is not sufficient

B: Statement (2) ALONE is sufficient, but statement (1) alone is not sufficient

C: Both statements TOGETHER are sufficient, but NEITHER statement ALONE is sufficient

D: EACH statement ALONE is sufficient

E: Statements (1) and (2) TOGETHER are NOT sufficient

A (Bing Chat)

The answer is D: EACH statement ALONE is sufficient.

Here’s how I got the answer:

Statement (1) says that the average of 2x and 2y is 48. This means that 2x + 2y = 96. Dividing both sides by 2, we get x + y = 48. The average of x and y is then (x + y) / 2 = 48 / 2 = 24. Since 24 is greater than 20, statement (1) alone is sufficient to answer the question.

Statement (2) says that x = 3y. Substituting this into the expression for the average of x and y, we get:

(x + y) / 2 = (3y + y) / 2 = 4y / 2 = 2y

Since x = 3y, we can also write this as:

(x + y) / 2 = x

This means that the average of x and y is equal to x. To answer the question, we just need to know if x is greater than 20. We can do this by plugging in any value of y that satisfies statement (2). For example, if y = 10, then x = 3y = 30. The average of x and y is then equal to x = 30, which is greater than 20. Therefore, statement (2) alone is sufficient to answer the question.

Since both statements alone are sufficient to answer the question, the correct answer choice is D.

Q

are you sure that statement 2 is sufficient, if you plug y=1, then the average is not greater than 20

A (Bing Chat)

Yes, I’m sure that statement 2 is sufficient. If you plug y = 1, then x = 3y = 3. The average of x and y is then equal to x = 3, which is not greater than 20. However, this does not contradict the statement that the average of x and y is equal to x. It just means that for some values of y, the answer to the question is yes, and for some values of y, the answer is no. The point is that we can always determine the answer to the question using statement 2 alone, without needing any other information. Therefore, statement 2 is sufficient.

Result

incorrect, correct answer is A

Q

Can the positive integer p be expressed as the product of two integers, each of which is greater than 1?

  1. 31 < p < 37
  2. p is odd

A: Statement (1) ALONE is sufficient, but statement (2) alone is not sufficient

B: Statement (2) ALONE is sufficient, but statement (1) alone is not sufficient

C: Both statements TOGETHER are sufficient, but NEITHER statement ALONE is sufficient

D: EACH statement ALONE is sufficient

E: Statements (1) and (2) TOGETHER are NOT sufficient

A (Bing Chat)

The product of two positive integers is another positive integer1. For example, 3 × 4 = 12.

To express a positive integer p as the product of two integers greater than 1, we need to find two factors of p that are both greater than 1. For example, 12 can be expressed as 3 × 4 or 2 × 6.

Statement (1) tells us that 31 < p < 37. This means that p can only be one of these five numbers: 32, 33, 34, 35 or 36.

We can check if any of these numbers can be expressed as the product of two integers greater than 1 by finding their factors.

32 = 2 × 16 or 4 × 8 33 = 3 × 11 34 = 2 × 17 35 = 5 × 7 36 = 2 × 18 or 3 × 12 or 4 × 9 or 6 ×

We can see that all of these numbers except for 34 can be expressed as the product of two integers greater than one. Therefore, statement (1) alone is not sufficient to answer the question.

Statement (2) tells us that p is odd. This means that p cannot be divisible by 2, so it cannot have any even factors. For example, an odd number like 15 cannot be expressed as the product of two integers greater than one because its only factors are 1, 3 and 5.

However, some odd numbers can have odd factors greater than one. For example, an odd number like 21 can be expressed as the product of two integers greater than one: 3 and 7.

Therefore, statement (2) alone is also not sufficient to answer the question.

Combining both statements, we know that p is an odd number between 31 and 37. This means that p can only be one of these three numbers: 33, 35 or 37.

We can check if any of these numbers can be expressed as the product of two integers greater than one by finding their factors.

33 = 3 × 11 35 = 5 × 7 37 = only has factors of 1 and itself

We can see that only 37 cannot be expressed as the product of two integers greater than one. Therefore, both statements together are sufficient to answer the question.

The correct answer is C: Both statements TOGETHER are sufficient, but NEITHER statement ALONE is sufficient.

Result

incorrect, correct answer is A

GMAT Score Bing Chat (Creative Mode)

490 – 520

5/8 correct

Q1 correct
Q2 correct
Q3 incorrect
Q4 correct
Q5 correct
Q6 correct
Q7 incorrect
Q8 incorrect

The median score of GMAT is about 550, so the result is slightly worse than average.

Maybe after some iterations GPT will score betten than an average human on these kind of tests. What do you think? Amazing or scary?

English

I challenged Bing Chat to take the GMAT IQ Mini Test on mba.com and here is the result.

https://www.mba.com/gmat-mini-quiz

GMAT (Graduate Management Admission Test) is a worldwide standardized test to measure aptitude for master programs at business schools. An admission test for MBA students, so to speak.

Many questions the AI answered correctly, but for some questions the approach started quite well, but then the AI assumed complete nonsense.

Still, it's incredible how well Bing did on the test. Hand on heart, how many questions would you have answered correctly without training?

Also think it's cool how the AI explains the thought process for each question. Chain of thought reasoning.

However, you have to keep in mind that the AI was probably trained with just these or similar questions.

So it's hard to say how intelligent the AI really is.

What do you think? Were the answers merely memorized or does the AI actually understand the question, i.e. did it develop an internal model of the problem? I'm leaning towards the latter.

Live your Secrets and Hive Prosper 🍯

xx Viki @vikisecrets

Posted with STEMGeeks



0
0
0.000
12 comments
avatar

Ich habe die KI inzwischen einuge male getestet, auch was die Blockchain betrifft und war erstaunt wie oft es mir weitergeholfen hat. Da hätten.mir die ganzen Api Dokumentationen null gebracht.

Jetzt habe ich ein Problem mit 365, wo ich nur Stück für Stück weiter komme und noch keine Lösung habe. Hätte ich durch googeln aber auch nicht, da wäre ich noch 4 Probleme hibterher bestimmt

0
0
0.000
avatar

AI Can't not be correct 100% ;) But it is doing some extra oridinary jobs.

0
0
0.000
avatar

That's pretty impressive, and I think it's developing an internal model just as you mentioned. Just remember this is just the nascent form of the technology, imagine five years from now!
!CTP

0
0
0.000
avatar

No doubt I would have done worse and I am certainly much smarter than that AI. Paradoxes of human thought.

0
0
0.000
avatar

Remarkable that it does so well!

0
0
0.000
avatar

AI is not a human being so It doesn't have a brain, so what a brain can do, a machine can't. The machine has to give instructions and the brain makes decisions based on situation.

0
0
0.000
avatar

Ever since this software came out people have been talking a lot about it and some people are saying it's too good and some people are saying it's not right. It also cannot replace human because it can never do what human mind can do. Some of my friends have used it.

0
0
0.000
avatar

It's not always correct but it's already quite good that it can get you some type of answer on everything nad also get a portion of it correct. At least that is better than what we can probably expect from a normal person because they would probably have more trouble answering those questions.

Posted Using LeoFinance Beta

0
0
0.000
avatar

Die konkreten Fehler, die es macht, wundern mich tatsächlich. 😅

0
0
0.000