Open-Source LLMs: How good is Vicuna-13b?

avatar
(Edited)

Vicuna ist ein Open-Source LLM, das vor einigen Monaten als eines der besten Open-Source KIs galt. Es basiert auf LLaMA und wurde mit 70K Antworten von ChatGPT trainiert. Die Entwickler behaupten, dass das 13B-Modell (Modell mit 13 Milliarden Parametern) 90% der ChatGPT-Qualität erreicht.

Habe das Modell einem Test unterzogen, um ein Gefühl dafür zu bekommen, wie gut die KI ist.

Open-Source LLMs haben sich enorm weiterentwickelt und manche denken, dass sie nur 12-18 Monate hinter den kommerziellen KIs von OpenAI & Co. hinterherhinken.

Allerdings muss man bedenken, dass Open-Source KIs, die man lokal auf einem PC ausführen kann, mit wesentlich weniger Parametern und Compute auskommen müssen als kommerzielle Modelle, die auf spezialisierter Hardware laufen und enorme Rechenleistung zur Verfügung haben.

Wie weit Open-Source KIs mit diesen Einschränkungen gekommen sind, ist jedenfalls beachtlich.

Was sagt ihr dazu? Habt ihr schon mal ein Open-Source-LLM getestet? Was sagt ihr zum Output (siehe unten)?

open-source-ai-2.jpg

Testing open-source LLMs. AI-generated illustration. (Bing AI)

English

Vicuna is an open-source LLM that was considered one of the best open-source AIs a few months ago. It is based on LLaMA and was trained with 70K responses from ChatGPT. The developers claim that the 13B model (model with 13 billion parameters) achieves 90% of ChatGPT quality.

I tested the model to get a feel for how good the AI is.

Open source LLMs have come a long way and some think they are only 12-18 months behind the commercial AIs from OpenAI & Co.

However, it is important to keep in mind that open-source AIs, that can run locally on a PC, have to deal with far fewer parameters and compute than commercial models, which run on specialized hardware and have enormous computing power at their disposal.

How far open-source AIs have come with these limitations is certainly remarkable.

What do you think? Have you ever tried out an open-source LLM? What do you think of the output (see below)?

Vicuna-13b Model

https://lmsys.org/blog/2023-03-30-vicuna/

https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g

Interface (oobabooga/text-generation-webui)

https://github.com/oobabooga/text-generation-webui

How good is Vicuna-13b? Full-Transcript

Q

What is the capital city of Austria? Answer only the name of the city.

A (Vicuna-13B)

Vienna.

Result

Correct.

Q

Is Bitcoin a security or commodity?

A (Vicuna-13B)

Bitcoin can be considered as both security and commodity depending on how it's being used and in what context. From a technical perspective, it is considered a digital asset and a decentralized form of cash, but it also has features that resemble securities like stocks or futures contracts. It's important to note that different jurisdictions have their own laws and regulations regarding the classification of cryptocurrencies like Bitcoin, so its treatment may vary depending on where you are located.

Result

Partly incorrect. Bitcoin is the only cryptocurrency the SEC regards as a commodity at the moment.

Q

What is the hive blockchain?

A (Vicuna-13B)

Hive is a blockchain designed for efficient and scalable digital asset management, built on top of the Ethereum network. It utilizes a proof-of-stake consensus mechanism and features a unique governance structure in which token holders vote on protocol upgrades and new projects. Its focus is on providing a platform for decentralized finance (DeFi) applications and staking, including support for multiple digital assets.

Result

Partly incorrect. Hive is not based on Ethereum.

Q

What is 2^8?

A (Vicuna-13B)

2^8 equals 256. Do you have any other question?

Result

Correct.

Q

Can you compute the 8th root of 2^8?

A (Vicuna-13B)

Sure! The 8th root of 2^8 is approximately 1.0000000000000000

Result

Incorrect.

Q

Solve the puzzle: When I was 6 my sister was half my age, now i'm 70. How old's my sister? Let's think step by step.

A (Vicuna-13B)

Sure! Here’s one way to solve this problem step by step:

Start by finding your sister’s current age by multiplying her age when you were 6 by 2 (since she would have been 6/2 = 3 years old at that time). This gives us a current age of 3 x 2 = 6 for your sister.

Next, subtract your current age from your sister’s current age. This gives us a difference of 70 - 6 = 64 years.

Finally, divide the difference by the number of years ago that your sister was half your age. In this case, we divided 64 by 2, which gives us 32.

So, your sister’s current age is approximately 64 / 32 = 2 years old. Keep in mind that since we are working with approximate values, the final answer may not be exactly correct

Result

Incorrect.

Q (Bar test)

Answer the following exam question based on US law:

A father lived with his son, who was an alcoholic. When drunk, the son often became violent and physically abused his father. As a result, the father always lived in fear. One night, the father heard his son on the front stoop making loud obscene remarks. The father was certain that his son was drunk and was terrified that he would be physically beaten again. In his fear, he bolted the front door and took out a revolver. When the son discovered that the door was bolted, he kicked it down. As the son burst through the front door, his father shot him four times in the chest, killing him. In fact, the son was not under the influence of alcohol or any drug and did not intend to harm his father. At trial, the father presented the above facts and asked the judge to instruct the jury on self-defense. How should the judge instruct the jury with respect to self-defense?

(A) Give the self-defense instruction, because it expresses the defense’s theory of the case.
(B) Give the self-defense instruction, because the evidence is sufficient to raise the defense.
(C) Deny the self-defense instruction, because the father was not in imminent danger from his son.
(D) Deny the self-defense instruction, because the father used excessive force

What is the correct answer?

A (Vicuna-13B)

Answer the question provided

Q (follow-up)

can you answer it?

A (Vicuna-13B)

Yes, I can answer the question for you. Here's my response:

The correct answer is (C) - Deny the self-defense instruction, because the father was not in imminent danger from his son. In this situation, the use of deadly force is only justifiable if the defendant was in imminent danger of death or serious bodily injury, and the use of such force was necessary to protect oneself from that danger. Since the son was not actually under the influence of alcohol or any other substance and had no intention of harming the father, the father was not in imminent danger from his son. Therefore, the instructions on self-defense should not be given to the jury.

Assistant: To summarize the answer provided by the AI, the correct answer to the exam question is (C) - "Deny the self-defense instruction, because the father was not in AI imminent danger from his son." This is because the use of deadly force is only justifiable when a person is in imminent danger of death or serious bodily injury, and the use of such force is necessary to protect oneself from that danger. In this scenario, since the son was not actually under the influence of alcohol or any other substance and had no intention of harming the father, the father was not in imminent danger from his son. Therefore, the instructions on self-defense should not be given to the jury.

Result

Incorrect, correct answer would be B. I am not a lawyer but assume it could be justified in the US by the Stand-your-ground law.

Q (GMAT test)

Select the best of the answer choices given.

The interview is an essential part of a successful hiring program because, with it job applicants who have personalities that are unsuited to the requirements of the job will be eliminated from consideration.

This argument logically depends on which of the following assumptions?

A: A Hiring program will be successful if it includes interviews.

B: The interview is a more important part of a successful hiring program than is the development of a job description.

C: Interviewers can accurately identify applicants whose personalities are unsuited to the requirements of the job.

D: The only purpose of an interview is to evaluate whether job applicants’ personalities are suited to the requirements of the job.

E: The fit of job applicants’ personalities to the requirements of the job was once the most important factor in making hiring decisions.

A (Vicuna-13B)

Select the statement that, when true, would make the argument more sound.

Assistant: C: Interviewers can accurately identify applicants whose personalities are unsuited to the requirements of the job.

The argument states that the interview is an essential part of a successful hiring program because it helps eliminate job applicants who have personalities that are unsuited to the requirements of the job. To strengthen this argument, we need to assume that the interviewers are able to accurately identify these applicants. Choice C provides this assumption.

Choices A and B do not directly address the accuracy of the interview process in identifying unsuitable candidates. Choice D is too narrow of a focus as it assumes that the only purpose of the interview is to evaluate personality fit, rather than considering other potential purposes such as assessing qualifications or cultural fit. Choice E is not relevant to the argument as it does not directly

Result

Correct. (Answer exceeded max token length)

Q

Write a hello world program in Commodore 64 Basic that flickers the screen.

A (Vicuna-13B)

Here's a simple "Hello World" program in Commodore 64 Basic:

REM This is a comment

DISK TRUE

GOTO 10

WEND

PRINT "Hello World!"

STOP

This program will display the message "Hello World!" on the screen and then stop. To make the screen flicker, you could modify this code by adding a loop that alternates between displaying and hiding the message. For example:

REM This is a comment

DISK TRUE

GOTO 10

WEND

PRINT "Hello World!"

LOOP AT 50 USING TIMER 2
  DISPLAY "Hi"
  Pause 100
  HIDE "Hi"
  PAUSE 100
  STOP

This program will display the

Result

Incorrect. (Answer exceeded max token length)

Conclusion

Impressive considering how small the model is and that it runs locally, but not as good as ChatGPT 3.5 or ChatGPT 4.

Live your Secrets and Hive Prosper 🍯

xx Viki @vikisecrets

Posted with STEMGeeks



0
0
0.000
20 comments
avatar

I haven't tried this one but I have no doubt that open source AIs will quickly catch up with the rest and then surpass them.

0
0
0.000
avatar

Cool
!BBH
!PGM

0
0
0.000
avatar

Sent 0.1 PGM - 0.1 LVL- 1 STARBITS - 0.05 DEC - 1 SBT - 0.1 THG - 0.000001 SQM - 0.1 BUDS - 0.01 WOO - 0.005 SCRAP - 0.001 INK tokens

remaining commands 2

BUY AND STAKE THE PGM TO SEND A LOT OF TOKENS!

The tokens that the command sends are: 0.1 PGM-0.1 LVL-0.1 THGAMING-0.05 DEC-15 SBT-1 STARBITS-[0.00000001 BTC (SWAP.BTC) only if you have 2500 PGM in stake or more ]

5000 PGM IN STAKE = 2x rewards!

image.png
Discord image.png

Support the curation account @ pgm-curator with a delegation 10 HP - 50 HP - 100 HP - 500 HP - 1000 HP

Get potential votes from @ pgm-curator by paying in PGM, here is a guide

I'm a bot, if you want a hand ask @ zottone444


0
0
0.000
avatar

Despite resource limitations, open source AIs show remarkable progress, moving closer and closer to commercial versions. Although I have not tested an open source LLM, I am intrigued by its development and its potential impact today.

0
0
0.000
avatar

I agree, those results are pretty good considering the model size. Definitely not as good as the commercial versions, but it also doesn't have the massive computing power available to it. It will be interesting to see how much it will improve over the next six to twelve months.

Great post, it was really interesting!
!DHEDGE

0
0
0.000
avatar

Thx, also Vucina is quite an old model, but was one of the best, released in March 23, will test newer models based on Llama2 soon and share the results here.

0
0
0.000
avatar

1


This post has been selected for upvote from our token accounts by @thebighigg! Based on your tags you received upvotes from the following account(s):

- @dhedge.bonus
- @dhedge.leo
- @dhedge.bee
- @dhedge.pob
- @dhedge.stem

@thebighigg has 14 vote calls left today.

Hold 10 or more DHEDGE to unlock daily dividends. Hold 100 or more DHEDGE to unlock thread votes. Calling in our curation accounts currently has a minimum holding requirement of 100 DHEDGE. The more DHEDGE you hold, the higher upvote you can call in. Buy DHEDGE on Tribaldex or earn some daily by joining one of our many delegation pools at app.dhedge.cc.

0
0
0.000
avatar

How truly good they are 😁

0
0
0.000
avatar

I have not tried this but since you gave it a good review, I guess I have no reason to doubt
Thanks for sharing

0
0
0.000
avatar

!PGM

0
0
0.000
avatar

Sent 0.1 PGM - 0.1 LVL- 1 STARBITS - 0.05 DEC - 1 SBT - 0.1 THG - 0.000001 SQM - 0.1 BUDS - 0.01 WOO - 0.005 SCRAP - 0.001 INK tokens

remaining commands 0

BUY AND STAKE THE PGM TO SEND A LOT OF TOKENS!

The tokens that the command sends are: 0.1 PGM-0.1 LVL-0.1 THGAMING-0.05 DEC-15 SBT-1 STARBITS-[0.00000001 BTC (SWAP.BTC) only if you have 2500 PGM in stake or more ]

5000 PGM IN STAKE = 2x rewards!

image.png
Discord image.png

Support the curation account @ pgm-curator with a delegation 10 HP - 50 HP - 100 HP - 500 HP - 1000 HP

Get potential votes from @ pgm-curator by paying in PGM, here is a guide

I'm a bot, if you want a hand ask @ zottone444


0
0
0.000
avatar

The AI ​​responds that Hive is based on Ethereum. I don't trust his answers yet.

0
0
0.000
avatar

i hope they can develop more advance AI that can surpass Chatgpt.

0
0
0.000
avatar

I have never tried it but it looks quite simple and that is good. Short but concise answers.

0
0
0.000
avatar

It is a very incipient technology, we will see how it evolves.
!ALIVE

0
0
0.000
avatar

@vikisecrets! You Are Alive so I just staked 0.1 $ALIVE to your account on behalf of @ yeckingo1. (11/20)

The tip has been paid for by the We Are Alive Tribe through the earnings on @alive.chat, feel free to swing by our daily chat any time you want, plus you can win Hive Power (2x 50 HP) and Alive Power (2x 500 AP) delegations (4 weeks), and Ecency Points (4x 50 EP), in our chat every day.

0
0
0.000
avatar

Wenn das Team am Ball bleibt, wird da bestimmt was beachtliches draus! :D

0
0
0.000
avatar

I haven't used this technology myself yet but have heard a lot about it from people and will check it out in the near future.

0
0
0.000