Mixtral 8x7B, the best open-source mixture-of-experts AI model tested, and how Mistral mocked the EU regulators. Full transcript.



opensourceai7.jpg

Open-Source LLMs. AI-generated illustration (Copilot Designer)


Today we look at the larger of the two open source language models from the successful European AI startup Mistral.

Mixtral, note the x in Mixtral, is a Mixture-of-Experts (MoE) model with 8 experts of approximately 7B parameters each. In total this gives Mixtral 46.7B parameters (less than 8 × 7B, because the experts share the attention layers and only the feed-forward blocks are duplicated).

Mixture-of-Experts is a trick that can be used to combine many small language models into a larger AI. Intuitively, you can imagine that each of the small language models is an expert in a certain area, for example coding, and depending on the prompt, the AI then uses the model that can best solve the task.

Mixtral activates two experts per token, and the advantage of this approach is that far less compute is needed to answer a query (even though the full model still has to fit in memory), which is especially important for local open-source models, as not everyone has a farm of NVIDIA A100 GPUs at home.
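
To make the routing idea more concrete, here is a minimal sketch of top-2 expert routing in Python/NumPy. It only illustrates the principle: the dimensions, the random "experts" and the gating weights are toy stand-ins, not Mixtral's actual implementation.

import numpy as np

def top2_moe_layer(x, gate_w, experts):
    # x: hidden state of one token, shape (d,)
    # gate_w: router weights, shape (n_experts, d)
    # experts: list of callables, each mapping (d,) -> (d,)
    logits = gate_w @ x                      # one score per expert
    top2 = np.argsort(logits)[-2:]           # pick the 2 highest-scoring experts
    weights = np.exp(logits[top2])
    weights /= weights.sum()                 # softmax over the 2 selected experts only
    # only the 2 selected experts are evaluated; the other 6 cost nothing
    return sum(w * experts[i](x) for w, i in zip(weights, top2))

# toy setup: 8 "experts", each just a random linear map
d, n_experts = 16, 8
rng = np.random.default_rng(0)
experts = [lambda v, W=rng.normal(size=(d, d)) / d: W @ v for _ in range(n_experts)]
gate_w = rng.normal(size=(n_experts, d))
print(top2_moe_layer(rng.normal(size=d), gate_w, experts).shape)  # (16,)

Per token only 2 of the 8 expert blocks actually run, which is why Mixtral's inference cost per token is closer to that of a ~13B dense model than of a 47B one.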

GPT-4 is also rumored to be a mixture-of-experts model.

In terms of performance, Mixtral performs significantly better than the smaller Mistral 7B model. According to the MMLU test, it achieves 70.6 points and is therefore already clearly at GPT-3.5 level (70). GPT-4 comes in at 86.5.

And all this runs locally on your own computer. In my test, Mixtral requires around 50 GB of RAM (and 12 GB of VRAM). Generation takes around 30 seconds per prompt for shorter responses.

And finally, a cool fact about how Mistral took the EU regulators for a ride. Mixtral was not first published on a central platform (Huggingface), as is usually the case with open-source models, but as a torrent. Decentralized. The link was even officially posted on Twitter/X. Awesome, and a statement too: open source cannot be banned or censored.

What do you think?

Mixtral 8x7B Instruct v0.1

https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF

https://mistral.ai/technology/#models

https://mistral.ai/news/mixtral-of-experts/

https://twitter.com/MistralAI/status/1733150512395038967

Quantization: 5-bit K M (mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf)
Parameters: simple-1
Zero-shot

Interface (oobabooga/text-generation-webui)

https://github.com/oobabooga/text-generation-webui
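
For the tests below I used oobabooga's text-generation-webui linked above. Purely as an alternative illustration, a minimal llama-cpp-python sketch for the same GGUF file might look roughly like this; the file path, context size and number of GPU-offloaded layers are assumptions you would adapt to your own hardware.

from llama_cpp import Llama

# assumed local path to the GGUF file downloaded from TheBloke's repo above
llm = Llama(
    model_path="./mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf",
    n_ctx=4096,       # context window
    n_gpu_layers=8,   # offload a few layers to a ~12 GB GPU, the rest stays in RAM
)

# Mixtral Instruct expects the [INST] ... [/INST] prompt format
out = llm("[INST] What is the capital city of Austria? Answer only the name of the city. [/INST]",
          max_tokens=32)
print(out["choices"][0]["text"])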

How good is Mixtral 8x7B? Full transcript

Q

What is the capital city of Austria? Answer only the name of the city.

A (Mixtral 8x7B)

Vienna

Result

Correct.

Q

Is Bitcoin a security or commodity?

A (Mixtral 8x7B)

Bitcoin is considered a cryptocurrency, not a security or commodity, although its legal classification varies by country. In the United States, for example, the Commodity Futures Trading Commission (CFTC) classifies bitcoin as a commodity, while the Securities and Exchange Commission (SEC) has yet to make a definitive statement. However, it's important to note that this classification may change as regulatory frameworks evolve. Always consult with a legal expert for advice tailored to your specific situation.

Result

Correct.

Q

What is the hive blockchain?

A (Mixtral 8x7B)

Hive Blockchain is a decentralized information sharing platform that utilizes blockchain technology and tokenized incentives to encourage users to contribute, curate, and consume content. It's built on the Delegated Proof of Stake (DPoS) consensus algorithm, which allows for faster transaction speeds and energy efficiency compared to other consensus mechanisms. Hive Blockchain was forked from the Steem Blockchain in 2020, and it supports various decentralized applications (DApps) for social media, gaming, and more. HIVE is the native cryptocurrency of the Hive Blockchain, used for rewarding content creators, curators, and node operators.

Result

Correct.

Q

What is 2^8?

A (Mixtral 8x7B)

2^8 equals 256. Would you like me to calculate something else for you?

Result

Correct.

Q

Can you compute the 8th root of 2^8?

A (Mixtral 8x7B)

Absolutely! The 8th root of (2^8) is equal to 2. Since (2^8) equals 256, taking the 8th root of it gives us back 2.

Result

Correct.

Q

Solve the puzzle: When I was 6 my sister was half my age, now i'm 70. How old's my sister? Let's think step by step.

A (Mixtral 8x7B)

Sure! Let's solve this step by step as requested.

  1. At 6 years old, your sister was half your age, which means she was 6/2 = 3 years old.

  2. Now that you are 70 years old, your sister would have grown by 70-3 = 67 years.

  3. Therefore, your sister's current age is 3 (her age when you were 6) + 67 = 70 years old as well.

Result

Incorrect. Step 1 is correct though: the sister is 3 years younger, so at 70 she would be 70 - 3 = 67, not 70.

Q (Bar test)

Answer the following exam question based on US law:

A father lived with his son, who was an alcoholic. When drunk, the son often became violent and physically abused his father. As a result, the father always lived in fear. One night, the father heard his son on the front stoop making loud obscene remarks. The father was certain that his son was drunk and was terrified that he would be physically beaten again. In his fear, he bolted the front door and took out a revolver. When the son discovered that the door was bolted, he kicked it down. As the son burst through the front door, his father shot him four times in the chest, killing him. In fact, the son was not under the influence of alcohol or any drug and did not intend to harm his father. At trial, the father presented the above facts and asked the judge to instruct the jury on self-defense. How should the judge instruct the jury with respect to self-defense?

(A) Give the self-defense instruction, because it expresses the defense’s theory of the case.
(B) Give the self-defense instruction, because the evidence is sufficient to raise the defense.
(C) Deny the self-defense instruction, because the father was not in imminent danger from his son.
(D) Deny the self-defense instruction, because the father used excessive force.

What is the correct answer?

A (Mixtral 8x7B)

This is indeed a complex legal question that involves the interpretation of US law, specifically regarding self-defense. It's important to note that I'm an AI language model and not a lawyer, so my response should not be used as legal advice.

That being said, let's break down the options:

(A) This option suggests giving the self-defense instruction because it expresses the defense's theory of the case. This could be a valid point, as it's the defense's role to present their interpretation of the events. However, the instruction should also align with the legal definition of self-defense.

(B) This option suggests giving the self-defense instruction because the evidence is sufficient to raise the defense. Given the father's belief that his son was drunk and violent, coupled with his history of violence, it could be argued that this meets the requirement of "reasonable belief" of imminent danger.

(C) This option suggests denying the self-defense instruction because the father was not in imminent danger from his son. This could be debatable, as the father believed he was in danger due to his son's history of violence when drunk. However, it turns out that the son was not drunk at the time.

(D) This option suggests denying the self-defense instruction because the father used excessive force. This could also be debatable. While four shots in the chest could be considered excessive, the father might argue that he feared for his life given his son's history of violence.

Based on this analysis, it seems that option B could be the most appropriate response, as it takes into account both the father's reasonable belief of imminent danger and the evidence presented in the case. However, this is a complex legal question that would likely require interpretation by a legal professional or a judge.

Result

Correct. I am not a lawyer; I assume it could be justified in the US by the stand-your-ground principle.

Q (GMAT test)

Select the best of the answer choices given.

The interview is an essential part of a successful hiring program because, with it job applicants who have personalities that are unsuited to the requirements of the job will be eliminated from consideration.

This argument logically depends on which of the following assumptions?

A: A Hiring program will be successful if it includes interviews.

B: The interview is a more important part of a successful hiring program than is the development of a job description.

C: Interviewers can accurately identify applicants whose personalities are unsuited to the requirements of the job.

D: The only purpose of an interview is to evaluate whether job applicants’ personalities are suited to the requirements of the job.

E: The fit of job applicants’ personalities to the requirements of the job was once the most important factor in making hiring decisions.

A (Mixtral 8x7B)

The argument logically depends on assumption C: Interviewers can accurately identify applicants whose personalities are unsuited to the requirements of the job.

Result

Correct.

Q

Write a hello world program in Commodore 64 Basic that flickers the screen.

A (Mixtral 8x7B)

Sure! Here's a Hello World program in Commodore 64 Basic that alternates between two colors on the screen for flickering effect:

10 PRINT "HELLO WORLD"
20 POKE 53280,PEEK(53280)+16
30 GOTO 20

When you run this program on your Commodore 64, it will display "HELLO WORLD" in alternating colors on the screen due to the loop in lines 20-30 that changes the color attribute of the screen border. Note that pressing the "STOP" key will halt this infinite loop.

Result

Not bad, but line 20 throws an ILLEGAL QUANTITY ERROR: PEEK(53280) returns the unused high bits of the border-color register set, so adding 16 pushes the value above 255, which POKE does not accept (the C64 only has 16 colors anyway).
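
A possible fix (untested sketch) would be to keep the POKEd value in the legal 0-15 color range, e.g. changing line 20 to POKE 53280,(PEEK(53280)+1) AND 15, which cycles through all 16 border colors instead of overflowing.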

Conclusion

Once again I'm super impressed: Mixtral got most of the tests right, and coding is significantly better than with the smaller Mistral model. How long until open source reaches GPT-4 level?

Live your Secrets and Hive Prosper 🍯

xx Viki @vikisecrets

Posted using STEMGeeks



23 comments

Impressive, but for the legal question only US law was applied; here that would never count as self-defense, if only because of the 4 shots. If it's a European model, why the US tests?


I'm not entirely sure either whether it would actually fly like that in the US; it's a standard US bar exam question. I use the same questions so I can compare the models. US self-defense law is probably broader than in the EU, though.


That's exactly what I mean: in the US the answer would be correct (I'm 100% sure of that, I know the self-defense laws), but not here (at least in Germany and Austria). Why gear the test questions only towards the US when we live here? That was the actual point of my comment.


Mega impressive. Clever to simply sidestep the regulation ;-)


Hello, my brother. Unfortunately, I see many people admiring the high degree of artificial intelligence, but whoever designed the artificial intelligence is the human being. If just a fingernail designed the human being.


Mistral seems to be doing great


"Mixtral wurde nämlich nicht wie sonst üblich zuerst auf einer zentralen Plattform (Huggingface) veröffentlicht, sondern auf Torrent. Dezentral."

Awesome! 😁


That's really cool; on huggingface or github the AI models could be deleted again at any time.


AI performed very well. Great picture


I'm running it and it is OK, but I asked it which advanced country has the highest fertility and it got it wrong. It said France, at a TFR of 1.8, when the answer is of course Israel at 2.9.



haha, maybe the model is biased towards France where it originates from 😂


AI is doing very good
Nice one!


That is pretty cool! And all local on your PC! It may take awhile but it's decentralized which is a huge plus in my book!


It's still quite complex. Do any students use AI to do homework?


It sounds interesting, but the resources needed to run these models are quite large. On a normal computer it's quite taxing on the system, as you won't be able to do much of anything else.

It does look like a decent AI.


This information is really great, Mistral is doing great.


Thanks for your contribution to the STEMsocial community. Feel free to join us on discord to get to know the rest of us!

Please consider delegating to the @stemsocial account (85% of the curation rewards are returned).

You may also include @stemsocial as a beneficiary of the rewards of this post to get a stronger support. 
 


The US are really doing a great job in regulating work
