Mixtral 8x7B, the best open-source mixture-of-experts AI model tested, and how Mistral mocked the EU regulators. Full transcript.



opensourceai7.jpg

Open-Source LLMs. AI-generated illustration (Copilot Designer)


Today we look at the larger of the two open source language models from the successful European AI startup Mistral.

Mixtral, note the x in Mixtral, is a Mixture-of-Experts (MoE) model with 8 experts of approximately 7B parameters each. In total this gives Mixtral 46.7B parameters (less than 8 × 7B, because the experts share the attention layers and only the feed-forward blocks are duplicated).

Mixture-of-Experts is a trick that can be used to combine many small language models into a larger AI. Intuitively, you can imagine that each of the small language models is an expert in a certain area, for example coding, and depending on the prompt, the AI then uses the model that can best solve the task.

Mixtral activates two experts per token, and the advantage of this approach is that far less compute is needed to answer a query (even though the full model still has to fit in memory), which is especially important for local open-source models, as not everyone has a farm of NVIDIA A100 GPUs at home.
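
To make the routing idea more concrete, here is a minimal sketch of top-2 expert routing in Python/NumPy. It only illustrates the principle: the dimensions, the random "experts" and the gating weights are toy stand-ins, not Mixtral's actual implementation.

import numpy as np

def top2_moe_layer(x, gate_w, experts):
    # x: hidden state of one token, shape (d,)
    # gate_w: router weights, shape (n_experts, d)
    # experts: list of callables, each mapping (d,) -> (d,)
    logits = gate_w @ x                      # one score per expert
    top2 = np.argsort(logits)[-2:]           # pick the 2 highest-scoring experts
    weights = np.exp(logits[top2])
    weights /= weights.sum()                 # softmax over the 2 selected experts only
    # only the 2 selected experts are evaluated; the other 6 cost nothing
    return sum(w * experts[i](x) for w, i in zip(weights, top2))

# toy setup: 8 "experts", each just a random linear map
d, n_experts = 16, 8
rng = np.random.default_rng(0)
experts = [lambda v, W=rng.normal(size=(d, d)) / d: W @ v for _ in range(n_experts)]
gate_w = rng.normal(size=(n_experts, d))
print(top2_moe_layer(rng.normal(size=d), gate_w, experts).shape)  # (16,)

Per token only 2 of the 8 expert blocks actually run, which is why Mixtral's inference cost per token is closer to that of a ~13B dense model than of a 47B one.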

GPT-4 is also rumored to be a mixture-of-experts model.

In terms of performance, Mixtral performs significantly better than the smaller Mistral 7B model. According to the MMLU test, it achieves 70.6 points and is therefore already clearly at GPT-3.5 level (70). GPT-4 comes in at 86.5.

And all this runs locally on your own computer. In my test, Mixtral requires around 50 GB of RAM (and 12 GB of VRAM). Generation takes around 30 seconds per prompt for shorter responses.

And finally, a cool fact about how Mistral took the EU regulators for a ride. Mixtral was not first published on a central platform (Huggingface), as is usually the case with open-source models, but as a torrent. Decentralized. The link was even officially posted on Twitter/X. Awesome, and a statement too: open source cannot be banned or censored.

What do you think?

Mixtral 8x7B Instruct v0.1

https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF

https://mistral.ai/technology/#models

https://mistral.ai/news/mixtral-of-experts/

https://twitter.com/MistralAI/status/1733150512395038967

Quantization: 5-bit K M (mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf)
Parameters: simple-1
Zero-shot

Interface (oobabooga/text-generation-webui)

https://github.com/oobabooga/text-generation-webui
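
For the tests below I used oobabooga's text-generation-webui linked above. Purely as an alternative illustration, a minimal llama-cpp-python sketch for the same GGUF file might look roughly like this; the file path, context size and number of GPU-offloaded layers are assumptions you would adapt to your own hardware.

from llama_cpp import Llama

# assumed local path to the GGUF file downloaded from TheBloke's repo above
llm = Llama(
    model_path="./mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf",
    n_ctx=4096,       # context window
    n_gpu_layers=8,   # offload a few layers to a ~12 GB GPU, the rest stays in RAM
)

# Mixtral Instruct expects the [INST] ... [/INST] prompt format
out = llm("[INST] What is the capital city of Austria? Answer only the name of the city. [/INST]",
          max_tokens=32)
print(out["choices"][0]["text"])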

How good is Mixtral 8x7B? Full transcript

Q

What is the capital city of Austria? Answer only the name of the city.

A (Mixtral 8x7B)

Vienna

Result

Correct.

Q

Is Bitcoin a security or commodity?

A (Mixtral 8x7B)

Bitcoin is considered a cryptocurrency, not a security or commodity, although its legal classification varies by country. In the United States, for example, the Commodity Futures Trading Commission (CFTC) classifies bitcoin as a commodity, while the Securities and Exchange Commission (SEC) has yet to make a definitive statement. However, it's important to note that this classification may change as regulatory frameworks evolve. Always consult with a legal expert for advice tailored to your specific situation.

Result

Correct.

Q

What is the hive blockchain?

A (Mixtral 8x7B)

Hive Blockchain is a decentralized information sharing platform that utilizes blockchain technology and tokenized incentives to encourage users to contribute, curate, and consume content. It's built on the Delegated Proof of Stake (DPoS) consensus algorithm, which allows for faster transaction speeds and energy efficiency compared to other consensus mechanisms. Hive Blockchain was forked from the Steem Blockchain in 2020, and it supports various decentralized applications (DApps) for social media, gaming, and more. HIVE is the native cryptocurrency of the Hive Blockchain, used for rewarding content creators, curators, and node operators.

Result

Correct.

Q

What is 2^8?

A (Mixtral 8x7B)

2^8 equals 256. Would you like me to calculate something else for you?

Result

Correct.

Q

Can you compute the 8th root of 2^8?

A (Mixtral 8x7B)

Absolutely! The 8th root of (2^8) is equal to 2. Since (2^8) equals 256, taking the 8th root of it gives us back 2.

Result

Correct.

Q

Solve the puzzle: When I was 6 my sister was half my age, now i'm 70. How old's my sister? Let's think step by step.

A (Mixtral 8x7B)

Sure! Let's solve this step by step as requested.

  1. At 6 years old, your sister was half your age, which means she was 6/2 = 3 years old.

  2. Now that you are 70 years old, your sister would have grown by 70-3 = 67 years.

  3. Therefore, your sister's current age is 3 (her age when you were 6) + 67 = 70 years old as well.

Result

Incorrect. Step 1 is correct though: the sister is 3 years younger, so at 70 she would be 70 - 3 = 67, not 70.

Q (Bar test)

Answer the following exam question based on US law:

A father lived with his son, who was an alcoholic. When drunk, the son often became violent and physically abused his father. As a result, the father always lived in fear. One night, the father heard his son on the front stoop making loud obscene remarks. The father was certain that his son was drunk and was terrified that he would be physically beaten again. In his fear, he bolted the front door and took out a revolver. When the son discovered that the door was bolted, he kicked it down. As the son burst through the front door, his father shot him four times in the chest, killing him. In fact, the son was not under the influence of alcohol or any drug and did not intend to harm his father. At trial, the father presented the above facts and asked the judge to instruct the jury on self-defense. How should the judge instruct the jury with respect to self-defense?

(A) Give the self-defense instruction, because it expresses the defense’s theory of the case.
(B) Give the self-defense instruction, because the evidence is sufficient to raise the defense.
(C) Deny the self-defense instruction, because the father was not in imminent danger from his son.
(D) Deny the self-defense instruction, because the father used excessive force.

What is the correct answer?

A (Mixtral 8x7B)

This is indeed a complex legal question that involves the interpretation of US law, specifically regarding self-defense. It's important to note that I'm an AI language model and not a lawyer, so my response should not be used as legal advice.

That being said, let's break down the options:

(A) This option suggests giving the self-defense instruction because it expresses the defense's theory of the case. This could be a valid point, as it's the defense's role to present their interpretation of the events. However, the instruction should also align with the legal definition of self-defense.

(B) This option suggests giving the self-defense instruction because the evidence is sufficient to raise the defense. Given the father's belief that his son was drunk and violent, coupled with his history of violence, it could be argued that this meets the requirement of "reasonable belief" of imminent danger.

(C) This option suggests denying the self-defense instruction because the father was not in imminent danger from his son. This could be debatable, as the father believed he was in danger due to his son's history of violence when drunk. However, it turns out that the son was not drunk at the time.

(D) This option suggests denying the self-defense instruction because the father used excessive force. This could also be debatable. While four shots in the chest could be considered excessive, the father might argue that he feared for his life given his son's history of violence.

Based on this analysis, it seems that option B could be the most appropriate response, as it takes into account both the father's reasonable belief of imminent danger and the evidence presented in the case. However, this is a complex legal question that would likely require interpretation by a legal professional or a judge.

Result

Correct. I am not a lawyer; I assume it could be justified in the US by the stand-your-ground principle.

Q (GMAT test)

Select the best of the answer choices given.

The interview is an essential part of a successful hiring program because, with it job applicants who have personalities that are unsuited to the requirements of the job will be eliminated from consideration.

This argument logically depends on which of the following assumptions?

A: A Hiring program will be successful if it includes interviews.

B: The interview is a more important part of a successful hiring program than is the development of a job description.

C: Interviewers can accurately identify applicants whose personalities are unsuited to the requirements of the job.

D: The only purpose of an interview is to evaluate whether job applicants’ personalities are suited to the requirements of the job.

E: The fit of job applicants’ personalities to the requirements of the job was once the most important factor in making hiring decisions.

A (Mixtral 8x7B)

The argument logically depends on assumption C: Interviewers can accurately identify applicants whose personalities are unsuited to the requirements of the job.

Result

Correct.

Q

Write a hello world program in Commodore 64 Basic that flickers the screen.

A (Mixtral 8x7B)

Sure! Here's a Hello World program in Commodore 64 Basic that alternates between two colors on the screen for flickering effect:

10 PRINT "HELLO WORLD"
20 POKE 53280,PEEK(53280)+16
30 GOTO 20

When you run this program on your Commodore 64, it will display "HELLO WORLD" in alternating colors on the screen due to the loop in lines 20-30 that changes the color attribute of the screen border. Note that pressing the "STOP" key will halt this infinite loop.

Result

Not bad, but line 20 throws an ILLEGAL QUANTITY ERROR: PEEK(53280) returns the unused high bits of the border-color register set, so adding 16 pushes the value above 255, which POKE does not accept (the C64 only has 16 colors anyway).
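
A possible fix (untested sketch) would be to keep the POKEd value in the legal 0-15 color range, e.g. changing line 20 to POKE 53280,(PEEK(53280)+1) AND 15, which cycles through all 16 border colors instead of overflowing.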

Conclusion

Once again I'm super impressed: Mixtral got most of the tests right, and coding is significantly better than with the smaller Mistral model. How long until open source reaches GPT-4 level?

Live your Secrets and Hive Prosper 🍯

xx Viki @vikisecrets

Posted using STEMGeeks



23 comments

Impressive, but for the legal question only US law was applied; here that would never count as self-defense, if only because of the 4 shots. If it's a European model, why the US tests?


I'm not entirely sure either whether it would actually fly like that in the US; it's a standard US bar exam question. I use the same questions so I can compare the models. US self-defense law is probably broader than in the EU, though.


That's exactly what I mean: in the US the answer would be correct (I'm 100% sure of that, I know the self-defense laws), but not here (at least in Germany and Austria). Why gear the test questions only towards the US when we live here? That was the actual point of my comment.


Mega impressive. Clever to simply sidestep the regulation ;-)


Hello, my brother. Unfortunately, I see many people admiring the high degree of artificial intelligence, but whoever designed the artificial intelligence is the human being. If just a fingernail designed the human being.


Mistral seems to be doing great


"Mixtral wurde nämlich nicht wie sonst üblich zuerst auf einer zentralen Plattform (Huggingface) veröffentlicht, sondern auf Torrent. Dezentral."

Awesome! 😁


That's really cool; on huggingface or github the AI models could be deleted again at any time.


AI performed very well. Great picture


I'm running it and it is OK, but I asked it which advanced country has the highest fertility and it got it wrong. It said France, at a TFR of 1.8, when the answer is of course Israel at 2.9.



haha, maybe the model is biased towards France where it originates from 😂


AI is doing very good
Nice one!


That is pretty cool! And all local on your PC! It may take awhile but it's decentralized which is a huge plus in my book!


It's still quite complex. Do any students use AI to do homework?


It sounds interesting, but the resources needed to run these models are quite large. On a normal computer it's quite taxing on the system, as you won't be able to do much of anything else.

It does look like a decent AI.


This information is really great, Mistral is doing great.


Thanks for your contribution to the STEMsocial community. Feel free to join us on discord to get to know the rest of us!

Please consider delegating to the @stemsocial account (85% of the curation rewards are returned).

You may also include @stemsocial as a beneficiary of the rewards of this post to get a stronger support. 
 


The US are really doing a great job in regulating work
