Citizen Science Entry 2


Here's my entry for @lemouth's citizen science program, episode 2. Glad to be back :).

Before starting MadGraph, I picked apart the acronyms in the name of the package: 'MadGraph5_aMC@NLO'. Specifically, I wanted to know what the 'aMC' and 'NLO' terms meant. After some searching, here's what I concluded:

  • aMC: adjoint Monte Carlo.
  • NLO: Next-to-Leading-Order.

I was most interested in the 'aMC' acronym since 'aMC' appears prominently in the MadGraph5 prompt. So it's gotta be important!

Now for the meat-n-potatoes: simulating 10000 proton-proton collisions where each individual collision yields one top quark and one top antiquark. Following lemouth's lead in episode 2, here are the commands I entered.

$ ./bin/mg5_aMC
MG5_aMC>generate p p > t t~   # Task 1. Defines the collider process of interest.
MG5_aMC>display diagrams      # Task 1. Writes diagrams to /tmp/ in my container.
MG5_aMC>output pp_tt          # Task 2. Sets the output directory and builds the Fortran code.
MG5_aMC>launch pp_tt          # Task 3. Runs the simulation and computes the top anti-top production rates.


Note the presence of '#' at the end of the MG5 commands above. Through trial and error, I found that '#' indicates a comment in MadGraph's REPL. For example, 'generate p p > t t~ # Collider process' is a valid command.

An aside on filesystems

Initially I ran the simulation on a shared filesystem (/host/c). That was slow. So I tried both a container-local filesystem and a filesystem mounted in RAM (ramfs). For comparison, I measured the elapsed time of the 'launch' command (minus menu interactions). Here are the results:

  • Shared filesystem (/host/c/temp/pp_tt): 14m29s
  • Container filesystem (./pp_tt): 6m31s
  • RAM filesystem (~/ramfs/pp_tt): 5m47s

The learning: don't use a shared filesystem like I did :(. It was more than twice as slow as the container-local filesystem. The ramfs helps, but only modestly (it shaved roughly 11% off the container-filesystem time). That small effect makes sense, as the simulation is likely compute bound.
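
For the curious, here is roughly how such a timing can be taken. This is only a sketch: the script name is hypothetical, and wrapping the whole scripted session in 'time' also counts diagram generation and code output, not just the launch step, so it only approximates what I measured by hand.

$ # run_pp_tt.mg5 is a hypothetical file holding the same generate/output/launch commands as above
$ time ./bin/mg5_aMC run_pp_tt.mg5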

Since I'm running MadGraph in a container, there is no interactive display session. It's (nearly) headless. So I'm forced to copy graphical results (like images) to the host machine before viewing. That does incur some overhead.
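
As an example, pulling an image out of the container looks roughly like this (the container name and file path are placeholders, not my actual ones):

$ docker cp mg5_box:/tmp/diagram.eps .   # run on the host; then open the file with any image viewer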

Production Rates and Verification

A screenshot of my final output:

[Screenshot: LeMouth02-MadGraph.PNG]

My cross-section is 505.8 ± 0.8 pb, which yields the same production rate that @lemouth determined (assuming the 140/fb integrated luminosity carries two significant figures).
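
For reference, the back-of-the-envelope conversion is just the cross-section times the integrated luminosity (no efficiencies applied), remembering that 1 pb = 1000 fb:

$ awk 'BEGIN { sigma_fb = 505.8 * 1000; lumi_invfb = 140; printf "%.1e top-antitop pairs\n", sigma_fb * lumi_invfb }'
7.1e+07 top-antitop pairs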

I verified parton showering, hadronisation, and decay.

That's about it. Looking forward to the next episode and seeing you folks around!

#citizenscience

Posted with STEMGeeks



8 comments

Thanks a lot for this second report, and congratulations on your hard work on this exercise. How long did it take you? I assume you spent a significant amount of time with the docker environment, didn't you?

This time I have plenty of things to comment on!

Before starting MadGraph, I picked apart the acronyms in the name of the package: 'MadGraph5_aMC@NLO'. Specifically, I wanted to know what the 'aMC' and 'NLO' terms meant. After some searching, here's what I concluded:
aMC: adjoint Monte Carlo.
NLO: Next-to-Leading-Order.

The name comes from the merging of the code MadGraph5 (Mad refers to Madison in the US, and Graph to Feynman diagrams or graphs) and MC@NLO (a Monte Carlo event generator achieving predictions at next-to-leading-order accuracy in the strong coupling; more information on this is provided in the 6th episode and the upcoming 7th episode). The extra 'a' in the name refers to automation (MadGraph5 was an automated package for predictions at leading-order accuracy; MC@NLO was not automated).

By automation, I mean that it is sufficient to specify the process of interest and the physics model, and the code does the rest.

Note the presence of '#' at the end of the MG5 commands above. Through trial and error, I found that '#' indicates a comment in MadGraph's REPL. For example, 'generate p p > t t~ # Collider process' is a valid command.

That's right. This follows standard Python conventions: anything to the right of the hash is ignored.

An aside on filesystems
Initially I ran the simulation on a shared filesystem (/host/c). That was slow. So I tried both a container-local filesystem and a filesystem mounted in RAM (ramfs). For comparison, I measured the elapsed time of the 'launch' command (minus menu interactions).

That's interesting. I don't use Docker (as I run everything locally), so I am not really able to comment on this. While RAM filesystems seem better, I assume they are limited in disk space, aren't they? In that case, this may be a weakness.

By the way, why don't you run everything locally (possibly in a virtual environment)?

Since I'm running MadGraph in a container, there is no interactive display session. It's (nearly) headless. So I'm forced to copy graphical results (like images) to the host machine before viewing. That does incur some overhead.

So you don't have access to an interactive terminal? That's definitely an overhead, as it prevents you from reading error messages and capturing them live (if relevant). So you use it more like a cluster on which you would submit a job and recover the output files after they are transferred locally, don't you? This makes life complicated for testing purposes...

Once again, congratulations on having completed this episode!

Cheers!


Thanks!

How long did it take you? I assume you spent a significant amount of time with the docker environment, didn't you?

Most of the container setup time was spent in Episode 1. Now that I have a Dockerfile that specifies how to build the container, I can spin up new instances of the development environment for MadGraph5 quickly. This makes experimenting and iterating with systems-level changes quicker (like playing with filesystems).

The initial procedure of entering commands and checking the results took me a couple hours. Writing the post took me about half a day.

By automation, I mean that it is sufficient to specify the process of interest and the physics model, and the code does the rest.

Thanks for the clarifications. :)

While RAM filesystems seem better, I assume they are limited in disk space, aren't they? In that case, this may be a weakness.

Exactly, very limited space. Another downside of ramfs is that you need to preallocate the size of the filesystem. And if the system crashes, you lose all your data. If you have enough memory, RAM filesystems are pretty good for processes that generate a lot of intermediate artifacts (like compiling a large program). But otherwise the downsides outweigh the benefits.
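
For illustration, a RAM-backed mount can be created roughly like this (the mount point and size are placeholders; tmpfs is the variant that takes an explicit size cap):

$ mkdir -p ~/ramfs
$ sudo mount -t ramfs ramfs ~/ramfs              # plain ramfs, no enforced size limit
$ # or, with an explicit cap on how much RAM it may use:
$ sudo mount -t tmpfs -o size=8g tmpfs ~/ramfs   # 8g is a placeholder size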

So you don't have access to an interactive terminal? That's definitely an overhead, as it prevents you from reading error messages and capturing them live (if relevant). So you use it more like a cluster on which you would submit a job and recover the output files after they are transferred locally, don't you? This makes life complicated for testing purposes...

Good question. In this case, I don't have access to a graphical display like X windows or Wayland. So no apps like firefox or gimp. But I do have access to an interactive terminal through SSH. And the container runs locally on my computer. So, thankfully, I can see the MadGraph5 process execute in real time and enter commands as I normally would at a Linux terminal.
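
Concretely, attaching to the container looks roughly like this (the port, user, and container name are placeholders for my setup):

$ ssh -p 2222 me@localhost        # SSH into the container via a forwarded port
$ docker exec -it mg5_box bash    # or attach a shell directly with Docker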

Thanks for the clarifications and the opportunity to embark on this fun adventure. Looking forward to the next episode!


The initial procedure of entering commands and checking the results took me a couple hours. Writing the post took me about half a day.

I can easily imagine that it is also the case for the other participants. Writing the reports always takes more time than the exercises. However, I didn't include that when I mentioned that each episode should take a few hours... I actually didn't even think about it. Baah....

Exactly, very limited space. Another downside of ramfs is that you need to preallocate the size of the filesystem. And if the system crashes, you lose all your data. If you have enough memory, RAM filesystems are pretty good for processes that generate a lot of intermediate artifacts (like compiling a large program). But otherwise the downsides outweigh the benefits.

This is what I thought about the RAM filesystem. As for the exercises after next (in a few episodes that I have not written yet), we will need to simulate collisions and store millions of events (which leads to multi-GB intermediate files), so I am not sure that this will work. Except, of course, if the machine is powerful enough. I nevertheless do not know whether it is worth testing.

Good question. In this case, I don't have access to a graphical display like X windows or Wayland. So no apps like firefox or gimp. But I do have access to an interactive terminal through SSH. And the container runs locally on my computer. So, thankfully, I can see the MadGraph5 process execute in real time and enter commands as I normally would at a Linux terminal.

Then it is perfect. You can probably access the HTML output via a text-based browser like links.
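
For instance, something along these lines from inside the container (the exact HTML file name may differ):

$ links pp_tt/index.html     # text-mode browsing of the generated report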

Cheers, and thanks again for this report!


Thanks for your contribution to the STEMsocial community. Feel free to join us on discord to get to know the rest of us!

Please consider delegating to the @stemsocial account (85% of the curation rewards are returned).

Thanks for including @stemsocial as a beneficiary, which gives you stronger support. 
 


Congratulations @iauns! You have completed the following achievement on the Hive blockchain and have been rewarded with new badge(s):

  • You received more than 1500 upvotes.
  • Your next target is to reach 1750 upvotes.

You can view your badges on your board and compare yourself to others in the Ranking
If you no longer want to receive notifications, reply to this comment with the word STOP

Check out the last post from @hivebuzz:

  • Our Hive Power Delegations to the September PUM Winners
  • Feedback from the October 1st Hive Power Up Day
  • Hive Power Up Month Challenge 2022-09 - Winners List
Support the HiveBuzz project. Vote for our proposal!