09:01:08 From Ben Kirk to Everyone: Hi all, we hope to see you at today's monthly meeting! If you would, please take a few minutes to complete a 'container' survey, which will help guide discussion later in the session. Don't forget to hit 'submit' at the bottom! https://bit.ly/3RjxqIt
09:07:30 From B.J. Smith to Everyone: GPU tutorials list/links: https://www2.cisl.ucar.edu/what-we-do/training-library/gpu-computing-workshops
09:28:22 From Brian Dobbins to Everyone: I haven't seen a performance penalty with my CESM tests. In fact, in some cases I actually benefited slightly (about 1%) from different MPI optimizations.
09:28:30 From Brian Dobbins to Everyone: This is using Singularity/Apptainer.
09:28:44 From Brian Dobbins to Everyone: I *do* still need to test I/O performance, though.
09:31:58 From Daniel Howard, he/him - CSG, NCAR to Everyone: Here's the container talk Ben just mentioned: https://www.youtube.com/watch?v=FFyXdgWXD3A
09:32:38 From Brian Dobbins to Everyone: I can post my stuff publicly rather than privately. Here's the conversation I've had with Ben in chat:
09:32:39 From Brian Dobbins to Everyone: I haven't seen a performance penalty with my CESM tests. In fact, in some cases I actually benefited slightly (about 1%) from different MPI optimizations.
09:32:44 From Brian Dobbins to Everyone: This is using Singularity/Apptainer.
09:32:50 From Brian Dobbins to Everyone: I *do* still need to test I/O performance, though.
09:33:04 From Brian Dobbins to Everyone: (With regard to running on multiple sites): Yes, sort of. I've run the same container on NERSC, AWS, and NCAR... but there were definitely a few hiccups due to mapping in the right IB drivers, I believe. This was quite a while ago. I'm hoping that's getting easier over time, but I don't know for sure.
09:33:24 From Brian Dobbins to Everyone: I also learned, maybe a year ago, that MPI ABI compatibility isn't a big issue as long as you're running a fully containerized MPI - it's all about the launching/mapping of processes. E.g., if you use Intel MPI with PMIx and Open MPI with PMIx, that's fine, even though the two have different ABIs. This was news to me when I first learned it.
09:34:07 From Brian Vanderwende to Everyone: Singularity is actually documented, but the docs are certainly fertile ground for improvements and updates (CSG is happy to get feedback on that doc…)
09:34:09 From Brian Vanderwende to Everyone: https://arc.ucar.edu/knowledge_base/83853889
09:36:49 From Brian Dobbins to Everyone: I believe EIT (Enterprise IT) is also looking at a small, Kubernetes-like environment for long-running 'services'?
09:37:39 From Brian Dobbins to Everyone: Can EIT systems have read-only mounts from Glade, out of curiosity?
09:42:38 From Supreeth Madapur to Everyone: I've always had trouble configuring GPU-aware MPI with Singularity and PBS.
09:43:14 From Brian Dobbins to Everyone: Ben, this was my understanding too, but it's not fully correct. I have since run a (simple, mind you) Open MPI container with Intel MPI on the host, even though the ABIs differ.
09:47:34 From Brian Dobbins to Everyone: That is indeed a problem, yes. Maybe we can argue with them? I had months of discussions with Intel before they agreed to let us include the compilers in containers (with restrictions).
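A minimal sketch of the PMIx-based launch pattern discussed above (09:33:24, 09:43:14): the host's launcher starts the processes and the MPI inside the image wires them up, so the two libraries never need matching ABIs. The image name, executable path, account details, and node geometry below are hypothetical, and this assumes both the host launcher (Intel MPI or Open MPI configured with PMIx, per 09:52:19) and the container's MPI were built with PMIx support.

    #!/bin/bash
    #PBS -N cesm-container
    #PBS -l select=2:ncpus=36:mpiprocs=36
    #PBS -l walltime=01:00:00

    cd $PBS_O_WORKDIR

    # Host-side mpiexec spawns one containerized process per MPI rank;
    # the (PMIx-enabled) MPI inside the image handles the communication.
    mpiexec -n 72 singularity exec cesm.sif /opt/cesm/cesm.exe

The same image could then, in principle, be carried to another site and launched with that site's own PMIx-aware launcher, which is the portability the ABI discussion above is after.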
09:47:58 From Brian Vanderwende to Everyone: That's why HPE's Open MPI ABI compatibility will likely be important too; they are not open with their libraries, so even on-prem Open MPI solutions are difficult (and we are pushing on it).
09:48:27 From Brian Dobbins to Everyone: I think it depends on HOW different the performance is.
09:48:28 From Supreeth Madapur to Everyone: If I understood the problem right, I think UCX works better in this case.
09:49:18 From Brian Dobbins to Everyone: I can give you guys a test in the near future, sure.
09:49:48 From Brian Dobbins to Everyone: I tried the HPE one ages ago... but just switched to running with Intel MPI on Cheyenne.
09:52:19 From Brian Dobbins to Everyone: I'm also looking at switching to Open MPI in our containers, and if you configure it (and Intel MPI) with PMIx, they both work fine. Open MPI is just easier to install. :-)
09:52:50 From Supreeth Madapur to Everyone: I can give you guys a test as well.
09:52:52 From Brian Vanderwende to Everyone: @Brian - is that a libfabric-based Open MPI or UCX-based?
09:52:56 From Brian Vanderwende to Everyone: Or native?
09:53:17 From Brian Dobbins to Everyone: @Brian: Good question - I forget, it's been a while. I'll try to find it later this week and let you know. I *believe* it was libfabric.
09:53:35 From Brian Dobbins to Everyone: UCX used to be a pain to install. 🙂
09:53:41 From Brian Vanderwende to Everyone: Sounds good. That's likely the better choice for non-GPU workflows at the moment.
09:56:40 From Brian Dobbins to Everyone: What's interesting is that you can build on your own system... then, with Singularity, unpack into a sandbox and *modify* things there. I do that a fair bit.
09:58:57 From Ben Kirk to Everyone: I will post a link to Ben's full charts after the meeting, in Slack and also on today's Wiki page:
09:59:15 From Ben Kirk to Everyone: https://wiki.ucar.edu/pages/viewpage.action?pageId=505316257
10:00:34 From Brian Dobbins to Everyone: We used Charliecloud to run the Samurai code on Cheyenne. Worked great!
10:01:48 From Brian Dobbins to Everyone: I'd LOVE a tutorial on Podman if you guys have experience; I've been meaning to look into it for ages.
10:03:05 From Brian Dobbins to Everyone: Let's see if there's broader interest - I know some other people in MMM and RAL who are also doing containers and could be interested.
10:03:37 From Brian Dobbins to Everyone: I'd love an NCAR container registry, if it's easy. We pay for Docker Hub right now.
10:04:14 From Brian Dobbins to Everyone: Thanks, Ben!
10:04:52 From Brian Dobbins to Everyone: No more questions for me, but happy to chat more soon, sure.
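A minimal sketch of the Singularity sandbox workflow mentioned at 09:56:40: unpack an existing image into a writable directory, modify it interactively, and repack it. The image and directory names are hypothetical.

    # Unpack an existing image into a writable sandbox directory
    singularity build --sandbox mymodel_sandbox/ mymodel.sif

    # Open a shell in the sandbox with write access and make changes
    # (install a package, edit a config, rebuild a component, etc.)
    singularity shell --writable mymodel_sandbox/

    # Repack the modified sandbox into a new immutable image
    singularity build mymodel_v2.sif mymodel_sandbox/

Changes made with --writable persist only in the sandbox directory; the repacked .sif is read-only, which is what keeps it portable across sites.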