How to do output when running in parallel?

The problem: if many processors write to the same file, we will end up with a hopeless mess. How should we do output when running in parallel?

There are two solutions: either each processor writes to its own file, which it opens and closes, or the information is saved until the end of the calculation, when one of the processors does all the output. Thoughts?


For the majority of what I use Cloudy for, I run a 1-, 2-, or 3-dimensional grid and save a set of emission line intensities. The mpi main program in the programs directory off tsuite does just this (see mpi.cpp). It reads in an arbitrary set of lines and then outputs them from processor 0 at the end of the grid of calculations. This takes care of 99%+ of what I use Cloudy for. Some of these grids are HUGE - more than 1000 sims.
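
A minimal sketch of that pattern follows. It is not the actual mpi.cpp; the grid size, line count, file name, and the placeholder where the simulation runs are all illustrative. Each rank computes its share of the grid and processor 0 gathers the whole table and writes it once at the end.

    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char* argv[])
    {
        MPI_Init(&argc, &argv);
        int rank, nproc;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);

        const int nGrid = 1000;   /* number of sims in the grid (illustrative) */
        const int nLines = 20;    /* number of emission lines requested (illustrative) */

        /* each rank fills only its own grid points; the rest stay zero */
        std::vector<double> intensity(nGrid*nLines, 0.);
        for( int i=rank; i < nGrid; i += nproc )
            for( int j=0; j < nLines; ++j )
                intensity[i*nLines+j] = 0.;   /* really: run sim i, look up line j */

        /* combine the partial tables on rank 0 (unfilled slots are zero, so a sum works) */
        std::vector<double> total(nGrid*nLines, 0.);
        MPI_Reduce(&intensity[0], &total[0], nGrid*nLines, MPI_DOUBLE, MPI_SUM,
                   0, MPI_COMM_WORLD);

        /* only processor 0 touches the output file, so there is no mess */
        if( rank == 0 )
        {
            FILE *io = fopen("linelist.out", "w");
            if( io != NULL )
            {
                for( int i=0; i < nGrid; ++i )
                {
                    for( int j=0; j < nLines; ++j )
                        fprintf(io, "\t%.4e", total[i*nLines+j]);
                    fprintf(io, "\n");
                }
                fclose(io);
            }
        }

        MPI_Finalize();
        return 0;
    }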

The grid command really wants to be parallel, could be made so very quickly, and could easily do everything mpi.cpp does. I usually combine grid with the punch linelist command to read in an arbitrary set of lines (the second file parameter on the punch line) and save the predicted intensities in the file that is the first parameter. With this output option we have all the functionality that the mpi main program has.

Having the one file is very tidy and handy - we go straight to the post-processing: plots or integrating over the different spectra.

The multiple files produced in the proposed parallel setup will themselves need a further processing step to bring common information back together. In big runs we will have 1000+ files for each punch output command. There could be close to 10,000 files if we punch several options. We will need to sort out how to bring these files back together into a master file that can then be used in other programs. We might follow a strict naming convention. For instance, all the punch continuum output could go to files with names like continuum.001, continuum.002, etc., or, equivalently, names like file001.con, file002.con, etc.
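
A sketch of what that extra post-processing step might look like, assuming the file001.con-style naming above (the file count and the master file name are assumptions): simply concatenate the per-process punch files into one master file, skipping any that are missing.

    #include <fstream>
    #include <iomanip>
    #include <sstream>

    int main()
    {
        const int nFiles = 1000;               // one punch file per sim (illustrative)
        std::ofstream master("master.con");    // the combined file other programs read

        for( int i=1; i <= nFiles; ++i )
        {
            // build names like file001.con, file002.con, ...
            std::ostringstream name;
            name << "file" << std::setw(3) << std::setfill('0') << i << ".con";
            std::ifstream part(name.str().c_str());
            if( !part )
                continue;                      // that sim may have failed or not run
            master << part.rdbuf();            // append the whole file
        }
        return 0;
    }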

The punch linelist command is a special case. It is the only punch command I need to save grid output, since I mostly want lines. This is enough for most of what I do. The mpi program is only capable of doing what this command does, and I happily used that for more than 10 years. We could malloc space for all the lines in the punch linelist list at startup, including enough space to save the full set of lines for each model in the grid. Each processor would save its spectrum, just as the mpi programs do, then output the entire set of lines at the end. We would have one tidy file, which would be all that 99% of the cases need.
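
A sketch of the storage that proposal implies (the class and member names are invented here, not existing Cloudy code): space for every line in every model is set aside once at startup, each model fills in its row as it finishes, and the single tidy file is written at the end by whichever processor does the output.

    #include <cstdio>
    #include <vector>

    class GridLineStore
    {
        long m_nModels, m_nLines;
        std::vector<double> m_intensity;   // nModels*nLines slots, allocated at startup
    public:
        GridLineStore(long nModels, long nLines)
            : m_nModels(nModels), m_nLines(nLines), m_intensity(nModels*nLines, 0.) {}

        // called as each model in the grid finishes
        void store(long iModel, long iLine, double value)
        {
            m_intensity[iModel*m_nLines + iLine] = value;
        }

        // called once after the last model, from the processor doing the output
        void flush(FILE *io) const
        {
            for( long i=0; i < m_nModels; ++i )
            {
                for( long j=0; j < m_nLines; ++j )
                    fprintf(io, "\t%.4e", m_intensity[i*m_nLines + j]);
                fprintf(io, "\n");
            }
        }
    };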

Gary Ferland [2007-02-03]


Either way has its advantages -- e.g. it is quicker to access random parts of the data in multiple files than in a single file of variable-length records (maybe this should be 'unless you're using Excel...'), and less log data sits in memory, where it would be lost if just one simulation fails. However, some filesystems tuned for HPC work best if there are a few huge files. Using a general API for the output (e.g. a wrapper called cprintf or some such) would make it easy to switch between approaches, perhaps under user control.
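
A rough sketch of that general API (the name cprintf, the flag, and the file name pattern are all made up here): every print goes through one wrapper, and a single switch decides whether each process gets its own file or everything goes to one stream, so changing strategy later means touching one function rather than the whole code.

    #include <cstdarg>
    #include <cstdio>

    static FILE *ioOut = NULL;
    static bool lgFilePerProcess = true;   // could be set from the input deck

    // open the destination once we know which process we are
    void open_output(int rank)
    {
        if( lgFilePerProcess )
        {
            char name[32];
            sprintf(name, "cloudy%3.3i.out", rank);   // e.g. cloudy007.out
            ioOut = fopen(name, "w");
        }
        else
            ioOut = stdout;   // or one shared file, written only after a gather
    }

    // all output funnels through this wrapper
    void cprintf(const char *fmt, ...)
    {
        if( ioOut == NULL )
            ioOut = stdout;
        va_list args;
        va_start(args, fmt);
        vfprintf(ioOut, fmt, args);
        va_end(args);
    }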

As far as the naming of the files goes, using subdirectories to isolate the output from particular runs (or even processes) would be useful. A meaningful numbering convention, rather than just arbitrary index numbers, might also help (e.g. replacing %1 in the filenames with the value or grid index of the first variable, etc.).


I am not sure what is meant by "record length". I seem to remember that this is an option for writing binary random-access files in Fortran, but I don't recall such an option in C++. I doubt it has any relevance for simple ASCII files. Anyway, I think the time it takes to grep your way through the grid output is going to be trivial compared to the time it takes to calculate it. I have always been amazed at how efficient grep is, even when working on hundreds of MB of data.

To be helpful to people who write scripts that work on Cloudy output, it is important to keep the output format identical for parallel and sequential runs. This also makes documentation easier, etc. This is the way Phymir currently works.

Peter van Hoof [2007-02-05]


It would be useful to be able to intercept fprintf(ioQQQ,...) output. This may be possible via an overload, if the ellipsis doesn't suppress this. However, using vsprintf to format the output isn't safe against buffer overruns, and vsnprintf is a C99 feature which can't be relied on to be present in C++ implementations. Various public-domain snprintf implementations are available, some with suitable licenses: see http://www.jhweiss.de/software/snprintf.html for one, and links to others.
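
For what it's worth, here is a sketch of such an interceptor, assuming a bounded vsnprintf is available either from the system or from one of the public-domain replacements linked above. The wrapper name, buffer size, and what is done with the captured text are arbitrary; a distinct name is used rather than a true overload to sidestep the question of whether the overload would be picked up.

    #include <cstdarg>
    #include <cstdio>
    #include <string>

    std::string captured;   // whatever the interceptor wants to keep

    // stand-in for fprintf(ioQQQ, ...)
    void intercept_printf(FILE *io, const char *fmt, ...)
    {
        char buf[4096];
        va_list args;
        va_start(args, fmt);
        int n = vsnprintf(buf, sizeof(buf), fmt, args);   // bounded, unlike vsprintf
        va_end(args);

        if( n >= 0 )
        {
            captured.append(buf);   // e.g. buffer it, tag it with the rank, ...
            fputs(buf, io);         // and still send it on to the original stream
        }
    }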