Category Archives: OLGA

Pigging Displacement Volume Calculation

One of the most common practices during the operation of pipelines is “pigging”. Buildup of solids or liquids in flowlines can result in plugging, cracks, or flaws in the line, all of which could severely damage it. Pigging operations may be designed with objectives that include cleaning flowlines, inspecting them, or performing other maintenance tasks. Understanding pigging is therefore essential for flow assurance engineers to keep flow assets running smoothly.

Correctly anticipating the total liquid volume displaced during pigging operations is very important for operators, as it affects the design and operation of receiving process equipment, such as separators and slug catchers located at the outlet of pipelines. This discussion will cover how one can run a batch of simulations and analyze the transient liquid response in terms of pigging displacement volumes and peak liquid flowrates. This analysis can be performed far more efficiently with evoleap’s flotools software than with traditional methods. We will compare the flotools methodology with the more conventional approach using the OLGA GUI and Excel.

Pre-Processing

flotools gives you the ability to build a study consisting of many cases and then immediately process the results. In this discussion we will analyze the pigging displacement volume and peak flowrates in a pipeline system by varying the following parameters:

  • Inlet Liquid Flowrate
  • Watercut
  • Gas Lift Rate
  • Inlet Temperature

One of flotools’ most powerful features, its Parametric Studies tool, makes this process very simple and streamlined.

For a study that contains many cases, manually creating each variation in the OLGA GUI can be repetitive and sometimes impractical, especially for a particularly large case matrix, as the probability of making errors is high. If the OLGA GUI parametric tool were used, a comma-separated list of each value for each individual case would need to be provided for every parameter to be varied. Typically, an Excel spreadsheet would be used to organize and visualize each individual case’s parameters. The Parametric Studies tool in flotools simplifies this process. After creating a base case with a specific parameter set to a specific value, flotools lets you create different cases with that parameter varied by providing a comma-separated list of values. For instance, if the base case for a study has the watercut for the inlet liquid rates set at 0.2 and the study requires the watercut to vary between 0.2 and 0.8 in increments of 0.2, a comma-separated list like “0.2, 0.4, 0.6, 0.8” could be provided, and flotools will generate the corresponding cases.
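To illustrate the kind of expansion the tool performs, below is a minimal Python sketch that turns per-parameter value lists into a full case matrix. The parameter names and values here are hypothetical; flotools does this internally.

from itertools import product

# Hypothetical study variables and their comma-separated value lists
study_vars = {
    "LIQRATE": [5000.0, 10000.0, 15000.0],  # inlet liquid rate, STB/d
    "WC":      [0.2, 0.4, 0.6, 0.8],        # watercut, fraction
    "GLRATE":  [0.0, 2.0, 4.0],             # gas lift rate, MMscf/d
}

# Expand the per-variable lists into the full case matrix (cartesian product)
names = list(study_vars)
cases = [dict(zip(names, combo)) for combo in product(*study_vars.values())]

print(len(cases))  # 3 x 4 x 3 = 36 cases
print(cases[0])    # {'LIQRATE': 5000.0, 'WC': 0.2, 'GLRATE': 0.0}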

Figure 1 – Parametric study definition example

After defining the study variables, the next important step in generating the cases for the parametric study is defining a naming convention for the numerous cases. flotools makes this easy by assigning an integer index to every study variable specified while generating the parametric study. An example of a file naming pattern using study variable references is provided in the following figure.

Figure 2 – Interactive naming of cases

As can be seen in the image above, each study variable has an index associated with it; for example, watercut (WC) is linked to %3, the third study variable. Using these indexes to name the cases results in a systematic naming convention that makes each case identifiable. flotools also gives you the option to add unlinked variables that can be referenced by other variables. These are potentially useful as they can provide a more descriptive way of naming the cases.
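To make the %n substitution concrete, here is a small Python sketch of how such a naming pattern could be resolved for one case; the pattern and values are made up for illustration.

import re

# Hypothetical study-variable values for one case, keyed by index:
# %1 = inlet liquid rate, %2 = gas lift rate, %3 = watercut
values = {1: "10000", 2: "2.0", 3: "0.4"}

pattern = "PIG_Q%1_GL%2_WC%3"  # hypothetical naming pattern

# Substitute each %n token with the value of the n-th study variable
case_name = re.sub(r"%(\d+)", lambda m: values[int(m.group(1))], pattern)
print(case_name)  # PIG_Q10000_GL2.0_WC0.4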

The total number of cases to be generated is indicated as a badge on the generate cases button. Once selected, flotools will create the indicated number of cases in a location you specify in your file system. The cases are then ready to run.

Post-Processing

Post-processing the results of the study using the OLGA GUI was time consuming due to limitations of the GUI. When using the OLGA GUI to extract results, the number of cases that can be loaded into the GUI simultaneously is limited. We found that only 47 of the completed cases could be loaded into the OLGA GUI at once. Also, when extracting trend and profile data, only data at one specific simulation time could be considered; what if we wanted to obtain the maximum value during the simulation instead of the value at the last time step? Overall, to get the full data set for this project, 4 separate extractions were required.

However, using flotools made this process much more streamlined. The cases just need to be loaded into a flotools workspace and then the results are available for plotting.

After obtaining the data, the next step in the traditional method is to use Excel to create plots of the pig displacement volumes. For each case, the pig exit time was extracted from the pig position in branch (ZPIG) outputs. Then, using Liquid Standard Flowrate (QLST) plots, the displacement volume was obtained for each case by integrating QLST between the pig launch and exit times. Finally, the rest of the data must be formatted appropriately to generate the pivot table.
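As a sketch of that integration step, the Python fragment below finds the pig exit time from a ZPIG trend and integrates QLST between launch and exit with the trapezoidal rule. The array names and the exit-detection criterion are simplifying assumptions, not the exact OLGA or flotools logic.

import numpy as np

def pig_exit_time(time_s, zpig_m, branch_length_m):
    """Approximate exit time: first trend time where the pig position
    (ZPIG) reaches the end of the branch."""
    i = np.argmax(zpig_m >= branch_length_m)
    return time_s[i]

def pig_displacement_volume(time_s, qlst_m3s, t_launch, t_exit):
    """Integrate the liquid standard flowrate (QLST) between pig launch
    and exit using the trapezoidal rule."""
    mask = (time_s >= t_launch) & (time_s <= t_exit)
    return np.trapz(qlst_m3s[mask], time_s[mask])  # displaced volume, m3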

With flotools, the liquid volume rate can be plotted against the desired parameters using the parametric plots tool. The pig displacement volume, maximum liquid rate and maximum gas rate can all be calculated with flotools using its calculations tool.

A comparison of the two post-processing methods can be summarized in the following figure:

Figure 3 – Comparison of workflows between OLGA/Excel and flotools/flowpad

Creating all the cases to run for a study is very fast and streamlined with flotools. Even for a case matrix with over 50 cases, the entire process was done in a few minutes (or less for expert users).

However, using the OLGA GUI and Excel took roughly three times as long to create the desired plots due to the limitations of the OLGA GUI. These limitations include how many cases could be loaded into the GUI simultaneously and how much data could be exported at one time. Exporting the data in segments was the rate-limiting part of this workflow.

An example of a desired plot for this project can be seen below:

Figure 4 – Peak liquid outlet rate vs. Inlet liquid rate

To summarize, a time breakdown of each method is given below:

OLGA/Excel method:

  • Open cases in GUI and export QLST and ZPIG to .csv (11 minutes)
  • Find time pig exits from ZPIG data (2 minutes)
  • Integrate QLST between pig launch and exit times (7 minutes)
  • Generate pivot tables (3 minutes)
  • Format plots (create titles, axis labels, etc.) (10 minutes)
  • Total (33 minutes)

flotools method:

  • Open flotools and load cases (5 minutes)
  • Open parametric study tool and select variables and filters/slices. (2 minutes)
  • Format plots (2 minutes)
  • Duplicate plots and change filters (2 minutes)
  • Total (11 minutes)

The difference in processing times can be especially crucial if the case matrix needs to be modified and the simulations re-run. Using the parametric study tool in flotools, an existing parametric study can be copied and modified instead of being created from scratch, reducing the time spent on the project. Likewise, parametric plots generated to analyze the results of a parametric study can be duplicated and slightly varied to cover a variety of different comparisons.

Conclusion

Using flotools for a common flow assurance task like studying pig displacement volumes results in a much more streamlined process compared to traditional methods like the OLGA GUI. The efficiency boost with flotools is due to several factors: flotools’ ability to process all case files in a single instance, the calculations leveraged within flotools, and the ease of data handling and visualization. To expand on these points, flotools can handle all the cases in a potential case matrix, even cases with large data files, whereas the OLGA GUI has a limit on the number of cases it can handle at once. Using flotools also eliminates the need for further post-processing tools like Excel and ensures repeatability of results when using calculations within flotools. Finally, the workflows in flotools have been designed specifically for flow assurance engineering applications, so deriving meaningful results can be achieved quickly, based on a complete understanding of both the input and output data inherent to the design of flotools.

Efficient Cooldown Parametric Studies

A common flow assurance study objective is calculating the time required to reach hydrate conditions during a shutdown (typically from steady state). This is often referred to as ‘Cooldown’. The ability for an operator to understand how much time is available to safely complete various tasks during a shutdown, or how close to the hydrate region a pipeline can be operated before running into problems, is incredibly valuable.

This discussion will cover how one can run large simulation matrices for cooldown efficiently with flotools and will compare the flotools methodology with traditional methods using the OLGA GUI.

Pre-Processing

In flotools, you can set up and create large numbers of cases for a project and post-process the results quickly. For this study, we are looking for the cooldown times to the hydrate formation temperature for a system with varying parameters: Gas Lift Rate, Flowrate, Inlet Temperature, Water cut, and different fluids. flotools makes this process very simple and easy using the Parametric Studies tool.

With a large case matrix, it can be time-consuming to create all the varying simulations in OLGA. The flotools Parametric Studies tool can create model files by replacing the text of a base file with new parameters. For example, if you have a boundary node with PRESSURE = 100 psig you can give flotools a list of comma-separated values (100, 150, 200) and flotools will create 3 separate key files that are copies of the base file, one with 100 psig, one with 150 psig, and one with 200 psig at that node. 
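That text substitution is easy to picture in a few lines of Python; this sketch assumes a base file named base_case.key containing the PRESSURE = 100 psig keyword shown above.

from pathlib import Path

base = Path("base_case.key").read_text()  # hypothetical base model file
pressures = [100, 150, 200]               # psig values to substitute

for p in pressures:
    # Replace the boundary pressure text in the NODE specification
    case = base.replace("PRESSURE = 100 psig", f"PRESSURE = {p} psig")
    Path(f"case_P{p}.key").write_text(case)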

The only inputs needed are the variables to be varied; the parameters for each variable are defined only once. Cases can be named interactively, based on the parameters used.

Figure 1 – Interactive Naming of Cases

The naming of cases is interactive with the inputs listed above. Each variable has an index associated with it, e.g., WC is associated with %7 (the seventh study variable). The index associated with each variable can then be placed in the naming line with the %n operator so that the names change with each variation of the parameters. You can even create variables that are used for nothing except naming; for example, the 2nd (%2) and 6th (%6) variables above. Also, the total number of cases to be generated is indicated on the generate cases button with a small green number badge, 360 in this example. This is how many cases we will be generating with these inputs, coming from:

5 flowrates x 3 gas lift rates x 4 temperatures x 2 fluids x 3 water cuts = 360 cases

Once the matrix is set and named appropriately for the parameters of our study, flotools will create the 360 .key files in a location of our choice. flotools can output the .key/model files to any location in the directory structure, as well as generate a batch file if necessary. The cases are then ready to run.

Post-processing

Due to the large number of cases, post-processing of the results was time consuming. When using the OLGA GUI to extract simulation results for many cases, you are limited by the number of cases that can be loaded into the GUI simultaneously. For this project, each case output file was relatively small (.tpl and .ppl files combined to about 2 MB per case). We were only able to load 47 of the fully completed simulations into the GUI at one time. When extracting trend and profile data, you can only take data from the cases loaded into the GUI at that time. This project required 8 separate extractions to get the full data set, which consumed a lot of extra time loading cases into the OLGA GUI.

With flotools, by contrast, you can load the cases into a flotools workspace and be ready for plotting immediately.

At this point, both methods are set up for plotting and tabulating results. In Excel, creating a plot of cooldown times requires some combination of MATCH and INDEX/OFFSET formulas for each series of the MDTHYD output in the case matrix. The rest of the data must then be formatted appropriately to form a pivot table that can be used to plot against different parameters.
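The spreadsheet lookup logic amounts to finding the first zero crossing of MDTHYD. A minimal Python sketch of that step, with assumed array names and simple linear interpolation, looks like this:

import numpy as np

def cooldown_time(time_h, mdthyd_c):
    """First time at which MDTHYD crosses 0 C, i.e., hydrate conditions
    are reached."""
    above = mdthyd_c > 0.0
    if not above.any():
        return None            # hydrate conditions never reached
    i = np.argmax(above)       # first index where MDTHYD > 0
    if i == 0:
        return time_h[0]
    # Linear interpolation between the bracketing trend points
    t0, t1 = time_h[i - 1], time_h[i]
    m0, m1 = mdthyd_c[i - 1], mdthyd_c[i]
    return t0 + (0.0 - m0) * (t1 - t0) / (m1 - m0)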

In flotools, there are built-in calculations for DTHYD and its variants (MDTHYD, MDPHYD, MAXDTHYD, MAXDPHYD), which can be used without even having to input a hydrate curve in OLGA. Hydrate curves can be input in flotools during post-processing, and flotools will then compare them against the temperatures and pressures over the entire flow path during the simulation.

A comparison of the methods described above is illustrated in the following figure:

Figure 2 – Comparison of Workflows between OLGA/Excel and flowpad/flotools  

Even for large case matrices, creating all the model files to run is incredibly easy and fast with flotools. In this instance, the entire set of cases was created in less than 2 minutes.

Conversely, post-processing using the OLGA GUI and Excel took roughly 5 times as long to create a finished product of plots or tables. This is mainly due to limitations of the OLGA GUI as to how much info can be stored at one time, i.e., how many cases it can load simultaneously. Having to export the data in chunks was a time-consuming part of the process.

For this set of 360 cases, the goal was to create 11 plots showing the effects of each set of parameters in the parametric study. An example plot created in flotools is shown below.

Figure 3 – Cooldown vs. Inlet flowrate for 0% Water cut, 0 MMscf/d Gas lift and 50/50 Mix Fluid

Time breakdowns of each method are given below (these times reflect an experienced user of each method):

OLGA/EXCEL method:

  • Open cases in GUI and export MDTHYD to .csv (22 minutes)
  • Find first MDTHYD > 0 °C (2 minutes)
  • Format all parameters and cooldown times in a table format (5 minutes)
  • Create pivot table, slicers, and pivot charts (7 minutes)
  • Format plots (create titles, axis labels, etc.) (20 minutes for 11 plots)
  • Total (56 minutes)

flotools method:

  • Open flotools and load cases (5 minutes)
  • Open parametric study tool and select variables and filters/slicers. (2 minutes)
  • Change plot formatting to liking (2 minutes)
  • Duplicate plot and change filters/slicers (2 minutes for 11 plots)
  • Total (11 minutes)

This time difference is especially impactful if the case matrix is updated or if simulations are re-run. If a similar parametric study needs to be run, you can simply copy the parametric study in flotools and modify it instead of making a completely new one, thereby saving time on future work.

Conclusion

flotools makes common flow assurance tasks such as cooldown studies easy to create, run, and report, even with many cases. There are significant time savings with flotools compared to the OLGA GUI because flotools can handle all the cases from a project at once and can plot them immediately, rather than requiring post-processing in Excel.


OLGA Surge Volume Bug?

We came across an interesting observation while designing surge volume calculations in flotools that I thought was worth sharing and inviting comments on from the flow assurance community. For those who are not 100% sure what surge volume is, let me first explain.

Calculating surge volumes is a routine part of a flow assurance engineer’s work. Operational scenarios like slugging, pigging, and production ramp-up in multiphase production systems can all result in large volumes of liquid being swept out of the pipeline and into the first vessel at the receiving facility. Often, these liquid surges come in at rates that far exceed the receiving facility’s capacity to process liquids. Therefore, the vessel, typically a slug catcher, acts as a buffer where the surge of liquid can be collected and processed over time. One of the objectives of performing flow assurance studies is to quantify the maximum surge of liquid that can be seen across various operations in order to size the slug catcher appropriately. The maximum volume that a slug catcher will have to hold for a given operation is called the surge volume.

OLGA provides a way to calculate surge volumes whenever at least one of ACCLIQ, ACCOIQ, or ACCWAQ is included in the list of trended outputs. The calculation assumes that the slug catcher is present just downstream of the location where these variables are trended and that the vessel can be drained at a fixed maximum drain rate during the operation.

The calculation performed by OLGA is described by the following equation:
[1] V_{T_{start}}=0
[2] V_{t+1}=\max{\bigg(0,V_t+ACC_{t+1}-ACC_t-Q_{drain}\cdot(T_{t+1}-T_t)\bigg)}
[3] V_{surge}=\max{\big(V_t\big)}\:\text{ where }\:T_{start}\leq t\leq T_{end}
where

ACC_t is the OLGA reported cumulative volume of liquid at time step t,

T_t is the elapsed simulation time at time step t,

Q_{drain} is the maximum drain rate of the slug catcher,

T_{start} and T_{end}  mark the time window in the simulation where the calculation is done, and

V_{surge} is the calculated surge volume.
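For concreteness, here is a minimal Python sketch of this recursion; units are assumed consistent (e.g., volumes in m3, time in seconds) and the inputs are plain arrays of the trended ACC variable.

def surge_volume(time_s, acc_m3, q_drain_m3s):
    """Surge volume per equations (1)-(3), from a cumulative liquid
    volume trend (e.g., ACCLIQ) and a fixed drain rate."""
    v, v_max = 0.0, 0.0
    for i in range(1, len(time_s)):
        inflow = acc_m3[i] - acc_m3[i - 1]                   # liquid into the catcher
        drained = q_drain_m3s * (time_s[i] - time_s[i - 1])  # liquid drained
        v = max(0.0, v + inflow - drained)                   # equation (2)
        v_max = max(v_max, v)                                # equation (3)
    return v_max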

In this post I will look at two interesting properties of this calculation:

  • Why were the accumulation variables (ACC*) used instead of the instantaneous rate variables (QL*)?
  • Why is there a \max{(0,\:\ldots)} operation in equation (2)?

Accumulation Variables vs. Instantaneous Rate Variables

OLGA’s calculation of surge volume uses the accumulated variables as the basis of the surge volume calculation instead of the instantaneous liquid volume rate variables (QLT, QLTHL, QLTWT).  To understand why, let’s look at the instantaneous rate form of the surge volume equation.

When using the instantaneous rate variables, equation (2) becomes,
[4] V_{t+1}=\max{\Bigg(0,V_t+\bigg(\frac{1}{2}\Big(Q_{t+1}+Q_t\Big)-Q_{drain}\bigg)\cdot (T_{t+1}-T_t)\Bigg)}
If we compare the accumulation terms from (2) and (4), we see the following assumed relationship:
[5] ACC_{t+1}-ACC_{t}\simeq \frac{1}{2}\Big(Q_{t+1}+Q_t\Big)\cdot (T_{t+1}-T_t)
In other words, the average of the instantaneous rates in a particular time window is approximately equal to the average accumulation rate in that time window.  This is typically a bad assumption because the instantaneous rates capture rate spikes that are very short in duration and would not be indicative of the average rate for the corresponding time window. The average rate can be calculated as follows:
[6] Q_{avg,t}=\frac{ACC_{t+1}-ACC_t}{T_{t+1}-T_t}
The following chart shows a comparison between an actual QLT output from OLGA and the associated average QLT calculated from ACCLIQ according to equation (6):

 

Instantaneous vs. Average Liquid Rate

You can see that the average QLT (from ACCLIQ) does not show the flowrate spikes that the QLT variable shows. These spikes, while they probably do occur in a flowing system, typically occur in time windows much shorter than the output interval of the simulation. The larger the output interval, the worse the assumption.
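Equation (6) is straightforward to apply to trend output. A one-function Python sketch, with assumed array names:

import numpy as np

def average_rate(time_s, acc_m3):
    """Interval-average liquid rate per equation (6), computed from a
    cumulative volume trend such as ACCLIQ."""
    return np.diff(acc_m3) / np.diff(time_s)  # m3/s over each output interval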

The following chart shows an example of the error in accumulation by comparing the calculated accumulation using the rate variable and subtracting the OLGA calculated ACC variable from it. While the maximum error in this example (~25 barrels) is not significant, the magnitude of the error entirely depends on the nature of the simulation and may be significant in some cases.

Error in accumulation

In our view, OLGA has taken the correct approach and used the accumulated variables as the basis for the surge volume calculation.

Handling Negative Terms

Equation (2) features a \max operation. This ensures that the calculated volume in the slug catcher never goes below zero. But what happens when the quantity (ACC_{t+1}-ACC_t) becomes negative?

It is perfectly normal and valid for a numerical simulator to predict negative rates at an outlet boundary. When OLGA predicts negative rates at the outlet of the pipeline, the ACC variable may decrease from one time step to the next. When this happens, equation (2) results in a reduction of the calculated slug catcher volume at a rate faster than the assumed drain rate. Effectively, the calculation allows liquid to leave via the inlet of the slug catcher as well as the liquid drain. When you look at a schematic of a typical slug catcher, like the one shown below, it becomes apparent that this may not be such a sound assumption. Slug catchers are designed for gravity separation of phases, and hence the inlet nozzles are at or near the top of the vessel. Once the liquids go in, they quickly settle to the bottom. Any negative flow is likely to be mostly gas, with very little liquid carried as droplets in the gas phase.

Slug catcher schematic

Depending on your case, using the OLGA basis for calculation may result in significant errors. In one case, we found a 10% error at a specific drain rate. The problem is that the error is not on the side of conservatism. We think the correct way to write equation (2) is as follows:
[7] V_{t+1}=\max{\bigg(0,V_t+\max{\Big(ACC_{t+1}-ACC_t,0\Big)}-Q_{drain}\cdot(T_{t+1}-T_t)\bigg)}
In equation (7), we added an inner max function that bounds the quantity (ACC_{t+1}-ACC_t), the net liquid volume entering the slug catcher over a given time interval, below at zero.
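Relative to the earlier sketch of equations (1)-(3), the change is a single inner max. A minimal sketch of the proposed method, under the same assumptions:

def surge_volume_proposed(time_s, acc_m3, q_drain_m3s):
    """Surge volume per equation (7): intervals of negative inflow
    (backflow at the outlet) are clamped to zero."""
    v, v_max = 0.0, 0.0
    for i in range(1, len(time_s)):
        inflow = max(acc_m3[i] - acc_m3[i - 1], 0.0)  # inner max of equation (7)
        v = max(0.0, v + inflow - q_drain_m3s * (time_s[i] - time_s[i - 1]))
        v_max = max(v_max, v)
    return v_max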

If the calculation is being done at the outlet of a pipeline that is connected to a pressure node, set the parameter GASFRACTION to 1.0 in your NODE specification. This will ensure that whenever there is negative flow at the outlet boundary, the negative flow is all gas. That said, we still think equation (7) is a better way to perform the surge volume calculation because it works well regardless of the boundary specification.

Comparison of Surge Volumes

The plot above shows a comparison of surge volumes calculated according to equations (2) and (7), labeled “OLGA Method” and “Proposed Method” respectively. We can see that filtering out the negative values results in larger surge volumes at lower drain rates. At large enough drain rates, the differences eventually disappear. Given that surge volume calculations are performed to size the slug catcher, we believe that equation (2) is not conservative and therefore should not be used. Instead, our modified version, equation (7), which gives a more conservative estimate of the surge volume, should be used.

As always, your comments and feedback would be much appreciated.

The Truth about OLGA Speed


Recently, I saw a discussion of OLGA speed in the OLGA Users group on LinkedIn. The discussion starts with the question of why OLGA performs nearly the same on two different CPUs (an Intel Core i7 processor running at 3.4 GHz and an Intel Core i5 processor also running at 3.4 GHz). This result is surprising and troubling because the Core i5 is a considerably cheaper processor.

I have seen flow assurance companies buy expensive hardware in the hope of making OLGA go faster. Unfortunately, the results of such expense have been hit-or-miss. As a budding flow assurance consultant, I witnessed one of those misses. After purchasing very expensive hardware, we found that OLGA ran just as fast on it as it did on desktop machines that were a year old. Since then, I have spent quite a bit of time looking at OLGA speed and working to understand what factors impact OLGA performance.

To help flow assurance companies considering such buying decisions, I thought it might be worthwhile sharing the knowledge I have gained through my investigation. Also, I thought it might be interesting to add some data and analysis to the discussion and look specifically at how the number of threads plays a role in OLGA speed. In the LinkedIn discussion, Torgeir Vanvik from Schlumberger offered some excellent insight into the way OLGA works, and I am hoping this post sheds more light on the topic of OLGA’s parallel performance.

Key factors that affect OLGA simulation speed

There are several key factors that affect OLGA simulation speed. Some have to do with numerical modeling complexity and others with the hardware on which OLGA is run.

On the modeling side, the most obvious factor is the complexity of the network being modeled. In general, single branch models run faster than networks, and simple converging networks run faster than diverging networks or networks with looped lines. Unfortunately, this is not something flow assurance engineers can control, so it is not worth discussing further.

Next on the list are section lengths and the numerical time step. In OLGA, the simulation time step is controlled using the parameters MINDT and MAXDT in the INTEGRATION specification, and also using the DTCONTROL parameters. To ensure model stability, simulations are typically run with the CFL condition controlling the simulation time step. The CFL condition determines how much distance, relative to the length of a section in the model, the fluid in that section is allowed to move in one time step. The net effect is that the longer the section length, the longer your time steps are allowed to be, and vice versa. The INTEGRATION and DTCONTROL parameters, along with section lengths, have a profound impact on model speed. The model speed is typically governed by the smallest section in the network. I could write a whole treatise on this, but that is a topic for another day.
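For reference, the textbook form of the CFL constraint limits the time step to roughly

\Delta T \leq C\cdot\min_i\left(\frac{\Delta z_i}{|u_i|}\right)

where \Delta z_i is the length of section i, u_i is the fluid velocity in that section, and C is a Courant number of order one. The exact criterion OLGA applies internally may differ; the point is that the smallest section sets the ceiling on the time step.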

The model speed is typically governed by the smallest section in the network

On the hardware side, the key factors that affect simulation speed are CPU and I/O speed.

The processor

Modern CPUs have two specifications that are important for our purposes: clock speed and number of cores. The clock speed indicates how many instructions are processed per second, and the number of cores indicates how many instructions are processed in parallel. Modern versions of OLGA (6 and above) are able to exploit the power of multiple cores, whereas older versions of OLGA (5 and below) get no benefit from multi-core processors.

No matter what the version of OLGA, clock speed is important. Ultimately, it comes down to how many instructions can be processed per second, so the GHz of the processor (the bigger the better) is important.

No matter what the version of OLGA, clock speed is important

For OLGA 6 and later versions, the number of cores also plays a role in speed. However, it is easy to fall into the trap of believing that more cores will always result in faster simulation speeds. The unfortunate reality is that some tasks benefit from being processed in parallel while others don’t. If the time to split a task into smaller problems is greater than the time savings resulting from parallel processing, the task will actually run slower. In other words, depending on the problem, there is a theoretical limit to the gains from parallelization. This is also true for OLGA.

Answering practical questions like, “Is it better to have a 3.4 GHz, 4-core CPU or a 2.4 GHz, 16-core CPU?” requires some investigation into OLGA’s parallelizability. We explore that very topic later in this post.

Depending on the problem there is a theoretical limit to the gains from parallelization

I/O

Since OLGA outputs simulation results to the disk as it is running, the speed at which it can write out the results can limit (sometimes severely) the run-time speed. There are two common hardware bottlenecks, the hard drive speed (when OLGA is saving locally) and the network bandwidth (when OLGA is writing to a network drive).

Most commercial-grade desktop computers and laptops ship with mechanical hard drives that spin at 5400 or 7200 rpm, while server-grade machines often come with 10k or 15k rpm drives. The read/write access speed scales directly with the spin speed of the drive. In general, the greater the spin rate, the better the hard drive when it comes to OLGA speed. Solid state drives (SSDs) are now available cheaply, and the technology has matured enough to be used in a commercial setting. However, the speed of SSDs ranges from worse than mechanical drives to exceptionally fast, depending on the manufacturer and model. In other words, not all SSDs are as blazing fast as they would have you believe, so choose carefully. It is also important to consider the computer’s bus interface, which determines internal data transfer rates (though these days that interface is rarely the bottleneck). Ultimately, hard drive performance can be as important to simulation speed as the CPU.

Ultimately, the hard drive performance can be as important to simulation speed as the CPU

When saving OLGA results to a network share, the network can also limit OLGA’s ability to write simulation results. As a result, companies should ensure that the bandwidth between the computer running OLGA and the network storage is as large as possible. This will alleviate any slowdowns in OLGA speed.

When saving OLGA results to a network share, the network can also limit OLGA’s ability to write simulation results

These bottlenecks can also be avoided most of the time by carefully considering the frequency and quantity of simulation outputs.

The study

In order to understand the factors that influence parallel speedup, we used 8 different model configurations.

Model | Description | Number of sections | Smallest section length (m)
1 | Single pipeline model | 190 | 173
2 | Single branch – fine mesh | 7000 | 11
3 | Single branch – coarse mesh | 376 | 21
4 | Converging pipeline network | 335 | 17
5 | Converging network with pressure-pressure boundary | 383 | 14
6 | Converging pipeline network – no flow | 335 | 17
7 | Converging-diverging network (Loop) | 60 | 50
8 | Two separate networks | 426 | 50

Methodology

All models were run with no trend or profile outputs to eliminate the effect of I/O on parallel speedup. To ensure the results were repeatable, each model was run multiple times using a varying number of threads. (A thread is a part of a computer program that can be managed separately by the operating system; a single core in a modern CPU can handle two threads.) A simple program was developed to run each model up to 20 times in 10 minutes, which ensured all models ran at least 2 times and many ran the full 20 times. The average run time was then calculated for each model and thread combination. It is worth noting that the run times for each simulation iteration were nearly identical. OLGA 2014.2 was used for this study (see acknowledgments at the end). The following command was used to set the number of threads used by OLGA:

opi.exe /t  <num_threads> <input_file>

All simulations were run on a machine with 4 physical cores and capable of running 8 threads in parallel.
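A minimal Python sketch of the timing driver described above; the executable name matches the command shown earlier, while the model file names, run budget, and output format are assumptions.

import subprocess, time

MODELS = ["model1.key", "model2.key"]  # hypothetical input files
THREADS = range(1, 9)
BUDGET_S = 600                         # 10-minute budget per combination
MAX_RUNS = 20

for model in MODELS:
    for n in THREADS:
        runtimes, start = [], time.time()
        while len(runtimes) < MAX_RUNS and time.time() - start < BUDGET_S:
            t0 = time.time()
            subprocess.run(["opi.exe", "/t", str(n), model], check=True)
            runtimes.append(time.time() - t0)
        avg = sum(runtimes) / len(runtimes)
        print(f"{model}, {n} threads: {avg:.1f} s average over {len(runtimes)} runs")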

Results

The first plot shows the speedup achieved by the various models. The ideal speedup line shows that a model using n threads should be able to achieve a speedup of n compared to the 1-thread model. Note that if the number of threads is not specified when running OLGA, it defaults to the number of CPU cores (in our case, 4).

Speed-up achieved by various model types

The plot above shows that the best performing model achieves a speedup of 3 using 4 threads, and a speedup of 4 using 8 threads. The worst performing models cap off at a speedup of ~1.6 and achieve no additional speedup beyond 5 threads. In fact, the speedup of a few of the models decreases when going from 7 threads to 8. However, this last artifact could be a result of using all available threads on the processor, leaving the OS to switch between the computational load and background services. We could only confirm this by running the test on an 8- or 16-core machine.

Another way to look at speedup is a quantity called parallel efficiency, which is the ratio of the actual speedup to the ideal speedup.

Parallel efficiency achieved by various model types

These two plots show that parallel speedup tends to stagnate beyond 4 threads for most models. Most models are able to achieve a speedup of 2 or more when using 4 threads. However, by the time we get to 7 threads, only one model has a parallel efficiency over 50%. In other words, we would be better off running two simulations simultaneously using 4 threads each rather than running just one simulation using all 8 available threads.

Parallel speedup tends to stagnate beyond 4 threads for most models

Analysis

The parallel speedup and efficiency plots showed that the efficiency of parallelization varies between model types. So the next question is: what makes a model more or less parallelizable? The flow chart below shows the simplified structure of a parallel program.

Typical program flow of a parallel numerical algorithm such as the one used in OLGA

In OLGA, the main calculation loop would be the time loop that marches time from the start to the end of the simulation. The initial sequential process would be reading input files, tab files, etc. The final post-processing might include closing file handles, releasing memory, etc.

With that background in mind, we fitted the parallel efficiency curves with an exponential function of the following form:

\mu_p=e^{c(n_p-1)}

where

\mu_p is the parallel efficiency,
c is the parallel efficiency decay factor, and
n_p is the number of threads.

I call the calculated c factor the parallel efficiency decay factor. We can then plot the decay factor as a function of various aspects of the model. Our analysis shows that the decay factor is a strong function of the model runtime and the number of sections in the model.
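One way to estimate c from measured speedups is a least-squares fit of ln(mu_p) against (n_p - 1) through the origin. A Python sketch with illustrative (not measured) numbers:

import numpy as np

threads = np.array([1, 2, 3, 4, 5, 6, 7, 8])
speedup = np.array([1.0, 1.8, 2.4, 3.0, 3.3, 3.6, 3.8, 4.0])  # illustrative

efficiency = speedup / threads  # mu_p = actual speedup / ideal speedup
x = threads - 1
y = np.log(efficiency)

# Least-squares slope through the origin: ln(mu_p) = c * (n_p - 1)
c = np.sum(x * y) / np.sum(x * x)
print(f"parallel efficiency decay factor c = {c:.3f}")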

Parallel Efficiency Decay vs. Model Runtime

The plot above shows that the parallel efficiency decay factor is loosely a logarithmic function of the model run time. This makes sense and follows readily from the way parallel efficiency is formulated above (and from Amdahl’s law). Skipping some mathematical jugglery, c can be rearranged into the following equation:

c=\frac{\ln\left(\frac{t_s+t_p}{n_p\cdot t_s+t_p}\right)}{n_p-1}

where

t_s is the time spent in the sequential portions of the simulation, and
t_p is the time spent in the parallel portions of the simulation.

When t_s\gg t_p, there is hardly any speedup, yielding a parallel efficiency of \frac{1}{n_p}; when t_p\gg t_s, c\rightarrow 0, yielding a parallel efficiency of 1. In between, we get a log-linear relationship.

The plot below shows the parallel efficiency decay factor as a function of the number of sections in the model. As the number of sections increases, parallel efficiency gets better. Note that at 7000 sections, the decay factor is ~-0.1, which is probably close to the theoretical limit set by the strictly sequential parts of the simulation.

Parallel Efficiency Decay vs. Number of Sections

This also makes sense given that the number of computations performed in each time step is directly proportional to the number of sections and, according to the OLGA manual, these are the computations that are performed in parallel. So, the higher the number of sections, the better the parallel efficiency. However, there is a limit, as there are always sequential parts of the algorithm that cannot be parallelized.

The higher the number of sections, the better the parallel efficiency

To sum it up…

Getting back to the discussion of hardware choices and their impact on OLGA speed: the number of cores, clock speed, and I/O speed are all significant factors. Recent versions of OLGA are multi-threaded and can run faster by utilizing multiple processor cores. We did a detailed analysis of how the number of cores impacts OLGA speed and whether it is prudent to spend money on cores.

OLGA defaults to using as many threads as there are cores available. In our analysis, the best speedup we achieved with 4 threads was ~3, a 75% parallel efficiency. In general, the more compute intensive the simulation, the better the speedup. For short simulations, multi-threading did not help. Even for a long simulation with 7000 sections, going from 4 threads to 8 threads only bumped the speedup from 3 to 4. In general, parallel efficiency tapers off as we go beyond 4 threads. Based on our analysis, I reckon that 4 threads is a sweet spot for running flow assurance models in OLGA. You could of course fiddle with this for individual models, but I would not recommend spending time on it.

Four threads is a sweet spot for running flow assurance models in OLGA

Keeping in line with our findings, the OLGA manual advises that it is better to use the available cores for simultaneous simulations rather than using them to speed up an individual simulation. However, this advice is a bit naive. For example, most professional desktop or laptop systems today have 4 cores but do not have the hard drive access speeds to support 4 simultaneous simulations writing data. The right choice lies somewhere in the middle.

If you are making a hardware buying decision, I would not go beyond 4 cores when buying a computer with a mechanical hard drive. If you have enough OLGA licenses and want to centralize your simulations on one machine, the storage choice is as important as the processor choice. I would also recommend setting the OMP_NUM_THREADS environment variable to 4 in order to run OLGA at optimum parallel efficiency.
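On Windows, this can be done per session or persistently; both of the commands below are standard Windows shell commands.

rem Current command prompt only:
set OMP_NUM_THREADS=4

rem Persists for new sessions:
setx OMP_NUM_THREADS 4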

We welcome you to share your experiences and provide us feedback. If there is enough interest, we will explore the effect of CPU clock speed and disk I/O in detail in future posts.

Acknowledgments

We thank Dr. Ivor Ellul and RPS Group for running OLGA simulations and for valuable suggestions related to the analysis presented here.