SIMD Integration Tutorial for MCHEP C/C++ API
This tutorial explains how to use MCHEP's SIMD (Single Instruction Multiple Data) integration capability to accelerate Monte Carlo integration in C/C++ applications.
Overview
MCHEP's SIMD integration evaluates 4 integration points simultaneously, which can significantly improve performance by relying on a better CPU cache utilization, enabling compiler auto-vectorization, and reducing function call overhead.
Quick Start
Here's a minimal example that integrates a 2D Gaussian:
#include <mchep.hpp>
#include <iostream>
#include <vector>
#include <array>
#include <cmath>
// SIMD integrand: evaluates 4 points at once
std::array<double, 4> gaussian_simd(const std::vector<double>& x) {
std::array<double, 4> results;
const int dim = 2;
const int simd_width = 4;
for (int i = 0; i < simd_width; ++i) {
double sum = 0.0;
for (int d = 0; d < dim; ++d) {
double val = x[d * simd_width + i]; // Note the memory layout!
sum += val * val;
}
results[i] = std::exp(-sum);
}
return results;
}
int main() {
// Integration boundaries: [0, 1] x [0, 1]
std::vector<std::pair<double, double>> boundaries = {
{0.0, 1.0},
{0.0, 1.0}
};
// Create integrator: 10 iterations, 100k evals/iter, 50 bins, alpha=1.5
mchep::Vegas vegas(10, 100000, 50, 1.5, boundaries);
// Run SIMD integration
VegasResult result = vegas.integrate_simd(gaussian_simd);
std::cout << "Result: " << result.value << " +/- " << result.error << std::endl;
return 0;
}
Compile with:
Understanding the SIMD Memory Layout
The key difference between scalar and SIMD integration is the memory layout of input points.
Scalar Layout (Array of Structures - AoS)
SIMD Layout (Structure of Arrays - SoA)
[x0, x1, x2, x3, y0, y1, y2, y3, z0, z1, z2, z3]
|--------------||--------------||--------------|
dimension 0 dimension 1 dimension 2
The formula to access coordinate d of point i is then:
Visual Example (3D, 4 points)
Input vector x (size = dim * 4 = 12):
Index: 0 1 2 3 4 5 6 7 8 9 10 11
[x0 , x1 , x2 , x3 , y0 , y1 , y2 , y3 , z0 , z1 , z2 , z3]
└─────────────────┘ └─────────────────┘ └─────────────────┘
dim 0 dim 1 dim 2
To get point i's coordinates:
Point 0: x[0], x[4], x[8] → (x0, y0, z0)
Point 1: x[1], x[5], x[9] → (x1, y1, z1)
Point 2: x[2], x[6], x[10] → (x2, y2, z2)
Point 3: x[3], x[7], x[11] → (x3, y3, z3)
Complete Example: Physics Integrand
Here's a more realistic example, computing a phase space integral:
#include <mchep.hpp>
#include <iostream>
#include <vector>
#include <array>
#include <cmath>
// Physical constants
constexpr double PI = 3.14159265358979323846;
constexpr double MASS = 1.0;
// SIMD integrand for a 4D phase space integral
// Integrates: exp(-E) * sin(theta) over [0,inf) x [0,pi] x [0,2pi] x [0,inf)
std::array<double, 4> phase_space_simd(const std::vector<double>& x) {
std::array<double, 4> results;
constexpr int dim = 4;
constexpr int simd_width = 4;
for (int i = 0; i < simd_width; ++i) {
// Extract coordinates for point i
double p = x[0 * simd_width + i]; // momentum magnitude [0, 10]
double theta = x[1 * simd_width + i]; // polar angle [0, pi]
double phi = x[2 * simd_width + i]; // azimuthal angle [0, 2pi]
double E = x[3 * simd_width + i]; // energy [0, 10]
// Phase space element with Boltzmann suppression
double jacobian = p * p * std::sin(theta);
double boltzmann = std::exp(-E);
results[i] = jacobian * boltzmann;
}
return results;
}
int main() {
// 4D integration boundaries
std::vector<std::pair<double, double>> boundaries = {
{ 0.0, 10.0 }, // p: momentum
{ 0.0, PI }, // theta: polar angle
{ 0.0, 2*PI }, // phi: azimuthal angle
{ 0.0, 10.0 } // E: energy
};
// Create Vegas integrator
mchep::Vegas vegas(
20, // iterations
200000, // evaluations per iteration
50, // grid bins
1.5, // alpha (grid adaptation speed)
boundaries
);
// Set seed for reproducibility
vegas.set_seed(42);
// Integrate with 0.5% target accuracy
VegasResult result = vegas.integrate_simd(phase_space_simd, 0.5);
std::cout << "Phase space integral:" << std::endl;
std::cout << " Value: " << result.value << std::endl;
std::cout << " Error: " << result.error << std::endl;
std::cout << " Relative error: " << (result.error/result.value)*100 << "%" << std::endl;
std::cout << " Chi2/dof: " << result.chi2_dof << std::endl;
return 0;
}
Converting Scalar to SIMD Integrand
If you have an existing scalar integrand:
// Scalar version
double my_integrand(const std::vector<double>& x) {
double result = 0.0;
for (size_t d = 0; d < x.size(); ++d) {
result += x[d] * x[d];
}
return std::exp(-result);
}
Convert it to SIMD by: (a) changing the return type to std::array<double, 4>, (b) adding
an outer loop over 4 points, and (c) changing the index from x[d] to x[d * 4 + i].
// SIMD version
std::array<double, 4> my_integrand_simd(const std::vector<double>& x) {
std::array<double, 4> results;
const int dim = x.size() / 4; // Total size is dim * 4
for (int i = 0; i < 4; ++i) { // Loop over 4 points
double sum = 0.0;
for (int d = 0; d < dim; ++d) {
double val = x[d * 4 + i]; // Changed indexing
sum += val * val;
}
results[i] = std::exp(-sum);
}
return results;
}
Using VEGAS\(+\) with SIMD
Vegas\(+\) adds adaptive stratified sampling on top of Vegas, which can improve convergence for integrands with localized peaks. It also supports SIMD integration.
Vegas\(+\) Parameters
VegasPlus has two additional parameters compared to Vegas: n_strat: which represents
the number of stratifications per dimension (total hypercubes = n_strat^dim) and
beta which represents the stratification adaptation parameter (typically 0.5-0.75).
Example
#include <mchep.hpp>
#include <iostream>
#include <vector>
#include <array>
#include <cmath>
std::array<double, 4> peaked_integrand_simd(const std::vector<double>& x) {
std::array<double, 4> results;
const int dim = 4;
const int simd_width = 4;
for (int i = 0; i < simd_width; ++i) {
double dx2 = 0.0;
for (int d = 0; d < dim; ++d) {
double val = x[d * simd_width + i];
double diff = val - 0.5;
dx2 += diff * diff;
}
// Narrow Gaussian peak at center
results[i] = std::exp(-100.0 * dx2) * 1013.2118364296088;
}
return results;
}
int main() {
std::vector<std::pair<double, double>> boundaries = {
{0.0, 1.0}, {0.0, 1.0}, {0.0, 1.0}, {0.0, 1.0}
};
// Create VegasPlus integrator
mchep::VegasPlus vegasplus(
10, // n_iter: iterations
100000, // n_eval: evaluations per iteration
50, // n_bins: grid bins
1.5, // alpha: grid adaptation
2, // n_strat: stratifications per dimension (2^4 = 16 hypercubes)
0.75, // beta: stratification adaptation
boundaries
);
vegasplus.set_seed(1234);
// Run SIMD integration
VegasResult result = vegasplus.integrate_simd(peaked_integrand_simd, -1.0);
std::cout << "VegasPlus SIMD Result: " << result.value
<< " +/- " << result.error << std::endl;
return 0;
}
When to Use VEGAS\(+\)
| Integrand Type | Recommended |
|---|---|
| Smooth, no peaks | Vegas |
| Single localized peak | Vegas\(+\) |
| Multiple peaks | Vegas\(+\) with higher n_strat |
| Very high dimensions (>6) | Vegas (stratification overhead grows as n_strat^dim) |
Performance Comparison
For a narrow Gaussian in 4D (1M evaluations):
| Method | Time | Result |
|---|---|---|
| Vegas SIMD | 39 ms | 0.99952 ± 0.00064 |
| Vegas\(+\) SIMD | 37 ms | 1.00017 ± 0.00062 |
Vegas\(+\) achieves slightly better precision for peaked integrands.
Using the C API Directly
If you prefer the C API over the C++ wrapper:
#include <mchep_capi.h>
#include <stdio.h>
#include <math.h>
// C-style SIMD integrand
void my_simd_integrand(const double* x, int dim, void* user_data, double* result) {
for (int i = 0; i < 4; ++i) {
double sum = 0.0;
for (int d = 0; d < dim; ++d) {
double val = x[d * 4 + i];
sum += val * val;
}
result[i] = exp(-sum);
}
}
int main() {
CBoundary boundaries[] = {{0.0, 1.0}, {0.0, 1.0}};
VegasC* vegas = mchep_vegas_new(10, 100000, 50, 1.5, 2, boundaries);
mchep_vegas_set_seed(vegas, 1234);
VegasResult result = mchep_vegas_integrate_simd(
vegas,
my_simd_integrand,
NULL, // user_data (optional)
-1.0 // target_accuracy (-1 = disabled)
);
printf("Result: %f +/- %f\n", result.value, result.error);
mchep_vegas_free(vegas);
return 0;
}
Performance Tips
1. Enable Compiler Optimizations
2. Minimize Branching Inside the Loop
Bad:
for (int i = 0; i < 4; ++i) {
if (x[i] > 0.5) { // Branch inside loop - bad for SIMD
results[i] = func1(x[i]);
} else {
results[i] = func2(x[i]);
}
}
Better:
for (int i = 0; i < 4; ++i) {
// Branchless: compute both, blend result
double v1 = func1(x[d * 4 + i]);
double v2 = func2(x[d * 4 + i]);
double t = (x[d * 4 + i] > 0.5) ? 1.0 : 0.0;
results[i] = t * v1 + (1.0 - t) * v2;
}
3. Use Explicit SIMD Intrinsics (Advanced)
For maximum performance, use CPU intrinsics:
#include <immintrin.h> // AVX
std::array<double, 4> gaussian_avx(const std::vector<double>& x) {
std::array<double, 4> results;
const int dim = x.size() / 4;
__m256d sum = _mm256_setzero_pd();
for (int d = 0; d < dim; ++d) {
__m256d val = _mm256_loadu_pd(&x[d * 4]);
sum = _mm256_fmadd_pd(val, val, sum); // sum += val * val
}
// exp(-sum) - need to extract and compute
_mm256_storeu_pd(results.data(), sum);
for (int i = 0; i < 4; ++i) {
results[i] = std::exp(-results[i]);
}
return results;
}
For a complete heavy HEP-style example with 1000 FMA operations achieving 2-2.5x speedup, see benchmark_simd_avx.cpp.
API Reference
C++ API
// Create Vegas integrator
mchep::Vegas vegas(n_iter, n_eval, n_bins, alpha, boundaries);
// Create VegasPlus integrator (with stratified sampling)
mchep::VegasPlus vegasplus(n_iter, n_eval, n_bins, alpha, n_strat, beta, boundaries);
// SIMD integration (same for both Vegas and VegasPlus)
VegasResult result = vegas.integrate_simd(integrand_func, target_accuracy);
VegasResult result = vegasplus.integrate_simd(integrand_func, target_accuracy);
// Result structure
struct VegasResult {
double value; // Estimated integral
double error; // Statistical error (1 sigma)
double chi2_dof; // Chi-squared per degree of freedom
};
C API
// SIMD integrand signature
typedef void (*CSimdIntegrand)(const double* x, int dim, void* user_data, double* result);
// Vegas SIMD integration
VegasC* mchep_vegas_new(n_iter, n_eval, n_bins, alpha, dim, boundaries);
VegasResult mchep_vegas_integrate_simd(vegas, integrand, user_data, target_accuracy);
void mchep_vegas_free(vegas);
// VegasPlus SIMD integration
VegasPlusC* mchep_vegas_plus_new(n_iter, n_eval, n_bins, alpha, n_strat, beta, dim, boundaries);
VegasResult mchep_vegas_plus_integrate_simd(vegasplus, integrand, user_data, target_accuracy);
void mchep_vegas_plus_free(vegasplus);