Improve Tree impl details and vector abstraction
Canleskis committed Jun 13, 2023
1 parent 96a97c1 commit d04a748
Showing 12 changed files with 761 additions and 743 deletions.
25 changes: 14 additions & 11 deletions CHANGELOG.md
@@ -5,33 +5,36 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Unreleased - 2023-01-06
## Unreleased - 2023-06-13

### Added

- `Storage` trait for inputs of `ComputeMethod::compute`.
- `Compute` trait extending `Iterator` with a `compute` method used by `Accelerations` and `MapAccelerations`.
- `algorithms` module.
- `sequential::BruteForceAlt` compute method, a slower alternative to `sequential::BruteForce` that does not iterate over the combinations of pairs of particles (but is more flexible because it uses `FromMassive`).
- `sequential::BruteForce` compute method, a slower alternative to `sequential::BruteForceCombinations` that does not iterate over the combinations of pairs of particles (but is more flexible because it uses `FromMassive`).
- `parallel::BruteForceSIMD` and `sequential::BruteForceSIMD` compute methods making use of explicit SIMD instructions for major performance benefits on compatible platforms using [ultraviolet](https://github.com/fu5ha/ultraviolet).
- `Scalar` and `InternalVector` traits to help genericity of built-in non-SIMD compute methods.
- `IntoSIMDElement`, `SIMD`, `SIMDScalar`, `SIMDVector` and `ReduceAdd` traits to help genericity of built-in SIMD compute methods.
- `FromMassive` and `ParticleSet` structs implementing `Storage` backing non-SIMD compute methods.
- `FromMassiveSIMD` struct implementing `Storage` backing SIMD compute methods.
- `internal::Scalar` and `internal::Vector` traits to help genericity of built-in non-SIMD compute methods.
- `simd::IntoVectorElement`, `simd::SIMD`, `simd::Scalar`, `simd::Vector` and `simd::ReduceAdd` traits to help genericity of built-in SIMD compute methods.
- `MassiveAffected` and `ParticleSet` structs implementing `Storage` backing non-SIMD compute methods.
- `MassiveAffectedSIMD` struct implementing `Storage` backing SIMD compute methods.
- `PointMass` struct representing a particle for built-in storages.
- `Tree`, `Node`, `Orthant`, `BoundingBox` structs and `TreeData`, `BoundingBoxDivide`, `Positionable`, `BarnesHutTree` traits backing BarnesHut compute methods.
- `Tree`, `Node`, `SizedOrthant`, `BoundingBox` structs and `TreeData`, `Subdivide`, `Positionable`, `BarnesHutTree` traits backing BarnesHut compute methods.
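
The entries above center on two traits. A minimal sketch of their shape, inferred from the `gpu.rs` and `parallel.rs` hunks later in this diff (exact bounds and signatures in the crate may differ):

```rust
/// Sketch only: a storage is built from an iterator of particles, mirroring
/// the `store` signature visible in the gpu.rs hunk further down.
pub trait Storage<P> {
    fn store<I: Iterator<Item = P>>(input: I) -> Self;
}

/// Sketch only: a compute method is generic over the storage it accepts and
/// the vector type it produces, and returns its result through `Output`.
pub trait ComputeMethod<S, V> {
    type Output;

    fn compute(self, storage: S) -> Self::Output;
}
```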

### Changed

- Available compute methods moved to `algorithms` module.
- Built-in `ComputeMethod` implementations use `Scalar` and `InternalVector` traits.
- `ComputeMethod` generic over a storage and output type.
- `ComputeMethod::compute` expects a storage type.
- `ComputeMethod` has `Output` associated type returned by `ComputeMethod::compute`.
- `Compute`, `MapCompute` traits renamed to `Accelerations` and `MapAccelerations` and no longer generic.
- `Vector` trait renamed to `IntoInternalVector` and its associated type to `InternalVector`.
- `Vector` trait renamed to `IntoVectorArray` and its associated type to `Vector` and moved to `internal` submodule.
- `InternalVector` trait renamed to `Vector` and moved with `Scalar` trait to `internal` submodule.
- `compute_method` no longer glob imported in prelude.
- `tree` module and submodules made public.
- `vector` made public and submodule of `algorithms`.
- `vector` module made public and part of `algorithms`.
- Renamed `sequential::BruteForce` to `sequential::BruteForceCombinations`.
- Built-in compute methods moved to `algorithms` module.
- Built-in `ComputeMethod` implementations use `internal::Scalar` and `internal::Vector` traits.
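
To make the new `ComputeMethod` contract concrete (storage input, output genericity, `Output` associated type), here is a toy implementation against the trait shape sketched above; it is illustrative only and not the crate's code:

```rust
// The trait shape from the sketch above, repeated so this example stands alone.
pub trait ComputeMethod<S, V> {
    type Output;
    fn compute(self, storage: S) -> Self::Output;
}

/// Toy method: treats the storage as a list of 1D positions and returns a
/// zero "acceleration" per particle, only to show the data flow.
struct Zeroed;

impl<V: From<f32>> ComputeMethod<Vec<f32>, V> for Zeroed {
    type Output = Vec<V>;

    fn compute(self, storage: Vec<f32>) -> Self::Output {
        storage.iter().map(|_| V::from(0.0)).collect()
    }
}

fn main() {
    let accelerations: Vec<f32> = Zeroed.compute(vec![1.0_f32, 2.0, 3.0]);
    assert_eq!(accelerations, vec![0.0_f32; 3]);
}
```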

### Removed

5 changes: 3 additions & 2 deletions Cargo.toml
@@ -21,9 +21,10 @@ gpu = ["dep:wgpu", "dep:bytemuck", "dep:pollster"]

[dependencies]
particular_derive = { path = "particular_derive", version = "0.5.0" }
glam = "0"
wide = "0"

ultraviolet = { version = "0", features = ["f64"] }
wide = "0"
glam = "0"

rayon = { version = "1", optional = true }

7 changes: 4 additions & 3 deletions benches/benchmark.rs
@@ -63,14 +63,15 @@ fn criterion_benchmark(c: &mut Criterion) {
{
bench_compute_method(&bodies, &mut group, parallel::BruteForce);
bench_compute_method(&bodies, &mut group, parallel::BruteForceSIMD);
bench_compute_method(&bodies, &mut group, parallel::BarnesHut { theta: 1.0 });
bench_compute_method(&bodies, &mut group, parallel::BarnesHut { theta: 0.5 });
}

{
bench_compute_method(&bodies, &mut group, sequential::BruteForce);
bench_compute_method(&bodies, &mut group, sequential::BruteForceAlt);
bench_compute_method(&bodies, &mut group, sequential::BruteForceCombinations);
bench_compute_method(&bodies, &mut group, sequential::BruteForceCombinationsAlt);
bench_compute_method(&bodies, &mut group, sequential::BruteForceSIMD);
bench_compute_method(&bodies, &mut group, sequential::BarnesHut { theta: 1.0 });
bench_compute_method(&bodies, &mut group, sequential::BarnesHut { theta: 0.5 });
}
}

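The benchmark now constructs both Barnes-Hut methods with `theta: 0.5` instead of `1.0`. A short sketch of what that parameter trades off; the struct-literal construction mirrors the benchmark above, while the `use` path is an assumption based on the new `algorithms` module layout:

```rust
use particular::algorithms::{parallel, sequential};

fn main() {
    // theta is the Barnes-Hut opening angle: at 0.0 no group of particles is
    // ever approximated (results match brute force); larger values approximate
    // more aggressively, trading accuracy for speed.
    let _exact = sequential::BarnesHut { theta: 0.0 };
    let _balanced = parallel::BarnesHut { theta: 0.5 };
    let _fast = parallel::BarnesHut { theta: 1.0 };
}
```
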
40 changes: 22 additions & 18 deletions src/algorithms/gpu.rs
@@ -1,14 +1,15 @@
use crate::{
algorithms::{
internal,
wgpu_data::{setup_wgpu, WgpuData},
FromMassive, PointMass, Scalar,
MassiveAffected, PointMass,
},
compute_method::{ComputeMethod, Storage},
};

const PARTICLE_SIZE: u64 = std::mem::size_of::<PointMass<[f32; 3], f32>>() as u64;

/// A brute-force [`ComputeMethod`] using the GPU with [wgpu](https://github.com/gfx-rs/wgpu).
/// Brute-force [`ComputeMethod`] using the GPU with [wgpu](https://github.com/gfx-rs/wgpu).
///
/// This struct should not be recreated every iteration for performance reasons as it holds initialized data used by WGPU for computing on the GPU.
///
@@ -19,19 +20,23 @@ pub struct BruteForce {
queue: ::wgpu::Queue,
}

impl<V> ComputeMethod<FromMassive<[f32; 3], f32>, V> for &mut BruteForce
impl<V> ComputeMethod<MassiveAffected<[f32; 3], f32>, V> for &mut BruteForce
where
V: From<[f32; 3]> + 'static,
V: From<[f32; 3]>,
{
type Output = Box<dyn Iterator<Item = V>>;
type Output = Vec<V>;

#[inline]
fn compute(self, storage: FromMassive<[f32; 3], f32>) -> Self::Output {
fn compute(self, storage: MassiveAffected<[f32; 3], f32>) -> Self::Output {
let particles_len = storage.affected.len() as u64;
let massive_len = storage.massive.len() as u64;

if massive_len == 0 {
return Box::new(storage.affected.into_iter().map(|_| V::from([0.0; 3])));
return storage
.affected
.into_iter()
.map(|_| V::from([0.0; 3]))
.collect();
}

if let Some(wgpu_data) = &self.wgpu_data {
@@ -47,13 +52,12 @@ where
wgpu_data.write_particle_data(&storage.affected, &storage.massive, &self.queue);
wgpu_data.compute_pass(&self.device, &self.queue);

Box::new(
wgpu_data
.read_accelerations(&self.device)
.into_iter()
// 1 byte padding between each vec3<f32>.
.map(|acc: [f32; 4]| V::from([acc[0], acc[1], acc[2]])),
)
wgpu_data
.read_accelerations(&self.device)
.iter()
// 1 byte padding between each vec3<f32>.
.map(|acc: &[f32; 4]| V::from([acc[0], acc[1], acc[2]]))
.collect()
}
}

@@ -94,13 +98,13 @@ impl Default for BruteForce {
}
}

impl<S, const DIM: usize, V> Storage<PointMass<V, S>> for FromMassive<[S; DIM], S>
impl<S, const DIM: usize, V> Storage<PointMass<V, S>> for MassiveAffected<[S; DIM], S>
where
S: Scalar + 'static,
V: Into<[S; DIM]> + 'static,
S: internal::Scalar,
V: Into<[S; DIM]>,
{
#[inline]
fn store(input: impl Iterator<Item = PointMass<V, S>>) -> Self {
fn store<I: Iterator<Item = PointMass<V, S>>>(input: I) -> Self {
Self::from(input.map(PointMass::into))
}
}
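
Because this `ComputeMethod` is implemented for `&mut BruteForce` and owns the wgpu device, queue and buffers, it is meant to be built once and reused, as the doc comment above notes. A hedged usage sketch; the import paths, `PointMass::new` and the `Storage::store` call are assumptions read off this diff rather than verified public API:

```rust
use particular::algorithms::{gpu, MassiveAffected, PointMass};
use particular::compute_method::{ComputeMethod, Storage};

fn main() {
    // Create the wgpu-backed method once; recreating it every iteration would
    // re-initialize the device, queue and buffers it holds.
    let mut brute_force = gpu::BruteForce::default();

    for _step in 0..3 {
        // Toy particles as PointMass<[f32; 3], f32>: (position, mass).
        let particles = (0..1_000).map(|i| PointMass::new([i as f32, 0.0, 0.0], 1.0));

        // Build the storage this ComputeMethod expects, then reuse the method.
        let storage: MassiveAffected<[f32; 3], f32> = Storage::store(particles);
        let accelerations: Vec<[f32; 3]> = (&mut brute_force).compute(storage);

        assert_eq!(accelerations.len(), 1_000);
    }
}
```
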
63 changes: 33 additions & 30 deletions src/algorithms/parallel.rs
@@ -1,31 +1,32 @@
use crate::{
algorithms::{
tree::{BarnesHutTree, BoundingBox, BoundingBoxDivide, Orthant, Tree},
FromMassive, FromMassiveSIMD, InternalVector, IntoInternalVector, IntoSIMDElement,
PointMass, SIMDScalar, SIMDVector, Scalar,
internal, simd,
tree::{BarnesHutTree, BoundingBox, SubDivide, Tree},
MassiveAffected, MassiveAffectedSIMD, PointMass,
},
compute_method::ComputeMethod,
};
use rayon::iter::{IntoParallelIterator, ParallelIterator};

/// A brute-force [`ComputeMethod`] using the CPU in parallel with [rayon](https://github.com/rayon-rs/rayon).
use rayon::iter::{IntoParallelRefIterator, ParallelIterator};

/// Brute-force [`ComputeMethod`] using the CPU in parallel with [rayon](https://github.com/rayon-rs/rayon).
#[derive(Default, Clone, Copy)]
pub struct BruteForce;

impl<T, S, V> ComputeMethod<FromMassive<T, S>, V> for BruteForce
impl<T, S, V> ComputeMethod<MassiveAffected<T, S>, V> for BruteForce
where
S: Scalar,
T: InternalVector<Scalar = S>,
V: IntoInternalVector<T::Array, InternalVector = T> + Send,
S: internal::Scalar,
T: internal::Vector<Scalar = S>,
V: internal::IntoVectorArray<T::Array, Vector = T> + Send,
{
type Output = Vec<V>;

#[inline]
fn compute(self, storage: FromMassive<T, S>) -> Self::Output {
fn compute(self, storage: MassiveAffected<T, S>) -> Self::Output {
storage
.affected
.into_par_iter()
.map(move |p1| {
.par_iter()
.map(|p1| {
storage
.massive
.iter()
@@ -46,23 +47,24 @@ where
}
}

/// A brute-force [`ComputeMethod`] using the CPU in parallel with [rayon](https://github.com/rayon-rs/rayon) and explicit SIMD instructions using [ultraviolet](https://github.com/fu5ha/ultraviolet).
/// Brute-force [`ComputeMethod`] using the CPU in parallel with [rayon](https://github.com/rayon-rs/rayon) and explicit SIMD instructions using [ultraviolet](https://github.com/fu5ha/ultraviolet).
#[derive(Default, Clone, Copy)]
pub struct BruteForceSIMD;

impl<const LANES: usize, T, S, V> ComputeMethod<FromMassiveSIMD<LANES, T, S>, V> for BruteForceSIMD
impl<const LANES: usize, T, S, V> ComputeMethod<MassiveAffectedSIMD<LANES, T, S>, V>
for BruteForceSIMD
where
S: SIMDScalar<LANES>,
T: SIMDVector<LANES, SIMDScalar = S>,
V: IntoSIMDElement<T::Element, SIMDVector = T> + Send,
S: simd::Scalar<LANES>,
T: simd::Vector<LANES, Scalar = S>,
V: simd::IntoVectorElement<T::Element, Vector = T> + Send,
{
type Output = Vec<V>;

#[inline]
fn compute(self, storage: FromMassiveSIMD<LANES, T, S>) -> Self::Output {
fn compute(self, storage: MassiveAffectedSIMD<LANES, T, S>) -> Self::Output {
storage
.affected
.into_par_iter()
.par_iter()
.map(|p1| {
let p1 = PointMass::new(T::splat(p1.position), S::splat(p1.mass));
storage.massive.iter().fold(T::default(), |acc, p2| {
@@ -74,7 +76,7 @@ where
})
})
.map(V::from_after_reduce)
.collect::<Vec<_>>()
.collect()
}
}

@@ -85,25 +87,26 @@ pub struct BarnesHut<S> {
pub theta: S,
}

impl<T, S, const DIM: usize, const N: usize, V> ComputeMethod<FromMassive<T, S>, V> for BarnesHut<S>
impl<T, S, const DIM: usize, const N: usize, V> ComputeMethod<MassiveAffected<T, S>, V>
for BarnesHut<S>
where
S: Scalar,
T: InternalVector<Scalar = S, Array = [S; DIM]>,
V: IntoInternalVector<T::Array, InternalVector = T> + Send,
BoundingBox<T::Array>: BoundingBoxDivide<PointMass<T, S>, Output = (Orthant<N>, S)>,
S: internal::Scalar,
T: internal::Vector<Scalar = S, Array = [S; DIM]>,
V: internal::IntoVectorArray<T::Array, Vector = T> + Send,
BoundingBox<T::Array>: SubDivide<Divison = [BoundingBox<T::Array>; N]>,
{
type Output = Vec<V>;

#[inline]
fn compute(self, storage: FromMassive<T, S>) -> Self::Output {
fn compute(self, storage: MassiveAffected<T, S>) -> Self::Output {
let mut tree = Tree::new();
let bbox = BoundingBox::containing(storage.massive.iter().map(|p| p.position.into()));
let root = tree.build_node(storage.massive, bbox);
let bbox = BoundingBox::square_with(storage.massive.iter().map(|p| p.position.into()));
let root = tree.build_node(&storage.massive, bbox);

storage
.affected
.into_par_iter()
.map(move |p| V::from_internal(tree.acceleration_at(root, p.position, self.theta)))
.par_iter()
.map(|p| V::from_internal(tree.acceleration_at(root, p.position, self.theta)))
.collect()
}
}