diff --git a/DESCRIPTION b/DESCRIPTION
index d6a6eec..1f94a92 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,12 +1,13 @@
Package: diffeqr
Type: Package
Title: Solving Differential Equations (ODEs, SDEs, DDEs, DAEs)
-Version: 1.1.3
+Version: 2.0.0
Authors@R: person("Christopher", "Rackauckas", email = "me@chrisrackauckas.com", role = c("aut", "cre", "cph"))
Description: An interface to 'DifferentialEquations.jl' from the R programming language.
It has unique high performance methods for solving ordinary differential equations (ODE), stochastic differential equations (SDE),
delay differential equations (DDE), differential-algebraic equations (DAE), and more. Much of the functionality,
including features like adaptive time stepping in SDEs, are unique and allow for multiple orders of magnitude speedup over more common methods.
+ Supports GPUs, including CUDA (NVIDIA), AMD GPUs, Intel oneAPI GPUs, and Apple's Metal (M-series chip GPUs).
'diffeqr' attaches an R interface onto the package, allowing seamless use of this tooling by R users. For more information,
see Rackauckas and Nie (2017) <doi:10.5334/jors.151>.
Depends: R (>= 3.4.0)
diff --git a/NEWS.md b/NEWS.md
index f951301..65cf01f 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,3 +1,8 @@
+## Release v2.0.0
+
+Support for the new DiffEqGPU syntax, which requires passing a backend. Supported backends are NVIDIA CUDA,
+Intel oneAPI, AMD GPUs, and Apple Metal GPUs. GPU compilation and runtime performance are also much faster.
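+
+For example, selecting the CUDA backend now looks like this (mirroring the updated vignette):
+
+```R
+degpu <- diffeqr::diffeqgpu_setup("CUDA")
+```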
+
## Release v1.1.2
Bugfixes for newer Julia versions.
diff --git a/vignettes/gpu.Rmd b/vignettes/gpu.Rmd
index bbd5ce1..7438469 100644
--- a/vignettes/gpu.Rmd
+++ b/vignettes/gpu.Rmd
@@ -16,11 +16,9 @@ knitr::opts_chunk$set(
)
```
-## GPU-Accelerated ODE Solving of Ensembles
-
In many cases one is interested in solving the same ODE many times over many
different initial conditions and parameters. In diffeqr parlance this is called
-an ensemble solve. diffeqr inherits the parallelism tools of the
+an ensemble solve. diffeqr inherits the parallelism tools of the
[SciML ecosystem](https://sciml.ai/) that are used for things like
[automated equation discovery and acceleration](https://arxiv.org/abs/2001.04385).
Here we will demonstrate using these parallel tools to accelerate the solving
@@ -43,7 +41,7 @@ prob <- de$ODEProblem(lorenz,u0,tspan,p)
fastprob <- diffeqr::jitoptimize_ode(de,prob)
```
-Now we use the `EnsembleProblem` as defined on the
+Now we use the `EnsembleProblem` as defined on the
[ensemble parallelism page of the documentation](https://diffeq.sciml.ai/stable/features/ensemble/):
Let's build an ensemble by utilizing uniform random numbers to randomize the
initial conditions and parameters:
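+
+As a minimal sketch of that construction (following the linked ensemble documentation; the
+variable names `u0`, `p`, and `fastprob` come from the setup above, and the vignette's full
+version may differ slightly), a `prob_func` remakes the problem with randomized values:
+
+```R
+# Sketch: randomize u0 and p for each trajectory via remake.
+prob_func <- function (prob, i, rep){
+  de$remake(prob, u0 = runif(3) * u0, p = runif(3) * p)
+}
+ensembleprob <- de$EnsembleProblem(fastprob, prob_func = prob_func, safetycopy = FALSE)
+```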
@@ -62,13 +60,15 @@ sol = de$solve(ensembleprob,de$Tsit5(),de$EnsembleSerial(),trajectories=10000,sa
```
To add GPUs to the mix, we need to bring in [DiffEqGPU](https://github.com/SciML/DiffEqGPU.jl).
-The `diffeqr::diffeqgpu_setup` helper function will install CUDA for you and
+The `diffeqr::diffeqgpu_setup()` helper function will install the chosen GPU backend (here CUDA) for you and
bring all of the bindings into the returned object:
```R
-degpu <- diffeqr::diffeqgpu_setup(backend="CUDA")
+degpu <- diffeqr::diffeqgpu_setup("CUDA")
```
+#### Note: `diffeqr::diffeqgpu_setup` can take a while to run the first time as it installs the drivers!
+
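+Other GPU vendors are selected by passing their backend name to `diffeqr::diffeqgpu_setup`
+instead. On an Apple M-series machine, for example, this might look like the following sketch
+(the "Metal" backend name is assumed from the backends listed in the package description; the
+rest of this vignette continues with the CUDA setup above):
+
+```R
+# Sketch: set up the Apple Metal backend instead of CUDA (backend name assumed).
+degpu <- diffeqr::diffeqgpu_setup("Metal")
+```
+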
Now we simply use `EnsembleGPUKernel(degpu$CUDABackend())` with a
GPU-specialized ODE solver `GPUTsit5()` to solve 10,000 ODEs on the GPU in
parallel:
@@ -77,6 +77,20 @@ parallel:
sol <- de$solve(ensembleprob,degpu$GPUTsit5(),degpu$EnsembleGPUKernel(degpu$CUDABackend()),trajectories=10000,saveat=0.01)
```
+For the full list of choices for specialized GPU solvers, see
+[the DiffEqGPU.jl documentation](https://docs.sciml.ai/DiffEqGPU/stable/manual/ensemblegpukernel/).
+
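+For instance, a higher-order kernel method can be swapped into the same call (a sketch;
+`GPUVern7()` is one of the solvers listed in that documentation):
+
+```R
+# Sketch: use the higher-order GPUVern7 kernel solver in place of GPUTsit5.
+sol <- de$solve(ensembleprob, degpu$GPUVern7(), degpu$EnsembleGPUKernel(degpu$CUDABackend()),
+                trajectories = 10000, saveat = 0.01)
+```
+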
+Note that `EnsembleGPUArray` can be used as well, like:
+
+```R
+sol <- de$solve(ensembleprob,de$Tsit5(),degpu$EnsembleGPUArray(degpu$CUDABackend()),trajectories=10000,saveat=0.01)
+```
+
+though we highly recommend the `EnsembleGPUKernel` methods for more speed. Since the JIT
+compilation performed by `jitoptimize_ode` above also ensures that the faster kernel
+generation methods work, `EnsembleGPUKernel` is almost certainly the
+better choice in most applications.
+
### Benchmark
To see how much of an effect the parallelism has, let's test this against R's