Numba/Numpy version #4
I've implemented a fairly direct port/optimization using Numba for python. This runs ~10x faster for single-point noise calculations, and 50-100x faster for calculating noise arrays. Obviously, this requires installation of the Numba library, but people are free to check it out if they're interested.
Comments
I'll leave this open for others to take note of, if there's a need for quick optimization. Don't like introducing a 3rd party dependency (besides the test utils) what with all the leftpad nonsense and friends going around the developer communities, though, so I'll leave it at that. |
Sounds good, I didn't expect it to enter the main branch. Thanks. |
Yup, I understand. That's the correct choice to keep the code free of
non-standard libraries. My contribution is just for people who need the
speed-up and can tolerate using Numba. Thanks for keeping the issue open
for that reference.
…On Wed, Feb 14, 2018 at 12:10 PM, A. Svensson wrote:
Please note I do not intend to merge it in to main :p sorry. I meant leaving this issue open for future references.
|
@ktritz dude, your implementation has eye-watering speeds. Truly impressive. You should make a separate package or something, so it can be easily installed with pip. Kudos. |
Thanks for this library, both @lmas and @ktritz! @ktritz would you be able to put a quick bit in the readme on your fork that shows how to engage the speed-ups for those who are willing to use numba? The api doesn't seem to have changed, so it's hard to tell. I don't notice any speedups going from one fork to another (but I admit I haven't timed it). Should I see a clear speed up even for trivial operations like making an image grid, or do I need to engage numba in a meaningful way when I send in x, y? Thanks |
I was also going to ask why this wasn't numpy-based, but I see it's been proposed multiple times. Can you add something about this in the documentation and link to the forks? |
Hmm been thinking that I might maintain a numpy based branch too, so it could be released on pypi |
So the numpy version would take a 2D array of coordinates as input? |
Haven't worked with numpy before. Could you describe your use case more? |
Instead of calling the noise function once for every point, you would pass in whole arrays of coordinates, and it would calculate all the values simultaneously, in much less time (see the sketch below). |
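For illustration, a minimal sketch of the difference; the array-accepting call at the end is hypothetical, matching the proposal rather than the library's API at the time of this thread:

```python
import numpy as np
from opensimplex import OpenSimplex

simplex = OpenSimplex(seed=3)
xs = np.linspace(0, 10, 1000)
ys = np.linspace(0, 10, 1000)

# Scalar API: one Python-level call per point (slow for large inputs).
values = np.array([simplex.noise2d(x, y) for x, y in zip(xs, ys)])

# Proposed/hypothetical array API: one call for all points, with the
# loop running in compiled numpy/numba code instead of Python:
# values = simplex.noise2d(xs, ys)
```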
Ah yeah, kinda obvious usage now that you explained it, thanks. And it's kinda obvious as a must-have feature too... Alright, so I'm heavily considering adding numpy/numba optimizations to the main branch (forget multiple pkgs). I would like more input from more users if possible, please. |
In my experience, numpy is a much more common dependency than numba. numba is more likely to have problems with installation on different platforms, etc., and shouldn't be necessary if you can vectorize the code using numpy; numba is for the cases that can't be vectorized. |
It's almost a guarantee that anyone who is using Python for noise
calculations is at least using numpy and/or scipy. Numba is far less
common, though it is included by default now in the Anaconda distribution.
That said, if you are hitting bottlenecks in computation, Numba can often
offer a sizable speedup over numpy, even if your code is vectorized. |
Thank you. If the gains are big enough I might include numba too then. Maybe. |
You can always do a try/except on the numba import and fall back on numpy. |
If you are implementing an optimized numpy/numba version, it might be useful to declare them as optional dependencies in setup.py. In the package itself, you can detect if numpy is present with find_spec from importlib (as mentioned here) and then switch which implementation is active. |
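A minimal sketch of that detection approach; find_spec is the real stdlib API, but the opensimplex submodule names below are purely hypothetical:

```python
from importlib.util import find_spec

# Pick an implementation at import time based on what is installed.
if find_spec("numba") is not None:
    from opensimplex._numba import noise2d    # hypothetical numba-accelerated module
elif find_spec("numpy") is not None:
    from opensimplex._numpy import noise2d    # hypothetical numpy-vectorized module
else:
    from opensimplex._python import noise2d   # hypothetical pure-python fallback
```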
Hmm I don't know, not a fan of complicating the installation or the code for now. Feels like numpy is common enough that I could add it as a hard, simple dependency instead. I'll reconsider in the future if people have problems or suddenly develop stronger opinions. Pinning versions/vendoring is of course always a good thing 👍 |
Thanks for the package, @lmas! I just finished optimizing my own fork using numba before reading the issues and arrived at something very similar to the version of @ktritz. My impression is also that almost anyone who needs a 2d noise function should have numpy installed already. To give an example of @ktritz's suggestion with the try/except import:

```python
try:
    from numba import jit
except ImportError:
    # Numba not installed: substitute a no-op decorator factory.
    def jit(*args, **kwargs):
        def wrapper(func):
            return func
        return wrapper
```

This way, anyone with numba installed could benefit from a speedup, which could be stated somewhere in the documentation, and for everyone else the jit decorator will do nothing (with one caveat, noted below). |
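One caveat with that stub (my note, not from the thread): it only mimics the parenthesized form of the decorator, so it must always be applied with arguments, as it is throughout this thread:

```python
# Works with the fallback: jit(cache=True) returns `wrapper`, which
# then receives the decorated function and returns it unchanged.
@jit(cache=True)
def fast_enough(x):
    return x * x

# Would NOT work with the fallback: a bare @jit passes the function
# itself into jit(*args), and the returned `wrapper` then treats the
# first call argument as the "function" and simply returns it.
# @jit
# def broken(x):
#     return x * x
```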
Thank you, and thanks for the suggestion. Which functions benefit from the numba decorator? And did you run the tests? |
**Functions benefiting from / viable for just-in-time compilation**

The functions which are decorated are the extrapolate functions, the noise functions themselves, and some helper functions (noise2da) that iterate over the input arrays (these are flattened first):

```python
@njit(cache=True)
def extrapolate2d(xsb, ysb, dx, dy, perm):
    index = perm[(perm[xsb & 0xFF] + ysb) & 0xFF] & 0x0E
    g1 = GRADIENTS_2D[index]
    g2 = GRADIENTS_2D[index + 1]
    return g1 * dx + g2 * dy


@njit(cache=True)
def noise2da(a, x, y, perm):
    for n in range(len(x)):
        a[n] = noise2d(x[n], y[n], perm)


@njit(cache=True)
def noise2d(x, y, perm):
    """
    Generate 2D OpenSimplex noise from X,Y coordinates.
    """
    # Place input coordinates onto grid.
    stretch_offset = (x + y) * STRETCH_CONSTANT_2D
    xs = x + stretch_offset
    ys = y + stretch_offset
    ...
```
These functions were moved outside of the class and the perm array is passed around. Essentially, all the functions that do calculations with basic data types (any variant of int and float) are just-in-time compilable; Numba has to figure out the type of the parameters and variables in order to compile them. Up until now, I was not able to get any functions with dicts or classes running.

**My benchmark results**

I extended the benchmark to 1,000,000 function calls. For the current version:

```python
for i in range(1000000):
    oldsimplex.noise2d(0.1, 0.1)
    oldsimplex.noise3d(0.1, 0.1, 0.1)
    oldsimplex.noise4d(0.1, 0.1, 0.1, 0.1)
```

For my modified version, I used a linearly spaced 1d array between 0 and 1 for all coordinates:

```python
x = np.linspace(0, 1, number)
simplex.noise2d(x, x)
simplex.noise3d(x, x, x)
simplex.noise4d(x, x, x, x)
```

The numpy array is iterated over inside a function optimized by numba:

```python
@njit(cache=True)
def noise2da(a, x, y, perm):
    for n in range(len(x)):
        a[n] = noise2d(x[n], y[n], perm)
```

This comparison, running on an Intel Core i7-5820K, yields a speedup of about 74x. |
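For anyone wanting to reproduce a comparison like this outside Jupyter, a minimal timing harness; the `oldsimplex`/`simplex` objects are assumed from the snippets above, so those calls are left commented:

```python
import time
import numpy as np

def bench(fn, repeats=3):
    """Return the best wall-clock time of fn() over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

n = 1_000_000
x = np.linspace(0, 1, n)
# old = bench(lambda: [oldsimplex.noise2d(0.1, 0.1) for _ in range(n)])
# new = bench(lambda: simplex.noise2d(x, x))
# print(f"speedup: {old / new:.0f}x")
```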
Hmm that's quite impressive, but did the tests pass? |
Yes, I also ran the tests. |
Also, I just finished a numpy based port of the 2d noise function. I only did the 2d function, because all the if/else clauses make it really messy! There is also a bit of a performance hit when using numpy array slicing instead of the ifs. For a 1000 x 1000 grid, the runtimes (with %timeit in Jupyter) are as follows:

numba optimized: 35.4 ms ± 505 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Looks like the algorithm is just not well suited for optimization with numpy array slicing. |
Ah ok thanks, sounds good then 👍 And yeah, the overall structure can be a lil' messy, as the old upstream source was in Java and my port wasn't very pythonic from the beginning, I think, heh. Also a little surprised there was that much of a difference between the versions; I was under the impression that numpy was the better one, as it works with arrays in a leaner way. But I'm getting carried away and should really stop and warn you before you waste too much time on it! I'm busy porting the new opensimplex2 and will probably push it this week (which will deprecate v0.3) |
Thanks for the warning :) Looking forward to the new version. The optimization issue is quite interesting for me to explore as well. The numpy implementation was just to see whether it's a viable route to go; it seems the array perspective is not the ideal one here.
I don't think it's your implementation or port. Just a little more speculation from my side about my implementation with numpy arrays: the biggest drawback is that I have to explore the whole possibility tree with the whole array, meaning I calculate whether or not to do a calculation inside an if/else statement for all entries, and then only do the calculation for a subset. I didn't test this though, so take it with a grain of salt ;) |
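A toy illustration of that drawback (not the actual noise code): with scalars, each branch is evaluated once per element and the work is skipped early; with arrays, every branch costs a full-array mask plus an indexing pass, and the simplex algorithm has many nested branches:

```python
import numpy as np

x = np.random.rand(1_000_000)

def scalar_style(x):
    # Scalar branching: the expensive path only runs for matching elements.
    out = np.empty_like(x)
    for i, v in enumerate(x):
        out[i] = v * v if v > 0.5 else 0.0
    return out

def array_style(x):
    # Array branching: the mask is computed over *all* elements, and the
    # fancy indexing allocates temporaries. With a single branch this is
    # still a big win; the cost repeats for every branch in the tree.
    out = np.zeros_like(x)
    mask = x > 0.5
    out[mask] = x[mask] ** 2
    return out
```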
I think you're pretty spot on, and it matches upstream's thoughts.
|
I just had a quick try at optimizing the new version on the dev branch; apart from having to replace the classes for points and gradients with arrays (numba does not support arrays of objects...), it works. When you are finished, I'll have a look at your port to see if I can optimize it. The new version looks much cleaner; it gives many more hints at what's going on :) |
Yeah it was pretty nice and clean to work with. In that linked issue the author talks about refactoring away the array lookups for performance, so it looks like it's back to long rows of if/else branches, which hopefully increases performance again, as my port is about twice as slow as v0.3 (ouch). Kinda holding off on any more work until Simplex2S has been updated, Maybe Soon ™️ |
A bit late to the party but I would very much appreciate further acceleration of performance. :-) numpy is pretty mature/common now. (As an example, it's one of the "standard" libraries that ships with Blender. Blender has a python API for developing add-ons, and numpy is a boon for operations on dense/heavy meshes.) Although numba might be a bit more of a pain to install from scratch, it's just as easy to get from Anaconda as numpy is. As an end user my use case is:

```python
k = 500
seed = 1337
verts, tris = mz.icosa_sphere(k)
noisy_verts, elevations = sampleNoise(verts, seed)
# Do stuff with noisy_verts and elevations
```

```python
def sampleNoise(verts, noise_seed=0):
    """Sample a simplex noise for given vertices"""
    scale_factor = 0.05
    roughness = 12
    tmp = OpenSimplex(seed=noise_seed)
    elevations = np.zeros((len(verts), 1))
    for v, vert in enumerate(verts):
        x, y, z = vert[0] * roughness, vert[1] * roughness, vert[2] * roughness
        elevations[v] = tmp.noise3d(x, y, z) * scale_factor + 1
    new_verts = verts * elevations
    return new_verts, elevations
```

For the simplest possible icosahedron: [screenshot omitted]. And with a higher subdivision, the result is like so after sampling and applying the noise: [screenshot omitted]

For 100,002 vertices (k=100): [timings omitted]. I also tried some other variations of the function, like the one below, but performance was the same for all of them within a few seconds of each other; when it's already taking 3 and a half minutes to run, 204 vs 207 seconds isn't much to crow about.

```python
def sampleNoise2(verts, noise_seed=0):
    """Sample a simplex noise for given vertices"""
    scale_factor = 0.05
    roughness = 12
    tmp = OpenSimplex(seed=noise_seed)
    roughed_verts = verts * roughness
    elevations = np.array([[tmp.noise3d(v[0], v[1], v[2]) * scale_factor + 1] for v in roughed_verts])
    new_verts = verts * elevations
    return new_verts, elevations
```

However, using ktritz's original numba fork as a drop-in replacement (without changing a single line of my own code for additional optimization), it's still slower than I'd like, but a drastic improvement (although the fork is based on 2.2, I believe). Now the variations of the sampling function start to matter. The best one so far is almost the same as the first, but it got rid of the separate x, y, z and saved 6 seconds (19 vs 13 seconds for 8.1 mil vertices):

```python
def sampleNoise4(verts, noise_seed=0):
    """Sample a simplex noise for given vertices"""
    scale_factor = 0.05
    roughness = 12
    tmp = OpenSimplex(seed=noise_seed)
    elevations = np.zeros((len(verts), 1))
    rough_verts = verts * roughness
    for v, vert in enumerate(rough_verts):
        elevations[v] = tmp.noise3d(vert[0], vert[1], vert[2]) * scale_factor + 1
    new_verts = verts * elevations
    return new_verts, elevations
```
|
Sorry for the late reply, moving across the country and prepping for university... Thank you a lot for your detailed input! And a very cool project, love seeing 3D results! May I ask what's the purpose of generating this detailed icosahedron? And yeah, I hear you, need to get this optimisation done sooner! There's been no updates to Simplex2S so far, so I'm thinking I'll just update the current version, as it's still performing better than the new dev version. Thank you for another use case! |
I'm building a procedural planet generator. Planets, as we know, are large, so I need a lot of points if I want to represent a reasonable amount of detail from a certain distance. I chose an icosahedron as the distribution of points is more uniform than a UV sphere or a spherized cube. Since I wrote my previous post I've made some further modifications. I got rid of the OpenSimplex class entirely from ktritz's fork (jit-decorating a class is a bit more involved than I can handle at the moment) and I'm calling the needed functions directly (lines 121 and 297 in his fork). It's slightly less convenient, but it offered another performance boost on top of the one I mentioned previously, because now I can jit-decorate my own noise sampling function, where previously I couldn't because numba didn't know how to handle the OpenSimplex object.

```python
import numpy as np
from numba import njit, prange
import opensimplex as osi
import meshzoo as mz


def main():
    k = 500
    world_seed = 1337
    verts, tris = mz.icosa_sphere(k)
    # Initialize the permutation arrays to be used in noise generation
    perm, pgi = osi.init(world_seed)
    elevations = sample_noise(verts, perm, pgi, 1.6, 0.4)
    # Do stuff with elevations


@njit(cache=True, parallel=False)
def sample_noise(verts, perm, pgi, n_roughness=1, n_strength=0.2):
    """Sample a simplex noise for given vertices"""
    elevations = np.ones(len(verts))
    rough_verts = verts * n_roughness

    # Block 1
    # Comment me out and try Block 2 instead
    for v, vert in enumerate(rough_verts):
        elevations[v] = osi.noise3d(vert[0], vert[1], vert[2], perm, pgi)

    # Block 2
    # Uncomment me and set parallel=True in the njit decorator
    # for v in prange(len(rough_verts)):
    #     elevations[v] = osi.noise3d(rough_verts[v][0], rough_verts[v][1], rough_verts[v][2], perm, pgi)

    return np.reshape((elevations + 1) * 0.5 * n_strength, (len(verts), 1))
```

To show the further improvements to performance, see the comments for the two code blocks in sample_noise.

Block 1 is single-threaded performance:
8.1 million vertices went from ~204 seconds to ~13.8 seconds ----> It's ~0.44 seconds now

If we use Block 2 instead of 1, and set parallel=True in the njit decorator:
8.1 million vertices went from ~204 seconds to ~13.8 seconds to ~0.44 seconds ----> It's ~0.08 seconds now

26 minutes to 0.66 seconds. Bananas! On my Ryzen 3900x my CPU can barely touch 100% utilization before it's over (you can clearly see the spike in all cores when it runs). |
Thank you a lot, I appreciate your detailed replies and stats! Now then, I've finally gotten around to investigating most of the other linked forks from people in this ticket and how they've approached the optimizations. New changes have been pushed. Notice: API names have been changed. |