forked from ispc/ispc.github.com
-
Notifications
You must be signed in to change notification settings - Fork 0
/
ispc_for_gen.html
465 lines (445 loc) · 25.7 KB
/
ispc_for_gen.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="Docutils 0.15.2: http://docutils.sourceforge.net/" />
<title>Intel® ISPC for GEN</title>
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-1486404-4']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
<link rel="stylesheet" href="css/style.css" type="text/css" />
</head>
<body>
<div class="document" id="intel-ispc-for-gen">
<div id="wrap">
<div id="wrap2">
<div id="header">
<h1 id="logo">Intel® Implicit SPMD Program Compiler</h1>
<div id="slogan">An open-source compiler for high-performance SIMD programming on
the CPU</div>
</div>
<div id="nav">
<div id="nbar">
<ul>
<li><a href="index.html">Overview</a></li>
<li><a href="features.html">Features</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li id="selected"><a href="documentation.html">Documentation</a></li>
<li><a href="perf.html">Performance</a></li>
<li><a href="contrib.html">Contributors</a></li>
</ul>
</div>
</div>
<div id="content-wrap">
<div id="sidebar">
<div class="widgetspace">
<h1>Resources</h1>
<ul class="menu">
<li><a href="http://github.com/ispc/ispc/">ispc page on github</a></li>
<li><a href="http://groups.google.com/group/ispc-users/">ispc
users mailing list</a></li>
<li><a href="http://groups.google.com/group/ispc-dev/">ispc
developers mailing list</a></li>
<li><a href="http://github.com/ispc/ispc/wiki/">Wiki</a></li>
<li><a href="http://github.com/ispc/ispc/issues/">Bug tracking</a></li>
</ul>
</div>
</div>
<h1 class="title">Intel® ISPC for GEN</h1>
<div id="content">
<p>The Intel® Implicit SPMD Program Compiler (Intel® ISPC) got initial support for
Intel GPUs recently. The compilation for a GPU is pretty straightforward from
the user's point of view, but managing the execution of code on a GPU may add
complexity. You can use a low-level API <a class="reference external" href="https://spec.oneapi.com/level-zero/latest/index.html">oneAPI Level Zero</a> to manage available GPU
devices, memory transfers between CPU and GPU, code execution, and
synchronization. Another possibility is to use <a class="reference internal" href="#ispc-run-time-ispcrt">ISPC Run Time (ISPCRT)</a>,
which is part of the ISPC package, to manage that complexity and create
a unified abstraction for executing tasks on CPU and GPU.</p>
<p>Contents:</p>
<ul class="simple">
<li><a class="reference internal" href="#using-the-ispc-compiler">Using The ISPC Compiler</a><ul>
<li><a class="reference internal" href="#environment">Environment</a></li>
<li><a class="reference internal" href="#basic-command-line-options">Basic Command-line Options</a></li>
</ul>
</li>
<li><a class="reference internal" href="#ispc-run-time-ispcrt">ISPC Run Time (ISPCRT)</a><ul>
<li><a class="reference internal" href="#ispcrt-objects">ISPCRT Objects</a></li>
<li><a class="reference internal" href="#execution-model">Execution Model</a></li>
</ul>
</li>
<li><a class="reference internal" href="#compiling-and-running-simple-ispc-program">Compiling and Running Simple ISPC Program</a></li>
<li><a class="reference internal" href="#language-limitations-and-known-issues">Language Limitations and Known Issues</a></li>
<li><a class="reference internal" href="#performance">Performance</a></li>
<li><a class="reference internal" href="#faq">FAQ</a><ul>
<li><a class="reference internal" href="#how-to-get-an-assembly-file-from-spir-v">How to Get an Assembly File from SPIR-V?</a></li>
</ul>
</li>
</ul>
<div class="section" id="using-the-ispc-compiler">
<h1>Using The ISPC Compiler</h1>
<p>The output from <tt class="docutils literal">ispc</tt> for GEN targets is SPIR-V file. SPIR-V output format
is selected by default when one of <tt class="docutils literal">genx</tt> targets is used:</p>
<pre class="literal-block">
ispc foo.ispc --target=genx-x8 -o foo.spv
</pre>
<p>The SPIR-V file is consumed by the runtime for further compilation and execution
on GPU.</p>
<div class="section" id="environment">
<h2>Environment</h2>
<p>Currently, <tt class="docutils literal">Intel® ISPC for GEN</tt> is supported on Linux only but it is planned
to be extended for Windows as well in a future release.
Recommended and tested Linux distribution is Ubuntu 18.04.</p>
<p>You need to have a system with <tt class="docutils literal">Intel(R) Processor Graphics Gen9</tt>.</p>
<p>For the execution of ISPC programs on GPU, please install <a class="reference external" href="https://github.com/intel/compute-runtime/releases">Intel(R)
Graphics Compute Runtime</a>
and <a class="reference external" href="https://github.com/oneapi-src/level-zero/releases">Level Zero Loader</a>.</p>
<p>To use ISPC Run Time for CPU you need to have <tt class="docutils literal">OpenMP runtime</tt> installed on
your system. Consult your Linux distribution documentation for installation
of OpenMP runtime instructions.</p>
</div>
<div class="section" id="basic-command-line-options">
<h2>Basic Command-line Options</h2>
<p>Two new targets were introduced for GPU support: <tt class="docutils literal"><span class="pre">genx-x8</span></tt> and <tt class="docutils literal"><span class="pre">genx-x16</span></tt>.</p>
<p>If the <tt class="docutils literal"><span class="pre">-o</span></tt> flag is given, <tt class="docutils literal">ispc</tt> will generate a SPIR-V output file.
Optionally you can use <tt class="docutils literal"><span class="pre">--emit-spirv</span></tt> flag:</p>
<pre class="literal-block">
ispc --target=genx-x8 --emit-spirv foo.ispc -o foo.spv
</pre>
<p>Also two new <tt class="docutils literal">arch</tt> options were introduced: <tt class="docutils literal">genx32</tt> and <tt class="docutils literal">genx64</tt>.
<tt class="docutils literal">genx64</tt> is default and corresponds to 64-bit host and has 64-bit pointer size,
<tt class="docutils literal">genx32</tt> corresponds to 32-bit host and has 32-bit pointer size.</p>
<p>To generate LLVM bitcode, use the <tt class="docutils literal"><span class="pre">--emit-llvm</span></tt> flag.
To generate LLVM bitcode in textual form, use the <tt class="docutils literal"><span class="pre">--emit-llvm-text</span></tt> flag.</p>
<p>Optimizations are on by default; they can be turned off with <tt class="docutils literal"><span class="pre">-O0</span></tt>. However,
for the current release, it is not recommended to use <tt class="docutils literal"><span class="pre">-O0</span></tt>, there are several
known issues.</p>
<p>Generating a text assembly file using <tt class="docutils literal"><span class="pre">--emit-asm</span></tt> is not supported yet.
See <a class="reference internal" href="#how-to-get-an-assembly-file-from-spir-v">How to Get an Assembly File from SPIR-V?</a> section about how to get the
assembly from SPIR-V file.</p>
<p>By default, 64-bit addressing is used. You can change it to 32-bit addressing by
using <tt class="docutils literal"><span class="pre">--addressing=32</span></tt> or <tt class="docutils literal"><span class="pre">--arch=genx32</span></tt> however pointer size should be
the same for host and device code so 32-bit addressing will only work with
32-bit host programs.</p>
</div>
</div>
<div class="section" id="ispc-run-time-ispcrt">
<h1>ISPC Run Time (ISPCRT)</h1>
<p><tt class="docutils literal">ISPC Run Time (ISPCRT)</tt> unifies execution models for CPU and GPU targets. It
is a high-level abstraction on the top of <a class="reference external" href="https://spec.oneapi.com/level-zero/latest/index.html">oneAPI Level Zero</a>. You can continue
using ISPC for CPU without this runtime and alternatively use pure <tt class="docutils literal">oneAPI
Level Zero</tt> for GPU. However, we strongly encourage you to try <tt class="docutils literal">ISPCRT</tt>
and give us feedback!
The <tt class="docutils literal">ISPCRT</tt> provides C and C++ APIs which are documented in the header files
(see <tt class="docutils literal">ispcrt.h</tt> and <tt class="docutils literal">ispcrt.hpp</tt>) and distributed as a library that you can
link to.
Examples in <tt class="docutils literal">ispc/examples/portable/genx/</tt> directory demonstrate how to use
this API to run SPMD programs on CPU or GPU. You can see how to use
<tt class="docutils literal">oneAPI Level Zero</tt> runtime in <tt class="docutils literal">sgemm</tt> example.</p>
<div class="section" id="ispcrt-objects">
<h2>ISPCRT Objects</h2>
<p>The <tt class="docutils literal">ISPC Run Time</tt> uses the following abstractions to manage code execution:</p>
<ul class="simple">
<li><tt class="docutils literal">Device</tt> - represents a CPU or a GPU that can execute SPMD program and has
some operational memory available. The user may select a particular type of
device (CPU or GPU) or allow the runtime to decide which device will be used.</li>
<li><tt class="docutils literal">Memory view</tt> - represents data that need to be accessed by different
<tt class="docutils literal">devices</tt>. For example, input data for code running on GPU must be firstly
prepared by a CPU in its memory, then transferred to a GPU memory to perform
computations on.</li>
<li><tt class="docutils literal">Task queue</tt> - Each <tt class="docutils literal">device</tt> has a task (command) queue and executes
commands from it. The execution may be asynchronous, which means that subsequent
commands can begin executing before the previous ones complete. There are
synchronization primitives available to make the execution synchronous.</li>
<li><tt class="docutils literal">Barrier</tt> - synchronization primitive that can be inserted into
a <tt class="docutils literal">task queue</tt> to make sure that all tasks previously inserted into this
queue have completed execution.</li>
<li><tt class="docutils literal">Module</tt> - represents a set of <tt class="docutils literal">kernels</tt> that are compiled together and
thus can share some common code. In this sense, SPIR-V file produced by <tt class="docutils literal">ispc</tt>
is a <tt class="docutils literal">module</tt> for the <tt class="docutils literal">ISPCRT</tt>.</li>
<li><tt class="docutils literal">Kernel</tt> - is a function that is an entry point to a <tt class="docutils literal">module</tt> and can be
called by inserting kernel execution command into a <tt class="docutils literal">task queue</tt>. A kernel
has one parameter - a pointer to a structure of actual kernel parameters.</li>
<li><tt class="docutils literal">Future</tt> - can be treated as a promise that at some point <tt class="docutils literal">kernel</tt>
execution connected to this object will be completed and the object will become
valid.
<tt class="docutils literal">Futures</tt> are returned when a <tt class="docutils literal">kernel</tt> invocation is inserted into
a <tt class="docutils literal">task queue</tt>. When the <tt class="docutils literal">task queue</tt> is executed on a device, the
<tt class="docutils literal">future</tt> object becomes valid and can be used to retrieve information about
the <tt class="docutils literal">kernel</tt> execution.</li>
</ul>
<p>All <tt class="docutils literal">ISPCRT</tt> objects support reference counting, which means that it is not
necessary to perform detailed memory management. The objects will be released
once they are not used.</p>
</div>
<div class="section" id="execution-model">
<h2>Execution Model</h2>
<p>The idea of <a class="reference external" href="https://ispc.github.io/ispc.html#task-parallelism-launch-and-sync-statements">ISPC tasks</a>
has been extended to support the execution of kernels on a GPU. Each kernel
execution command inserted into a task queue is parametrized with the number
of tasks (threads) that should be launched on a GPU. Each task must decide
on which part of the problem it should work, exactly the same as it happens
in the CPU case. Within tasks, the program executes in SPMD manner (again
the regular ISPC execution model is copied). All built-in variables used for
that purpose (such as <tt class="docutils literal">taskIndex</tt>, <tt class="docutils literal">taskCount</tt>, <tt class="docutils literal">programIndex</tt>,
<tt class="docutils literal">programCount</tt>) are available for use on GPU.</p>
</div>
</div>
<div class="section" id="compiling-and-running-simple-ispc-program">
<h1>Compiling and Running Simple ISPC Program</h1>
<p>The directory <tt class="docutils literal">examples/portable/genx/simple</tt> in the <tt class="docutils literal">ispc</tt> distribution
includes a simple example of how to use <tt class="docutils literal">ispc</tt> with a short C++ program for
CPU and GPU targets with ISPC Run Time. See the file <tt class="docutils literal">simple.ispc</tt> in that
directory (also reproduced here.)</p>
<pre class="literal-block">
struct Parameters {
float *vin;
float *vout;
int count;
};
task void simple_ispc(void *uniform _p) {
Parameters *uniform p = (Parameters * uniform) _p;
foreach (index = 0 ... p->count) {
// Load the appropriate input value for this program instance.
float v = p->vin[index];
// Do an arbitrary little computation, but at least make the
// computation dependent on the value being processed
if (v < 3.)
v = v * v;
else
v = sqrt(v);
// And write the result to the output array.
p->vout[index] = v;
}
}
#include "ispcrt.isph"
DEFINE_CPU_ENTRY_POINT(simple_ispc)
</pre>
<p>There are several differences in comparison with CPU-only version of this
example located in <tt class="docutils literal">examples/simple</tt>. The first thing to notice
in this program is the usage of the <tt class="docutils literal">task</tt> keyword in the function definition
instead of <tt class="docutils literal">export</tt>; this indicates that this function is a <tt class="docutils literal">kernel</tt> so it
can be called from the host.</p>
<p>Second thing to notice is <tt class="docutils literal">DEFINE_CPU_ENTRY_POINT</tt> which tells <tt class="docutils literal">ISPCRT</tt> what
function is an entry point for CPU. If you look into the definition of
<tt class="docutils literal">DEFINE_CPU_ENTRY_POINT</tt>, it is just simple <tt class="docutils literal">launch</tt> call:</p>
<pre class="literal-block">
launch[dim0, dim1, dim2] fcn_name(parameters);
</pre>
<p>It is used to set up thread space for CPU and GPU targets in a seamless way
in host code. If you don't plan to use <tt class="docutils literal">ISPCRT</tt> on CPU, you don't need to use
<tt class="docutils literal">DEFINE_CPU_ENTRY_POINT</tt> in ISPC program. Otherwise, you should have
<tt class="docutils literal">DEFINE_CPU_ENTRY_POINT</tt> for each function you plan to call from <tt class="docutils literal">ISPCRT</tt>.</p>
<p>The final thing to notice is that instead of using real parameters for the
kernel <tt class="docutils literal">void * uniform</tt> is used and later it is cast to <tt class="docutils literal">struct Parameters</tt>.
This approach is used to set up parameters for the kernel in a seamless way
for CPU and GPU on the host side.</p>
<p>Now let's look into <tt class="docutils literal">simple.cpp</tt>. It executes the ISPC kernel on CPU or GPU
depending on an input parameter. The device type is managed by
<tt class="docutils literal">ISPCRTDeviceType</tt> which can be set to <tt class="docutils literal">ISPCRT_DEVICE_TYPE_CPU</tt>,
<tt class="docutils literal">ISPCRT_DEVICE_TYPE_GPU</tt> or <tt class="docutils literal">ISPCRT_DEVICE_TYPE_AUTO</tt> (tries to use GPU, but
fallback to CPU if no GPUs found).</p>
<p>The prograam starts with including <tt class="docutils literal">ISPCRT</tt> header:</p>
<pre class="literal-block">
#include "ispcrt.hpp"
</pre>
<p>After that <tt class="docutils literal">ISPCRT</tt> device is created:</p>
<pre class="literal-block">
ispcrt::Device device(device_type)
</pre>
<p>Then we're setting up parameters for ISPC kernel:</p>
<pre class="literal-block">
// Setup input array
ispcrt::Array<float> vin_dev(device, vin);
// Setup output array
ispcrt::Array<float> vout_dev(device, vout);
// Setup parameters structure
Parameters p;
p.vin = vin_dev.devicePtr();
p.vout = vout_dev.devicePtr();
p.count = SIZE;
auto p_dev = ispcrt::Array<Parameters>(device, p);
</pre>
<p>Notice that all reference types like arrays and structures should be wrapped up
into <tt class="docutils literal"><span class="pre">ispcrt::Array</span></tt> for correct passing to ISPC kernel.</p>
<p>Then we set up module and kernel to execute:</p>
<pre class="literal-block">
ispcrt::Module module(device, "genx_simple");
ispcrt::Kernel kernel(device, module, "simple_ispc");
</pre>
<p>The name of the module must correspond to the name of output from ISPC compilation
without extension. So in this example <tt class="docutils literal">simple.ispc</tt> will be compiled to
<tt class="docutils literal">genx_simple.spv</tt> for GPU and to <tt class="docutils literal">libgenx_simple.so</tt> for CPU so we use
<tt class="docutils literal">genx_simple</tt> as the module name.
The name of the kernel is just the name of the required <tt class="docutils literal">task</tt> function from
the ISPC kernel.</p>
<p>The rest of the program creates <tt class="docutils literal"><span class="pre">ispcrt::TaskQueue</span></tt>, fills it with required
steps and executes it:</p>
<pre class="literal-block">
ispcrt::TaskQueue queue(device);
// ispcrt::Array objects which used as inputs for ISPC kernel should be
// explicitly copied to device from host
queue.copyToDevice(p_dev);
queue.copyToDevice(vin_dev);
// Make sure that input arrays were copied
queue.barrier();
// Launch the kernel on the device using 1 thread
queue.launch(kernel, p_dev, 1);
// Make sure that execution completed
queue.barrier();
// ispcrt::Array objects which used as outputs of ISPC kernel should be
// explicitly copied to host from device
queue.copyToHost(vout_dev);
// Make sure that input arrays were copied
queue.barrier();
// Execute queue and sync
queue.sync();
</pre>
<p>To build and run examples go to <tt class="docutils literal">examples/portable/genx</tt> and create
<tt class="docutils literal">build</tt> folder. Run <tt class="docutils literal">cmake <span class="pre">-DISPC_EXECUTABLE=<path_to_ispc_binary></span>
<span class="pre">-Dispcrt_DIR=<path_to_ispcrt_cmake></span> ../</tt> from <tt class="docutils literal">build</tt> folder. Or add path
to <tt class="docutils literal">ispc</tt> to your PATH and just run <tt class="docutils literal">cmake ../</tt>. Build examples using <tt class="docutils literal">make</tt>.
Go to <tt class="docutils literal">simple</tt> folder and see what files were generated:</p>
<ul class="simple">
<li><tt class="docutils literal">genx_simple.spv</tt> contains SPIR-V representation. This file is passed
by <tt class="docutils literal">ISPCRT</tt> to <tt class="docutils literal">Intel(R) Graphics Compute Runtime</tt> for execution on GPU.</li>
<li><tt class="docutils literal">libgenx_simple.so</tt> incorporates object files produced from ISPC kernel
for different targets (you can find them in <tt class="docutils literal">local_ispc</tt> subfolder).
This library is loaded from host application <tt class="docutils literal">host_simple</tt> and is used for
execution on CPU.</li>
<li><tt class="docutils literal"><span class="pre">simple_ispc_<target>.h</span></tt> files include the declaration for the C-callable
functions. They are not really used and produced just for the reference.</li>
<li><tt class="docutils literal">host_simple</tt> is the main executable. When it runs, it generates
the expected output:</li>
</ul>
<pre class="literal-block">
Executed on: Auto
0: simple(0.000000) = 0.000000
1: simple(1.000000) = 1.000000
2: simple(2.000000) = 4.000000
3: simple(3.000000) = 1.732051
4: simple(4.000000) = 2.000000
...
</pre>
<p>To set up all compilation/link commands in your application we strongly
recommend using <tt class="docutils literal">add_ispc_kernel</tt> CMake function from CMake module included
into ISPC distribution package.</p>
<p>So the complete <tt class="docutils literal">CMakeFile.txt</tt> to build <tt class="docutils literal">simple</tt> example extracted from ISPC
build system is the following:</p>
<pre class="literal-block">
cmake_minimum_required(VERSION 3.14)
find_package(ispcrt REQUIRED)
add_executable(host_simple simple.cpp)
add_ispc_kernel(genx_simple simple.ispc "")
target_link_libraries(host_simple PRIVATE ispcrt::ispcrt)
</pre>
<p>And you can configure and build it using:</p>
<pre class="literal-block">
cmake ../ -DISPC_EXECUTABLE_GPU=/home/install/bin/ispc && make
</pre>
<p>You can also run separate compilation commands to achive the same result:</p>
<ul>
<li><p class="first">Compile ISPC kernel for GPU:</p>
<pre class="literal-block">
ispc -I /home/install/include/ispcrt -DISPC_GPU --target=genx-x8 --woff
-o /home/ispc/examples/portable/genx/simple/genx_simple.spv
/home/ispc/examples/portable/genx/simple/simple.ispc
</pre>
</li>
<li><p class="first">Compile ISPC kernel for CPU:</p>
<pre class="literal-block">
ispc -I /home/install/include/ispcrt --arch=x86-64
--target=sse4-i32x4,avx1-i32x8,avx2-i32x8,avx512knl-i32x16,avx512skx-i32x16
--woff --pic --opt=disable-assertions
-h /home/ispc/examples/portable/genx/simple/simple_ispc.h
-o /home/ispc/examples/portable/genx/simple/simple.dev.o
/home/ispc/examples/portable/genx/simple/simple.ispc
</pre>
</li>
<li><p class="first">Produce a library from object files:</p>
<pre class="literal-block">
/usr/bin/c++ -fPIC -shared -Wl,-soname,libgenx_simple.so -o libgenx_simple.so
simple.dev*.o
</pre>
</li>
<li><p class="first">Compile and link host code:</p>
<pre class="literal-block">
/usr/bin/c++ -DISPCRT -isystem /home/install/include/ispcrt -fPIE
-o /home/ispc/examples/portable/genx/simple/host_simple
/home/ispc/examples/portable/genx/simple/simple.cpp -L/usr/local/lib
-Wl,-rpath,/home/install/lib /home/install/lib/libispcrt.so.1.13.0
</pre>
</li>
</ul>
</div>
<div class="section" id="language-limitations-and-known-issues">
<h1>Language Limitations and Known Issues</h1>
<p>The current release of <tt class="docutils literal">Intel® ISPC for GEN</tt> is in alpha state so not all
functionality is implemented yet. However, it is actively developed so we
expect to cover missing features in the nearest future releases.
Below is the list of known limitations:</p>
<ul class="simple">
<li>No function pointers support</li>
<li>No <tt class="docutils literal">prefetch</tt> support</li>
<li>No global atomics</li>
<li>No double precision math functions</li>
<li>Foreach inside varying control flow is not supported by default. However you
can enable it by using experimental flag <tt class="docutils literal"><span class="pre">--opt=enable-genx-foreach-varying</span></tt>
but performance will degrade with this flag.</li>
</ul>
<p>There are several features which we do not plan to implement for GPU:</p>
<ul class="simple">
<li><tt class="docutils literal">launch</tt> and <tt class="docutils literal">sync</tt> keywords are not supported for GPU in ISPC program
since kernel execution is managed in the host code now.</li>
<li><tt class="docutils literal">New</tt> and <tt class="docutils literal">delete</tt> keywords are not expected to be supported in ISPC
program for GEN target. We expect all memory to be set up on the host side.</li>
<li><tt class="docutils literal">export</tt> functions must return <tt class="docutils literal">void</tt> for GEN targets.</li>
</ul>
</div>
<div class="section" id="performance">
<h1>Performance</h1>
<p>The performance of <tt class="docutils literal">Intel® ISPC for GEN</tt> is not the goal of the current release.
It has a room for improvements and we're working hard to make it better for
the next release. Here are our results for <tt class="docutils literal">mandelbrot</tt> which were obtained on
Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz with Intel(R) Gen9 HD Graphics
(max compute units 24):</p>
<ul class="simple">
<li>@time of CPU run: [16.343] milliseconds</li>
<li>@time of GPU run: [29.583] milliseconds</li>
<li>@time of serial run: [566] milliseconds</li>
</ul>
</div>
<div class="section" id="faq">
<h1>FAQ</h1>
<div class="section" id="how-to-get-an-assembly-file-from-spir-v">
<h2>How to Get an Assembly File from SPIR-V?</h2>
<p>Use <tt class="docutils literal">ocloc</tt> tool installed as part of intel-ocloc package:</p>
<pre class="literal-block">
// Create binary first
ocloc compile -file file.spv -spirv_input -options "-vc-codegen" -device <name>
</pre>
<pre class="literal-block">
// Then disassemble it
ocloc disasm -file file_Gen9core.bin -device <name> -dump <FOLDER_TO_DUMP>
</pre>
<p>You will get <tt class="docutils literal">.asm</tt> files for each kernel in <FOLDER_TO_DUMP>.</p>
</div>
</div>
</div>
<div class="clearfix"></div>
<div id="footer"> © <strong>Intel Corporation</strong> | Valid <a href="http://validator.w3.org/check?uri=referer">XHTML</a> | <a href="http://jigsaw.w3.org/css-validator/check/referer">CSS</a> | ClearBlue by: <a href="http://www.themebin.com/">ThemeBin</a>
<!-- Please Do Not remove this link, thank u -->
</div>
</div>
</div>
</div>
</div>
</body>
</html>