Parallel pipeline #2779
@@ -193,6 +193,12 @@ executables:
        - string-interpolate == 0.3.*
      verbatim:
        default-language: GHC2021
        ghc-options:
          - -threaded
          - -rtsopts
Is `-rtsopts` required to be able to pass the `-A` flag? It's strange: I seem to be able to set some RTS flags with the default option.

The GHC documentation has a security warning about this option, so we need to be careful:

"""
In GHC 6.12.3 and earlier, the default was to process all RTS options. However, since RTS options can be used to write logging data to arbitrary files under the security context of the running program, there is a potential security problem. For this reason, GHC 7.0.1 and later default to -rtsopts=some.
"""
I've added a comment
src/Juvix/Data/NumThreads.hs (outdated)
NumThreads i -> return i
NumThreadsAuto -> do
  nc <- liftIO GHC.getNumCapabilities
  return (max 1 (min 6 (nc - 2)))
Is this based on the benchmark experiments you've done?
It is. I have now refined it a bit based on our previous conversation. Basically, we'll use the minimum of the number of processors divided by two and 8 (a magic number that I've found to be an OK limit). @paulcadman mentioned that there is a bug in GHC where using more cores can lead to unexpected performance loss.
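For illustration only, here is a minimal sketch of the heuristic described above in plain IO. The function name is made up and this is not the PR's actual code:

```haskell
import Control.Concurrent (getNumCapabilities)

-- Sketch of the heuristic described above: half the available capabilities,
-- capped at 8, with a floor of 1.
autoNumThreads :: IO Int
autoNumThreads = do
  nc <- getNumCapabilities
  return (max 1 (min 8 (nc `div` 2)))
```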
This might not have the same cause, but here's the issue I found: https://gitlab.haskell.org/ghc/ghc/-/issues/9221
This PR addresses a bug/missing case present since v0.6.2, introduced specifically by PR #2779. That PR involves detecting imports in Juvix files before type checking, and that's the issue: detecting/scanning imports is done by running a flat parser (which ignores the Juvix Markdown structure) and, when it fails, falling back to a Megaparsec parse. So, for simplicity, we could just continue using the same Megaparsec parser as before for Juvix Markdown files.

Co-authored-by: Jan Mas Rovira <[email protected]>
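To make the fallback strategy described above concrete, here is a minimal sketch with hypothetical names (`flatScan` and `fullParser` stand in for the actual scanner and parser, which differ in the compiler):

```haskell
import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe)

-- Hypothetical names for illustration: 'flatScan' is a fast import scanner
-- that ignores Juvix Markdown structure; 'fullParser' is the Megaparsec
-- parser used only when the fast path fails.
scanImports ::
  (Text -> Maybe [Text]) ->      -- fast flat scan of import statements
  Parsec Void Text [Text] ->     -- full Megaparsec parse as fallback
  Text ->                        -- file contents
  Maybe [Text]
scanImports flatScan fullParser input =
  case flatScan input of
    Just imports -> Just imports
    Nothing -> parseMaybe fullParser input
```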
This PR introduces parallelism in the pipeline to gain performance. I've included benchmarks at the end.

Flags:

There are two new global flags:

- `-N / --threads`. It is used to set the number of capabilities. According to the GHC documentation: "Set the number of Haskell threads that can run truly simultaneously (on separate physical processors) at any given time. When compiling in parallel, we create this many worker threads." The default value is `-N auto`, which sets `-N` to half the number of logical cores, capped at 8.
- `--dev-show-thread-ids`. When given, the thread id is printed in the compilation progress log (see the sketch after this list).
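As a rough illustration of what thread-id logging looks like (the helper below is hypothetical, not the PR's progress-log code):

```haskell
import Control.Concurrent (myThreadId)

-- Hypothetical helper: prefix a progress message with the id of the thread
-- doing the work, roughly what --dev-show-thread-ids enables.
logWithThreadId :: String -> IO ()
logWithThreadId msg = do
  tid <- myThreadId
  putStrLn ("[" ++ show tid ++ "] " ++ msg)
```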
Parallel compilation

I've added the module `src/Parallel/ParallelTemplate.hs`, which contains all the concurrency related code. I think it is good to keep this code separated from the actual compiler code.

Code changes:
- I've removed the `setup` stage where we were registering dependencies. Instead, the dependencies are registered when the `pathResolver` is run for the first time. This way it is safer.
- The `ImportTree` is needed to run the pipeline. Cycles are detected during the construction of this tree, so I've removed `Reader ImportParents` from the pipeline.
- With `-N1`, the pipeline remains unchanged, so performance should be the same as in the main branch (except there is a small performance degradation due to adding the `-threaded` flag).
- New `PipelineOptions`, which are used to pass options to the effects in the pipeline.
- The `PathResolver` constraint has been removed from the `upTo*` functions in the pipeline due to being redundant.
- New `NFData` instances. They are needed to force the full evaluation of `Stored.ModuleInfo` in each of the threads.
- The `Cache` effect uses `SharedState` as opposed to `LocalState`. Perhaps we should provide different versions.
- There is a new `Cache` handler that accepts a setup function. The setup is triggered when a miss is detected. It is used to lazily compile the modules in parallel (see the sketch after this list).
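A rough sketch of the cache-with-setup idea in plain IO, with hypothetical names and none of the effectful machinery the PR actually uses:

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar)
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import qualified Data.Map.Strict as Map

-- Sketch only: a cache kept in shared state whose misses trigger a
-- user-supplied setup action. The computed value is forced to normal form
-- (this is what the NFData instances are for) before being cached, so it can
-- safely be consumed by other threads. A real implementation would avoid
-- holding the lock while the setup action runs.
cacheLookup ::
  (Ord k, NFData v) =>
  MVar (Map.Map k v) ->   -- shared cache
  (k -> IO v) ->          -- setup action, run on a cache miss
  k ->
  IO v
cacheLookup cacheVar setup key =
  modifyMVar cacheVar $ \cache ->
    case Map.lookup key cache of
      Just v -> return (cache, v)
      Nothing -> do
        v <- evaluate . force =<< setup key
        return (Map.insert key v cache, v)
```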
Tests

- I've added `tests/positive/Internal/Lambda.juvix`. Due to laziness, a crash happening in this file was not being caught. The problem is that in this file we have a lambda function whose clauses have a different number of patterns, which we currently do not support (Functions with clauses with differing number of patterns type-check but are not correctly compiled #1706).

Future Work
- At the moment we force the full evaluation of `Stored.ModuleInfo`, since some information in it will be discarded. It may be possible to be more fine-grained when forcing evaluation.
- We should make a distinction between `juvix` the compiler and `juvix` the build tool. When using `juvix` as a build tool it makes sense to typecheck/compile (to stored core) all modules in the project. When/if we do this, scanning imports in all modules in parallel becomes trivial.
- The `ParallelTemplate` module uses low-level primitives such as `forkIO`. At some point it should be refactored to use safer functions from the `Effectful.Concurrent.Async` module (see the sketch below).
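To illustrate the kind of refactoring meant in the last point, a sketch with made-up names showing the async style (this is not code from the PR):

```haskell
import Control.Concurrent.Async (mapConcurrently_)

-- Hypothetical stand-in for compiling a single module.
compileModule :: FilePath -> IO ()
compileModule path = putStrLn ("compiling " ++ path)

-- With the async library (which Effectful.Concurrent.Async wraps), workers
-- are spawned and awaited as a group and exceptions in a worker propagate to
-- the caller; with bare forkIO all of that bookkeeping is manual.
compileAll :: [FilePath] -> IO ()
compileAll = mapConcurrently_ compileModule
```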
Benchmarks

On some benchmarks, I include the GHC runtime option `-A`, which sometimes has a good impact on performance. Thanks to @paulcadman for pointing this out. I've figured out a good combination of `-N` and `-A` through trial and error (but this obviously depends on the CPU and the Juvix project).

Typecheck the standard library
Clean run (88% faster than main)

Cached run (43% faster than main)

Typecheck the test suite of the containers library

At the moment this is the biggest Juvix project that we have.

Clean run (105% faster than main)

Cached run (54% faster than main)